FEA add secure persistence #128

Merged: 97 commits, Sep 16, 2022


@adrinjalali (Member) commented Sep 6, 2022

This PR adds secure persistence for sklearn models. You can test it with:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV

from skops import load, save

# Fit a small model on the iris dataset.
X, y = load_iris(return_X_y=True)
model = LogisticRegressionCV(solver="liblinear").fit(X, y)
# Persist the fitted model, then load it back.
save(file="/tmp/test.skops", obj=model)
instance = load(file="/tmp/test.skops")
# The original and the loaded model should report the same score.
print(model.score(X, y), instance.score(X, y))

The file it creates is a zip archive that you can inspect. The archive contains a schema.json with all the information needed to reconstruct the object.
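To look inside such an archive, something along these lines should work with the standard library alone. This is only a sketch: it builds a stand-in archive in memory so it runs without skops, and the schema contents (`example_key`) and the helper name `read_schema` are made up for illustration, not the real format or API.

```python
import io
import json
import zipfile

# Stand-in archive built in memory so the example runs without skops.
# The schema contents here are hypothetical, not the real file format.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("schema.json", json.dumps({"example_key": "example_value"}))


def read_schema(file):
    """Return the member names and the parsed schema.json of a zip archive."""
    with zipfile.ZipFile(file) as archive:
        names = archive.namelist()
        schema = json.loads(archive.read("schema.json"))
    return names, schema


names, schema = read_schema(buffer)
print(names)
print(json.dumps(schema, indent=2))
```

For a file written by `save`, passing its path (for example `/tmp/test.skops`) to `read_schema` instead of the in-memory buffer should work the same way, since `zipfile.ZipFile` accepts both.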

Things to add:

  • add versions (file format version and library version)
  • add docs and file specs
  • add common tests
  • add support for more types e.g.:
    • numpy rng types
    • scorers
    • custom scorers
    • cv splitters
    • ...

Things to do in a separate PR:

  • add code to allow extension from other third party libraries

We basically walk the attributes of the object and persist them, much like https://github.com/pytorch/torchsnapshot does. The difference is that the objects we deal with, unlike PyTorch objects, don't expose state_dict and load_state_dict, so we implement the equivalent of those methods here ourselves. Third-party libraries would need to implement the equivalent methods, and we'll provide a way for them to register those methods for their objects with us.
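As a rough illustration of the idea (not the code in this PR), a registry of per-type state functions could look like the sketch below. Every name in it (`register`, `dump_state`, `load_state`, `ToyScaler`) is hypothetical, and only plain Python values and lists are handled.

```python
import json

# Hypothetical registry: qualified type name -> (class, get_state, set_state).
_REGISTRY = {}


def _type_key(cls):
    return f"{cls.__module__}.{cls.__qualname__}"


def register(cls, get_state, set_state):
    """Register the state functions a library provides for one of its classes."""
    _REGISTRY[_type_key(cls)] = (cls, get_state, set_state)


def dump_state(obj):
    """Turn an object into JSON-compatible data using registered functions."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, list):
        return [dump_state(item) for item in obj]
    key = _type_key(type(obj))
    if key not in _REGISTRY:
        raise TypeError(f"No state functions registered for {key}")
    _, get_state, _ = _REGISTRY[key]
    attrs = {name: dump_state(value) for name, value in get_state(obj).items()}
    return {"__type__": key, "attrs": attrs}


def load_state(data):
    """Rebuild an object; only registered types are ever constructed."""
    if isinstance(data, list):
        return [load_state(item) for item in data]
    if isinstance(data, dict):
        cls, _, set_state = _REGISTRY[data["__type__"]]
        obj = cls.__new__(cls)  # bypass __init__, state is restored below
        set_state(obj, {name: load_state(v) for name, v in data["attrs"].items()})
        return obj
    return data


class ToyScaler:
    """Toy stand-in for an estimator; not a scikit-learn class."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, values):
        self.max_ = max(values) * self.factor
        return self


register(
    ToyScaler,
    get_state=lambda obj: dict(vars(obj)),
    set_state=lambda obj, attrs: vars(obj).update(attrs),
)

fitted = ToyScaler(factor=2.0).fit([1.0, 3.0])
payload = json.dumps(dump_state(fitted))
restored = load_state(json.loads(payload))
print(restored.factor, restored.max_)
```

The point of going through a registry rather than importing whatever the file names is that loading can only ever instantiate types someone explicitly registered.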

This is a very early prototype, and I'm very open to discussion of the format and the design.

cc @skops-dev/maintainers @osanseviero @LysandreJik @julien-c

@adrinjalali marked this pull request as draft September 6, 2022 07:29
@BenjaminBossan
Collaborator

What do you think about having a metainfo object in the schema? So basically something like:

{
  "metainfo": {...},
  "obj": <actual obj>
}

We could put stuff like protocol version, sklearn/skops version, etc. into metainfo. Also, we could add a hash/fingerprint of the object there to verify it.
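A minimal sketch of what that could look like, assuming a SHA-256 digest over a canonical JSON dump of the object part. The function names, the key names inside `metainfo`, and the version strings are placeholders chosen for illustration, not a proposed spec.

```python
import hashlib
import json


def _fingerprint(obj_schema):
    """SHA-256 over a canonical (sorted, compact) JSON dump of the schema."""
    canonical = json.dumps(obj_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def wrap_with_metainfo(obj_schema, protocol, versions):
    """Wrap an object schema together with versions and a fingerprint."""
    return {
        "metainfo": {
            "protocol": protocol,
            "versions": versions,
            "sha256": _fingerprint(obj_schema),
        },
        "obj": obj_schema,
    }


def verify(document):
    """Check that the stored fingerprint still matches the object part."""
    return _fingerprint(document["obj"]) == document["metainfo"]["sha256"]


# Placeholder values, purely illustrative.
document = wrap_with_metainfo(
    {"type": "ToyModel", "attrs": {"coef": [0.5, 1.5]}},
    protocol=0,
    versions={"sklearn": "x.y.z", "skops": "x.y.z"},
)
print(verify(document))

# Changing the object part without updating the hash is detected.
document["obj"]["attrs"]["coef"][0] = 99.0
print(verify(document))
```

Note that a plain hash like this only catches accidental corruption; it would not stop someone who can rewrite both the object and the fingerprint.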

@merveenoyan changed the title from "FEA add secure pesrsistence" to "FEA add secure persistence" on Sep 6, 2022
@adrinjalali
Member Author

I added the common tests here, and here's the summary:

=============================================================================== short test summary info ================================================================================
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[ClassifierChain(base_estimator=LogisticRegression(C=1))] - TypeError: _BaseChain.__init__() missing 1 required positi...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[CountVectorizer()] - TypeError: Object of type type is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[DictVectorizer()] - TypeError: Object of type type is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[FeatureAgglomeration()] - TypeError: Object of type function is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[FeatureHasher()] - TypeError: Object of type type is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[GenericUnivariateSelect()] - TypeError: Object of type function is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[HashingVectorizer()] - TypeError: Object of type type is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[IterativeImputer()] - assert nan == nan
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[KNNImputer()] - assert nan == nan
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[Lars()] - AssertionError: assert <class 'numpy.float64'> == <class 'float'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[LarsCV()] - AssertionError: assert <class 'numpy.float64'> == <class 'float'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[LassoLars()] - AssertionError: assert <class 'numpy.float64'> == <class 'float'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[LassoLarsCV()] - AssertionError: assert <class 'numpy.float64'> == <class 'float'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[LassoLarsIC()] - AssertionError: assert <class 'numpy.float64'> == <class 'float'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[MLPClassifier()] - AssertionError: assert <class 'tuple'> == <class 'list'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[MLPRegressor()] - AssertionError: assert <class 'tuple'> == <class 'list'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[MinMaxScaler()] - AssertionError: assert <class 'tuple'> == <class 'list'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[MissingIndicator()] - assert nan == nan
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[MultiOutputClassifier(estimator=LogisticRegression(C=1))] - TypeError: MultiOutputClassifier.__init__() missing 1 req...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[MultiOutputRegressor(estimator=Ridge())] - TypeError: MultiOutputRegressor.__init__() missing 1 required positional a...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[OneHotEncoder()] - TypeError: Object of type type is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[OneVsOneClassifier(estimator=LogisticRegression(C=1))] - TypeError: OneVsOneClassifier.__init__() missing 1 required ...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[OneVsRestClassifier(estimator=LogisticRegression(C=1))] - TypeError: OneVsRestClassifier.__init__() missing 1 require...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[OrdinalEncoder()] - TypeError: Object of type type is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[OutputCodeClassifier(estimator=LogisticRegression(C=1))] - TypeError: OutputCodeClassifier.__init__() missing 1 requi...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[RFE(estimator=LogisticRegression(C=1))] - TypeError: RFE.__init__() missing 1 required positional argument: 'estimator'
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[RFECV(estimator=LogisticRegression(C=1))] - TypeError: RFECV.__init__() missing 1 required positional argument: 'esti...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[RegressorChain(base_estimator=Ridge())] - TypeError: _BaseChain.__init__() missing 1 required positional argument: 'b...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[RidgeCV()] - AssertionError: assert <class 'tuple'> == <class 'list'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[RidgeClassifierCV()] - AssertionError: assert <class 'tuple'> == <class 'list'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[RobustScaler()] - AssertionError: assert <class 'tuple'> == <class 'list'>
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SelectFdr()] - TypeError: Object of type function is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SelectFpr()] - TypeError: Object of type function is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SelectFromModel(estimator=SGDRegressor(random_state=0))] - TypeError: SelectFromModel.__init__() missing 1 required p...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SelectFwe()] - TypeError: Object of type function is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SelectKBest()] - TypeError: Object of type function is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SelectPercentile()] - TypeError: Object of type function is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SelfTrainingClassifier(base_estimator=LogisticRegression(C=1))] - TypeError: SelfTrainingClassifier.__init__() missin...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SequentialFeatureSelector(estimator=LogisticRegression(C=1))] - TypeError: SequentialFeatureSelector.__init__() missi...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[SimpleImputer()] - assert nan == nan
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[StackingClassifier(estimators=[('est1',LogisticRegression(C=0.1)),('est2',LogisticRegression(C=1))])] - TypeError: Ob...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[StackingRegressor(estimators=[('est1',Ridge(alpha=0.1)),('est2',Ridge(alpha=1))])] - TypeError: Object of type Ridge ...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[TfidfVectorizer()] - TypeError: Object of type type is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[VotingClassifier(estimators=[('est1',LogisticRegression(C=0.1)),('est2',LogisticRegression(C=1))])] - TypeError: Obje...
FAILED skops/tests/test_persist.py::test_can_persist_non_fitted[VotingRegressor(estimators=[('est1',Ridge(alpha=0.1)),('est2',Ridge(alpha=1))])] - TypeError: Object of type Ridge is...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[ARDRegression()] - TypeError: ARDRegression.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[AdaBoostClassifier()] - TypeError: Object of type DecisionTreeClassifier is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[AdaBoostRegressor()] - TypeError: Object of type DecisionTreeRegressor is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[AdditiveChi2Sampler()] - TypeError: AdditiveChi2Sampler.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[AffinityPropagation()] - TypeError: AffinityPropagation.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[AgglomerativeClustering()] - TypeError: AgglomerativeClustering.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[BaggingClassifier()] - TypeError: Object of type DecisionTreeClassifier is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[BaggingRegressor()] - TypeError: Object of type DecisionTreeRegressor is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[BayesianGaussianMixture()] - TypeError: BaseMixture.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[BernoulliRBM()] - TypeError: BernoulliRBM.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Binarizer()] - TypeError: Binarizer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Birch()] - TypeError: Birch.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[BisectingKMeans()] - TypeError: Object of type RandomState is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[CCA()] - TypeError: _PLS.fit() got an unexpected keyword argument 'y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[CalibratedClassifierCV(base_estimator=LogisticRegression(C=1))] - TypeError: Object of type _CalibratedClassifier is not ...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[CategoricalNB()] - TypeError: Object of type ndarray is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[ClassifierChain(base_estimator=LogisticRegression(C=1))] - TypeError: ClassifierChain.fit() got an unexpected keyword arg...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[CountVectorizer()] - TypeError: CountVectorizer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[DecisionTreeClassifier()] - TypeError: Object of type int64 is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[DecisionTreeRegressor()] - TypeError: Object of type Tree is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[DictVectorizer()] - TypeError: DictVectorizer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[DictionaryLearning()] - TypeError: DictionaryLearning.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[EllipticEnvelope()] - TypeError: EllipticEnvelope.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[EmpiricalCovariance()] - TypeError: EmpiricalCovariance.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[ExtraTreeClassifier()] - TypeError: Object of type int64 is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[ExtraTreeRegressor()] - TypeError: Object of type Tree is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[ExtraTreesClassifier()] - TypeError: Object of type ExtraTreeClassifier is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[ExtraTreesRegressor()] - TypeError: Object of type ExtraTreeRegressor is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[FactorAnalysis()] - TypeError: FactorAnalysis.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[FastICA()] - TypeError: FastICA.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[FeatureAgglomeration()] - TypeError: FeatureAgglomeration.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[FeatureHasher()] - TypeError: FeatureHasher.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[FunctionTransformer()] - TypeError: FunctionTransformer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GammaRegressor()] - TypeError: Object of type HalfGammaLoss is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GaussianMixture()] - TypeError: BaseMixture.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GaussianProcessClassifier()] - TypeError: GaussianProcessClassifier.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GaussianProcessRegressor()] - TypeError: GaussianProcessRegressor.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GaussianRandomProjection()] - TypeError: BaseRandomProjection.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GenericUnivariateSelect()] - TypeError: _BaseFilter.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GradientBoostingClassifier()] - TypeError: Object of type BinomialDeviance is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GradientBoostingRegressor()] - TypeError: Object of type LeastSquaresError is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GraphicalLasso()] - TypeError: GraphicalLasso.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[GraphicalLassoCV()] - TypeError: GraphicalLassoCV.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[HashingVectorizer()] - TypeError: HashingVectorizer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[HistGradientBoostingClassifier()] - TypeError: Object of type uint64 is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[HistGradientBoostingRegressor()] - TypeError: Object of type uint64 is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[IncrementalPCA()] - TypeError: IncrementalPCA.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[IsolationForest()] - TypeError: Object of type ExtraTreeRegressor is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Isomap()] - TypeError: Isomap.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[IsotonicRegression()] - ValueError: Isotonic regression input X should be a 1d array or 2d array with 1 feature
FAILED skops/tests/test_persist.py::test_can_persist_fitted[IterativeImputer()] - TypeError: IterativeImputer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KBinsDiscretizer()] - TypeError: KBinsDiscretizer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KNNImputer()] - TypeError: KNNImputer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KNeighborsClassifier()] - TypeError: KNeighborsClassifier.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KNeighborsRegressor()] - TypeError: KNeighborsRegressor.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KNeighborsTransformer()] - TypeError: KNeighborsTransformer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KernelCenterer()] - TypeError: KernelCenterer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KernelDensity()] - TypeError: Object of type KDTree is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KernelPCA()] - TypeError: KernelPCA.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[KernelRidge()] - DeprecationWarning: The 'sym_pos' keyword is deprecated and should be replaced by using 'assume_a = "pos...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LabelBinarizer()] - TypeError: LabelBinarizer.fit() got multiple values for argument 'y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LabelEncoder()] - TypeError: LabelEncoder.fit() got multiple values for argument 'y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LabelPropagation()] - TypeError: LabelPropagation.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LabelSpreading()] - TypeError: BaseLabelPropagation.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Lars()] - TypeError: Lars.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LarsCV()] - TypeError: LarsCV.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LassoLars()] - TypeError: Lars.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LassoLarsCV()] - TypeError: LarsCV.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LassoLarsIC()] - TypeError: LassoLarsIC.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LatentDirichletAllocation()] - TypeError: LatentDirichletAllocation.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LedoitWolf()] - TypeError: LedoitWolf.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LinearDiscriminantAnalysis()] - TypeError: LinearDiscriminantAnalysis.fit() got an unexpected keyword argument 'sample_we...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LocalOutlierFactor()] - TypeError: LocalOutlierFactor.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[LocallyLinearEmbedding()] - TypeError: LocallyLinearEmbedding.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MDS()] - TypeError: MDS.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MLPClassifier()] - TypeError: BaseMultilayerPerceptron.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MLPRegressor()] - TypeError: BaseMultilayerPerceptron.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MaxAbsScaler()] - TypeError: MaxAbsScaler.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MeanShift()] - TypeError: MeanShift.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MinCovDet()] - TypeError: MinCovDet.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MinMaxScaler()] - TypeError: MinMaxScaler.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MiniBatchDictionaryLearning()] - TypeError: MiniBatchDictionaryLearning.fit() got an unexpected keyword argument 'sample_...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MiniBatchNMF()] - TypeError: MiniBatchNMF.fit_transform() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MiniBatchSparsePCA()] - TypeError: MiniBatchSparsePCA.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MissingIndicator()] - TypeError: MissingIndicator.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MultiLabelBinarizer()] - TypeError: MultiLabelBinarizer.fit() got multiple values for argument 'y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MultiOutputClassifier(estimator=LogisticRegression(C=1))] - TypeError: MultiOutputClassifier.fit() missing 1 required pos...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MultiOutputRegressor(estimator=Ridge())] - DeprecationWarning: The 'sym_pos' keyword is deprecated and should be replaced...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MultiTaskElasticNet()] - TypeError: MultiTaskElasticNet.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MultiTaskElasticNetCV()] - TypeError: MultiTaskElasticNetCV.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MultiTaskLasso()] - TypeError: MultiTaskElasticNet.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[MultiTaskLassoCV()] - TypeError: MultiTaskLassoCV.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[NMF()] - TypeError: NMF.fit_transform() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[NearestCentroid()] - TypeError: NearestCentroid.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[NearestNeighbors()] - TypeError: NearestNeighbors.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[NeighborhoodComponentsAnalysis()] - TypeError: NeighborhoodComponentsAnalysis.fit() got an unexpected keyword argument 's...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Normalizer()] - TypeError: Normalizer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Nystroem()] - TypeError: Nystroem.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OAS()] - TypeError: OAS.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OPTICS()] - TypeError: OPTICS.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OneHotEncoder()] - TypeError: OneHotEncoder.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OneVsOneClassifier(estimator=LogisticRegression(C=1))] - TypeError: OneVsOneClassifier.fit() got an unexpected keyword ar...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OneVsRestClassifier(estimator=LogisticRegression(C=1))] - TypeError: OneVsRestClassifier.fit() got an unexpected keyword ...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OrdinalEncoder()] - TypeError: OrdinalEncoder.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OrthogonalMatchingPursuit()] - TypeError: OrthogonalMatchingPursuit.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OrthogonalMatchingPursuitCV()] - TypeError: OrthogonalMatchingPursuitCV.fit() got an unexpected keyword argument 'sample_...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[OutputCodeClassifier(estimator=LogisticRegression(C=1))] - TypeError: OutputCodeClassifier.fit() got an unexpected keywor...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PCA()] - TypeError: PCA.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PLSCanonical()] - TypeError: _PLS.fit() got an unexpected keyword argument 'y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PLSRegression()] - TypeError: PLSRegression.fit() got an unexpected keyword argument 'y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PLSSVD()] - TypeError: PLSSVD.fit() got an unexpected keyword argument 'y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PassiveAggressiveClassifier()] - TypeError: PassiveAggressiveClassifier.fit() got an unexpected keyword argument 'sample_...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PassiveAggressiveRegressor()] - TypeError: PassiveAggressiveRegressor.fit() got an unexpected keyword argument 'sample_we...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PatchExtractor()] - TypeError: PatchExtractor.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Perceptron()] - TypeError: Object of type Hinge is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PoissonRegressor()] - TypeError: Object of type HalfPoissonLoss is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PolynomialCountSketch()] - TypeError: PolynomialCountSketch.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PolynomialFeatures()] - TypeError: PolynomialFeatures.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[PowerTransformer()] - TypeError: PowerTransformer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[QuadraticDiscriminantAnalysis()] - TypeError: QuadraticDiscriminantAnalysis.fit() got an unexpected keyword argument 'sam...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[QuantileRegressor()] - DeprecationWarning: `method='interior-point'` is deprecated and will be removed in SciPy 1.11.0. P...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[QuantileTransformer()] - TypeError: QuantileTransformer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RBFSampler()] - TypeError: RBFSampler.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RFE(estimator=LogisticRegression(C=1))] - TypeError: Object of type int64 is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RFECV(estimator=LogisticRegression(C=1))] - TypeError: RFECV.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RadiusNeighborsClassifier()] - TypeError: RadiusNeighborsClassifier.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RadiusNeighborsRegressor()] - TypeError: RadiusNeighborsRegressor.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RadiusNeighborsTransformer()] - TypeError: RadiusNeighborsTransformer.fit() got an unexpected keyword argument 'sample_we...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RandomForestClassifier()] - TypeError: Object of type DecisionTreeClassifier is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RandomForestRegressor()] - TypeError: Object of type DecisionTreeRegressor is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RandomTreesEmbedding()] - TypeError: Object of type ExtraTreeRegressor is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RegressorChain(base_estimator=Ridge())] - TypeError: RegressorChain.fit() missing 1 required positional argument: 'Y'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[Ridge()] - DeprecationWarning: The 'sym_pos' keyword is deprecated and should be replaced by using 'assume_a = "pos"'. 's...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RidgeClassifier()] - DeprecationWarning: The 'sym_pos' keyword is deprecated and should be replaced by using 'assume_a = ...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[RobustScaler()] - TypeError: RobustScaler.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SGDClassifier()] - TypeError: Object of type Hinge is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SGDOneClassSVM()] - TypeError: Object of type Hinge is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SelectFdr()] - TypeError: _BaseFilter.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SelectFpr()] - TypeError: _BaseFilter.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SelectFromModel(estimator=SGDRegressor(random_state=0))] - TypeError: SelectFromModel.__init__() missing 1 required posit...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SelectFwe()] - TypeError: _BaseFilter.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SelectKBest()] - TypeError: _BaseFilter.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SelectPercentile()] - TypeError: _BaseFilter.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SelfTrainingClassifier(base_estimator=LogisticRegression(C=1))] - TypeError: SelfTrainingClassifier.fit() got an unexpect...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SequentialFeatureSelector(estimator=LogisticRegression(C=1))] - TypeError: SequentialFeatureSelector.fit() got an unexpec...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[ShrunkCovariance()] - TypeError: ShrunkCovariance.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SimpleImputer()] - TypeError: SimpleImputer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SkewedChi2Sampler()] - TypeError: SkewedChi2Sampler.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SparsePCA()] - TypeError: SparsePCA.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SparseRandomProjection()] - TypeError: BaseRandomProjection.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SpectralBiclustering()] - TypeError: BaseSpectral.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SpectralClustering()] - TypeError: SpectralClustering.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SpectralCoclustering()] - TypeError: BaseSpectral.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SpectralEmbedding()] - TypeError: SpectralEmbedding.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[SplineTransformer()] - TypeError: Object of type BSpline is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[StackingClassifier(estimators=[('est1',LogisticRegression(C=0.1)),('est2',LogisticRegression(C=1))])] - TypeError: Object...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[StackingRegressor(estimators=[('est1',Ridge(alpha=0.1)),('est2',Ridge(alpha=1))])] - DeprecationWarning: The 'sym_pos' ke...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[StandardScaler()] - TypeError: Object of type int64 is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[TSNE()] - TypeError: TSNE.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[TfidfTransformer()] - TypeError: TfidfTransformer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[TfidfVectorizer()] - TypeError: TfidfVectorizer.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[TheilSenRegressor()] - TypeError: TheilSenRegressor.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[TruncatedSVD()] - TypeError: TruncatedSVD.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[TweedieRegressor()] - TypeError: Object of type HalfTweedieLossIdentity is not JSON serializable
FAILED skops/tests/test_persist.py::test_can_persist_fitted[VarianceThreshold()] - TypeError: VarianceThreshold.fit() got an unexpected keyword argument 'sample_weight'
FAILED skops/tests/test_persist.py::test_can_persist_fitted[VotingClassifier(estimators=[('est1',LogisticRegression(C=0.1)),('est2',LogisticRegression(C=1))])] - TypeError: Object o...
FAILED skops/tests/test_persist.py::test_can_persist_fitted[VotingRegressor(estimators=[('est1',Ridge(alpha=0.1)),('est2',Ridge(alpha=1))])] - DeprecationWarning: The 'sym_pos' keyw...
=============================================================== 212 failed, 182 passed, 1 xfailed, 18 warnings in 7.25s ================================================================

@adrinjalali
Member Author

What do you think about having a metainfo object in the schema? So basically something like:

{
  "metainfo": {...},
  "obj": <actual obj>
}

We could put stuff like protocol version, sklearn/skops version, etc. into metainfo. Also, we could add a hash/fingerprint of the object there to verify it.

It kinda makes sense, but there isn't a clear distinction between the two. For estimators, it makes total sense, but for a numpy array, is the file name metainfo or the object itself? Or for a numpy function, is there anything in the object then?
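The metainfo idea discussed above could be sketched roughly as follows. This is only an illustration, not the format used by skops: the key names `protocol`, `versions` and `sha256`, the version strings, and the helper functions `wrap_with_metainfo` and `verify` are all assumptions made for this example, and the object state is assumed to be JSON-serializable already.

```python
import hashlib
import json


def _fingerprint(obj_state):
    """Hash a canonical JSON dump of the object state (keys sorted)."""
    payload = json.dumps(obj_state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def wrap_with_metainfo(obj_state, versions):
    """Return a schema dict with a 'metainfo' part and the actual 'obj' part."""
    return {
        "metainfo": {
            "protocol": 0,  # hypothetical file format version
            "versions": versions,  # e.g. library versions used when saving
            "sha256": _fingerprint(obj_state),
        },
        "obj": obj_state,
    }


def verify(schema):
    """Check that the stored fingerprint still matches the stored object."""
    return _fingerprint(schema["obj"]) == schema["metainfo"]["sha256"]


# Placeholder state and version strings, for illustration only.
state = {"__class__": "LogisticRegression", "params": {"C": 1.0}}
schema = wrap_with_metainfo(state, {"scikit-learn": "x.y.z", "skops": "x.y.z"})
print(verify(schema))  # True

schema["obj"]["params"]["C"] = 100.0  # tamper with the stored object
print(verify(schema))  # False
```

A fingerprint like this only detects accidental or naive modification; since it is stored next to the object, anyone who can edit the file can also recompute it.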

@adrinjalali
Member Author

Of course we fail on 3.7 🤦🏼

adrinjalali and others added 3 commits September 7, 2022 14:30
@BenjaminBossan
Collaborator

@adrinjalali Hey, I accidentally pushed directly on your remote instead of using my own branch, sorry for that. Please let me know if my recent changes make sense to you or if I should fix anything. Here is the description:

  • Add more test cases: nested pipeline, FunctionTransformer
  • Make list of estimators that fail more fine grained; instead of
    ignoring a class completely, ignore a specific instance of the class
    because some instances of the same class may fail or not fail
  • Mark estimators that fail to xfail with strict=True, this way we can
    quickly discover if a change made an estimator pass
  • Fix a bug in testing function that did not correctly compare values if
    they were nan (because nan!=nan)
  • Check predict_log_proba method

I also wanted to add inverse_transform as a method to check but that caused some problems, because some of the defined estimators only support inverse_transform when initialized with specific parameters, which they aren't.
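The nan comparison bug mentioned in the list above comes down to `nan != nan`. A minimal sketch of the difference, assuming NumPy 1.19 or later for the `equal_nan` argument; the helper names `naive_equal` and `nan_aware_equal` are made up for this example and are not the functions used in the test suite:

```python
import numpy as np


def naive_equal(a, b):
    """Element-wise == comparison; a NaN never equals a NaN here."""
    return bool(np.all(np.asarray(a) == np.asarray(b)))


def nan_aware_equal(a, b):
    """Treat NaNs in the same positions as equal."""
    return bool(np.array_equal(np.asarray(a), np.asarray(b), equal_nan=True))


attribute = np.array([0.5, np.nan, 2.0])
loaded = attribute.copy()  # stands in for the value after save/load

print(naive_equal(attribute, loaded))      # False
print(nan_aware_equal(attribute, loaded))  # True
```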

]


ESTIMATORS_EXPECTED_TO_FAIL_NON_FITTED = {
Member Author

The idea is to have one list, where we have the estimator, and when we remove the estimator all tests for that estimator should pass. So I'll revert to the previous list.

Collaborator

Hmm, I don't quite understand. What you describe should work with the current approach, no?

One problem of the previous approach is that it does not differentiate between different instantiations of the same estimator. E.g. FunctionTransformer with numpy functions works but FunctionTransformer with scipy functions doesn't. If only the class is checked, we can't differentiate.

Or is the problem that we have two lists, one for non-fitted and one for fitted?
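To make the instance-versus-class point concrete, here is a rough sketch of marking specific instances as strict xfail. It is not the code from this PR: `FakeTransformer`, `EXPECTED_TO_FAIL` and `with_xfail` are hypothetical names, and matching on `repr` is just one possible way to identify an instance.

```python
import pytest


class FakeTransformer:
    """Hypothetical stand-in for an estimator with a parameter-aware repr."""

    def __init__(self, func):
        self.func = func

    def __repr__(self):
        return f"FakeTransformer(func={self.func})"


# Specific instances (not whole classes) that are known to fail.
EXPECTED_TO_FAIL = {"FakeTransformer(func=scipy_func)"}


def with_xfail(instances):
    """Wrap instances in pytest.param, marking known failures as strict xfail."""
    params = []
    for inst in instances:
        marks = ()
        if repr(inst) in EXPECTED_TO_FAIL:
            # strict=True turns an unexpected pass into a test failure,
            # so a fix is noticed right away.
            marks = pytest.mark.xfail(strict=True, reason="known failure")
        params.append(pytest.param(inst, marks=marks, id=repr(inst)))
    return params


PARAMS = with_xfail([FakeTransformer("numpy_func"), FakeTransformer("scipy_func")])


@pytest.mark.parametrize("estimator", PARAMS)
def test_can_persist(estimator):
    # Placeholder body: pretend that only the scipy variant cannot be persisted.
    assert "scipy" not in repr(estimator)
```

With a class-level list, both `FakeTransformer` instances would be skipped or expected to fail together; here only the one whose `repr` is listed carries the mark.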

adrinjalali and others added 11 commits September 7, 2022 15:06
This is useful e.g. for clustering models that don't have a predict
method.

While adding these tests, a few estimators started to fail. Most of them
failed because of the discussed issue of numpy scalars being loaded as
0-dim arrays. The issue was fixed via explicit type casting.

However, 3 estimators are now failing and had to be added back to the
list of failing estimators. I added a comment for each one of them why
they fail and how to potentially address the issue.
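The 0-dim array issue described in the commit message above can be reproduced in isolation. This sketch assumes the scalar is written with `np.save` and read back with `np.load`; the cast shown is one possible form of explicit type casting and not necessarily the one used in the commit.

```python
import io

import numpy as np

original = np.int64(3)

# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
np.save(buffer, original)
buffer.seek(0)
loaded = np.load(buffer)

# The scalar comes back as a 0-dimensional array, not as a numpy scalar.
print(type(loaded).__name__, loaded.shape)  # ndarray ()

# Explicit cast back to the scalar type recorded in the array's dtype.
restored = loaded.dtype.type(loaded.item())
print(type(restored).__name__, restored)  # int64 3
```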
Works locally but not on CI...
@adrinjalali adrinjalali marked this pull request as ready for review September 15, 2022 14:10
On some systems, during conversion, there is a loss of precision that
makes the tests fail. This change loosens the tolerance, making the
tests pass.
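As an illustration of why a looser tolerance helps, the following sketch uses a float32 round trip as a stand-in for the precision loss. The commit message does not say which conversion is responsible, and the `rtol` value is an assumption chosen for this example.

```python
import numpy as np

original = np.array([0.1, 0.2, 0.3], dtype=np.float64)

# Stand-in for a lossy conversion during save/load.
restored = original.astype(np.float32).astype(np.float64)

# Exact comparison fails because of the lost precision.
print(np.array_equal(original, restored))  # False

# A comparison with a relative tolerance passes (raises AssertionError otherwise).
np.testing.assert_allclose(restored, original, rtol=1e-6)
```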
Windows complains that files don't have permissions, even though each
test gets its own file. Maybe this helps.
From my understanding, pytest will (eventually) clean up the temporary
files it creates. Therefore, use the pytest tmp_path fixture instead of
the builtin tempfile module and don't explicitly clean up.

What's strange is that for other tests, this wasn't necessary, not sure
why it is here. It may have to do with how we save and load here or with
the use of zip files. Ideally, someone with access to Windows could test
it.
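A minimal sketch of the `tmp_path` approach described above. The helpers `save_state` and `load_state` are hypothetical stand-ins for the real save and load functions; they only write a `schema.json` entry into a zip file, which is a simplification of the actual format.

```python
import json
import zipfile


def save_state(path, state):
    """Write the state as schema.json inside a zip archive."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("schema.json", json.dumps(state))


def load_state(path):
    """Read schema.json back from the zip archive."""
    with zipfile.ZipFile(path, "r") as zf:
        return json.loads(zf.read("schema.json"))


def test_roundtrip(tmp_path):
    # tmp_path is a pathlib.Path to a fresh directory provided by pytest;
    # pytest manages its cleanup, so the test does not delete anything.
    f_name = tmp_path / "model.skops"
    state = {"params": {"C": 1.0}}
    save_state(f_name, state)
    assert load_state(f_name) == state
```

Both helpers use `with` blocks so the zip file handle is closed before the test ends, which matters on Windows, where open files cannot be removed.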
Not pretty but let's see if it works.
@BenjaminBossan BenjaminBossan merged commit 4fe963a into skops-dev:main Sep 16, 2022
@adrinjalali adrinjalali deleted the persist branch September 16, 2022 14:51
@LysandreJik

Great job getting this merged! 🤗

@osanseviero
Contributor

Very exciting 🚀 looking forward to trying it out

4 participants