FEA add secure persistence #128

Merged on Sep 16, 2022 (97 commits)

Commits
7932410
FEA add secure pesrsistence
adrinjalali Sep 6, 2022
d33885e
Merge remote-tracking branch 'upstream/main' into persist
adrinjalali Sep 6, 2022
947406c
fix init
adrinjalali Sep 6, 2022
ddf9e80
add common tests
adrinjalali Sep 7, 2022
7f08f9a
Additions to persistence PR (#1)
BenjaminBossan Sep 7, 2022
8ce7a7d
Merge remote-tracking branch 'upstream/main' into persist
adrinjalali Sep 7, 2022
6b0c727
skip tests for failing estimators
adrinjalali Sep 7, 2022
9109952
fixing tests?
adrinjalali Sep 7, 2022
4acf33a
Changes to persistence tests
BenjaminBossan Sep 7, 2022
6640dec
Merge branch 'persist' of github.com:adrinjalali/skops into persist-t…
BenjaminBossan Sep 7, 2022
d06ec51
Fix accidental revert of np.allclose
BenjaminBossan Sep 7, 2022
ccd35fd
commit to pull Ben's changes
adrinjalali Sep 7, 2022
5464e04
no idea what I'm doing
adrinjalali Sep 7, 2022
ad1982a
remove sample weight
adrinjalali Sep 7, 2022
8b7a612
feature union and pipeline are now tested
adrinjalali Sep 7, 2022
0569970
skip instead of xfail
adrinjalali Sep 7, 2022
95cf7bd
remove pipeline and featureunion specific functions
adrinjalali Sep 7, 2022
50ff9cd
regenerate the ignore list, and ignore less
adrinjalali Sep 7, 2022
3a9f3d4
fix callibratedclassifercv
adrinjalali Sep 7, 2022
876c545
Extend tests to cover learned attributes
BenjaminBossan Sep 7, 2022
5d1b43d
Add SimpleImputer to ignore list
BenjaminBossan Sep 8, 2022
a976450
Indent 2 in schema json for readability
BenjaminBossan Sep 8, 2022
c7f55c7
Smarter way to test pipelines and feature unions
BenjaminBossan Sep 8, 2022
4f0d439
major refactor
adrinjalali Sep 8, 2022
26e51e0
major refactor
adrinjalali Sep 8, 2022
c42f116
use a larger sample date
adrinjalali Sep 8, 2022
1777996
fast_fail=False in CI
adrinjalali Sep 8, 2022
d5a9a84
ignore warnings
adrinjalali Sep 8, 2022
fc62a47
Refactor to use singledispatch (#2)
BenjaminBossan Sep 9, 2022
9305cd7
introduce reduce in how we serialize things
adrinjalali Sep 9, 2022
4b73e41
Dynamically register dispatch functions (#3)
BenjaminBossan Sep 12, 2022
afa12d1
refactor try-except
adrinjalali Sep 12, 2022
56eee74
Merge branch 'persist' of github.com:adrinjalali/skops into persist
adrinjalali Sep 12, 2022
9d34ac2
A couple of fixes to make more tests pass (#4)
BenjaminBossan Sep 12, 2022
aa2c81d
make reduce constructor explicit
adrinjalali Sep 12, 2022
ddac492
Merge branch 'persist' of github.com:adrinjalali/skops into persist
adrinjalali Sep 12, 2022
470bb21
__dir__ -> __dict__
adrinjalali Sep 12, 2022
7e6a4bd
add more sklearn types
adrinjalali Sep 12, 2022
0fb061a
Fix bug with sparse matrices being saved on root
BenjaminBossan Sep 12, 2022
4a27e78
make the basic one very generic and apply to object
adrinjalali Sep 13, 2022
0b0f997
Merge branch 'persist' of github.com:adrinjalali/skops into persist
adrinjalali Sep 13, 2022
9847721
don't pass y as kwarg, some use Y
adrinjalali Sep 13, 2022
0537216
pass valid data to fit
adrinjalali Sep 13, 2022
f616093
Minor fixes: wrong argument, use of get_module
BenjaminBossan Sep 13, 2022
b0cf1c5
Merge branch 'persist' of github.com:adrinjalali/skops into adrinjala…
BenjaminBossan Sep 13, 2022
a6b17b2
Fix another bug in saving/loading sparse matrices
BenjaminBossan Sep 13, 2022
4479b15
Fix regression for FunctionTransformer + np ufunc
BenjaminBossan Sep 13, 2022
542cc2f
Use more robust array comparison in tests
BenjaminBossan Sep 13, 2022
a8b2053
Support random state and CV (#5)
BenjaminBossan Sep 13, 2022
73b75cb
fix ufuncs, including scipy ufuncs
adrinjalali Sep 13, 2022
9d0061d
set n_components and n_best for estimators which have them
adrinjalali Sep 13, 2022
08615c9
Fix for persistince numpy arrays of object dtype (#7)
BenjaminBossan Sep 13, 2022
47ff5c3
sgd losses use reduce
adrinjalali Sep 13, 2022
0276560
fix for pairwise input
adrinjalali Sep 13, 2022
32a373c
fix _DictWithDeprecatedKeys issue
adrinjalali Sep 13, 2022
0b8ee3c
save/load dtype
adrinjalali Sep 13, 2022
0448d64
fix namedtuples
adrinjalali Sep 14, 2022
0003c5a
Small get_module refactor
BenjaminBossan Sep 14, 2022
83b7783
Merge branch 'persist' of github.com:adrinjalali/skops into adrinjala…
BenjaminBossan Sep 14, 2022
9076d36
Add support for partial (#9)
BenjaminBossan Sep 14, 2022
9bae369
fix array shapes for object arrays
adrinjalali Sep 14, 2022
c2634ce
Unit tests check the __dict__ of the estimator (#10)
BenjaminBossan Sep 14, 2022
244e302
fix builtin comparison
adrinjalali Sep 14, 2022
5a3777c
Merge branch 'persist' of github.com:adrinjalali/skops into persist
adrinjalali Sep 14, 2022
5a2cb62
fix PatchExtractor test
adrinjalali Sep 14, 2022
f9ce808
Add more metainfo to persisted data (#6)
BenjaminBossan Sep 14, 2022
539917e
Persist add missing estimator tests (#11)
BenjaminBossan Sep 14, 2022
e014ed0
fix dict save issues
adrinjalali Sep 14, 2022
2a321e6
Merge branch 'persist' of github.com:adrinjalali/skops into persist
adrinjalali Sep 14, 2022
9ea010e
role back dict and only save key types
adrinjalali Sep 14, 2022
23a1bd6
fix maskedarray
adrinjalali Sep 14, 2022
65e5a7e
add slice
adrinjalali Sep 14, 2022
3e181fd
remove empty file
adrinjalali Sep 14, 2022
0ffe648
don't pop keys
adrinjalali Sep 15, 2022
ff109a5
Move generic_get_state/instance to _general.py (#12)
BenjaminBossan Sep 15, 2022
7266673
move maskedarray implementation to have ndarrays together
adrinjalali Sep 15, 2022
45e8e7b
Merge branch 'persist' of github.com:adrinjalali/skops into persist
adrinjalali Sep 15, 2022
1bfffc9
Add support for BallTree, BinaryTree (#8)
BenjaminBossan Sep 15, 2022
18f43f6
Explicitly handle unsupported types (#13)
BenjaminBossan Sep 15, 2022
b39836a
Merge branch 'main' into persist
adrinjalali Sep 15, 2022
454b944
Loosen tolerance for comparing array values
BenjaminBossan Sep 15, 2022
f00a6b7
Merge branch 'persist' of github.com:adrinjalali/skops into adrinjala…
BenjaminBossan Sep 15, 2022
7fda8ae
Debugging CI: add sleep between tests
BenjaminBossan Sep 16, 2022
bc39610
Debug CI: Testing with dummy function
BenjaminBossan Sep 16, 2022
d004a09
Debug CI: Testing with dummy function 2
BenjaminBossan Sep 16, 2022
fd5d8c6
Debug CI: Testing with dummy function 3
BenjaminBossan Sep 16, 2022
8a50012
Debug CI: Testing with dummy function 4
BenjaminBossan Sep 16, 2022
e3d046e
Debug CI: Testing with dummy function 5
BenjaminBossan Sep 16, 2022
70b28cd
Debug CI: Testing with dummy function 6
BenjaminBossan Sep 16, 2022
5730b0d
Debug CI: Testing with dummy function 7
BenjaminBossan Sep 16, 2022
38b5760
Debug CI: Testing with dummy function 8
BenjaminBossan Sep 16, 2022
aca75b0
Debug CI: Testing with dummy function 9
BenjaminBossan Sep 16, 2022
7a1d112
Debug CI: Testing with dummy function 10
BenjaminBossan Sep 16, 2022
27b5424
Roll back test changes, use pytest tmp_path
BenjaminBossan Sep 16, 2022
64859a1
Reduce tolerance for allclose on MacOS
BenjaminBossan Sep 16, 2022
47988aa
Loosen tolerance for non-MacOS platforms
BenjaminBossan Sep 16, 2022
8d878ed
Loosen tolerance for non-MacOS platforms even more
BenjaminBossan Sep 16, 2022
2 changes: 1 addition & 1 deletion .github/workflows/build-test.yml
@@ -14,7 +14,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     if: "github.repository == 'skops-dev/skops'"
     strategy:
-      fail-fast: true
+      fail-fast: false  # need to see which ones fail
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
         python: ["3.7", "3.8", "3.9", "3.10.6"]
3 changes: 3 additions & 0 deletions skops/io/__init__.py
@@ -0,0 +1,3 @@
from ._persist import load, save

__all__ = ["load", "save"]
258 changes: 258 additions & 0 deletions skops/io/_general.py
@@ -0,0 +1,258 @@
import json
from functools import partial
from types import FunctionType

import numpy as np

from ._utils import _get_instance, _get_state, _import_obj, get_module, gettype
from .exceptions import UnsupportedTypeException


def dict_get_state(obj, dst):
    res = {
        "__class__": obj.__class__.__name__,
        "__module__": get_module(type(obj)),
    }

    key_types = _get_state([type(key) for key in obj.keys()], dst)
    content = {}
    for key, value in obj.items():
        if isinstance(value, property):
            continue
        if np.isscalar(key) and hasattr(key, "item"):
            # convert numpy value to python object
            key = key.item()
        content[key] = _get_state(value, dst)
    res["content"] = content
    res["key_types"] = key_types
    return res


def dict_get_instance(state, src):
    content = gettype(state)()
    key_types = _get_instance(state["key_types"], src)
    for k_type, item in zip(key_types, state["content"].items()):
        content[k_type(item[0])] = _get_instance(item[1], src)
    return content


def list_get_state(obj, dst):
    res = {
        "__class__": obj.__class__.__name__,
        "__module__": get_module(type(obj)),
    }
    content = []
    for value in obj:
        content.append(_get_state(value, dst))
    res["content"] = content
    return res


def list_get_instance(state, src):
    content = gettype(state)()
    for value in state["content"]:
        content.append(_get_instance(value, src))
    return content


def tuple_get_state(obj, dst):
    res = {
        "__class__": obj.__class__.__name__,
        "__module__": get_module(type(obj)),
    }
    content = ()
    for value in obj:
        content += (_get_state(value, dst),)
    res["content"] = content
    return res


def tuple_get_instance(state, src):
    # Returns a tuple or a namedtuple instance.
    def isnamedtuple(t):
        # This is needed since namedtuples need to have the args when
        # initialized.
        b = t.__bases__
        if len(b) != 1 or b[0] != tuple:
            return False
        f = getattr(t, "_fields", None)
        if not isinstance(f, tuple):
            return False
        return all(type(n) == str for n in f)

    cls = gettype(state)

    content = tuple()
    for value in state["content"]:
        content += (_get_instance(value, src),)

    if isnamedtuple(cls):
        return cls(*content)
    return content


def function_get_state(obj, dst):
    res = {
        "__class__": obj.__class__.__name__,
        "__module__": get_module(obj),
        "content": {
            "module_path": get_module(obj),
            "function": obj.__name__,
        },
    }
    return res


def function_get_instance(obj, src):
    loaded = _import_obj(obj["content"]["module_path"], obj["content"]["function"])
    return loaded


def partial_get_state(obj, dst):
    _, _, (func, args, kwds, namespace) = obj.__reduce__()
    res = {
        "__class__": "partial",  # don't allow any subclass
        "__module__": get_module(type(obj)),
        "content": {
            "func": _get_state(func, dst),
            "args": _get_state(args, dst),
            "kwds": _get_state(kwds, dst),
            "namespace": _get_state(namespace, dst),
        },
    }
    return res


def partial_get_instance(obj, src):
    content = obj["content"]
    func = _get_instance(content["func"], src)
    args = _get_instance(content["args"], src)
    kwds = _get_instance(content["kwds"], src)
    namespace = _get_instance(content["namespace"], src)
    instance = partial(func, *args, **kwds)  # always use partial, not a subclass
    instance.__setstate__((func, args, kwds, namespace))
    return instance


def type_get_state(obj, dst):
    # To serialize a type, we first need to set the metadata to tell that it's
    # a type, then store the type's info itself in the content field.
    res = {
        "__class__": obj.__class__.__name__,
        "__module__": get_module(type(obj)),
        "content": {
            "__class__": obj.__name__,
            "__module__": get_module(obj),
        },
    }
    return res


def type_get_instance(obj, src):
    loaded = _import_obj(obj["content"]["__module__"], obj["content"]["__class__"])
    return loaded


def slice_get_state(obj, dst):
    res = {
        "__class__": obj.__class__.__name__,
        "__module__": get_module(type(obj)),
        "content": {
            "start": obj.start,
            "stop": obj.stop,
            "step": obj.step,
        },
    }
    return res


def slice_get_instance(obj, src):
    start = obj["content"]["start"]
    stop = obj["content"]["stop"]
    step = obj["content"]["step"]
    return slice(start, stop, step)


def object_get_state(obj, dst):
    # This method is for objects which can either be persisted with json, or
    # the ones for which we can get/set attributes through
    # __getstate__/__setstate__ or reading/writing to __dict__.
    try:
        # if we can simply use json, then we're done.
        return json.dumps(obj)
    except Exception:
        pass

    res = {
        "__class__": obj.__class__.__name__,
        "__module__": get_module(type(obj)),
    }

    # __getstate__ takes priority over __dict__, and if non exist, we only save
    # the type of the object, and loading would mean instantiating the object.
    if hasattr(obj, "__getstate__"):
        attrs = obj.__getstate__()
    elif hasattr(obj, "__dict__"):
        attrs = obj.__dict__
    else:
        return res

    content = _get_state(attrs, dst)
    # it's sufficient to store the "content" because we know that this dict can
    # only have str type keys
    res["content"] = content
    return res


def object_get_instance(state, src):
    try:
        return json.loads(state)
    except Exception:
        pass

    cls = gettype(state)

    # Instead of simply constructing the instance, we use __new__, which
    # bypasses the __init__, and then we set the attributes. This solves
    # the issue of required init arguments.
    instance = cls.__new__(cls)

    content = state.get("content", {})
    if not len(content):
        return instance

    attrs = _get_instance(content, src)
    if hasattr(instance, "__setstate__"):
        instance.__setstate__(attrs)
    else:
        instance.__dict__.update(attrs)

    return instance


def unsupported_get_state(obj, dst):
    raise UnsupportedTypeException(obj)


# tuples of type and function that gets the state of that type
GET_STATE_DISPATCH_FUNCTIONS = [
    (dict, dict_get_state),
    (list, list_get_state),
    (tuple, tuple_get_state),
    (slice, slice_get_state),
    (FunctionType, function_get_state),
    (partial, partial_get_state),
    (type, type_get_state),
    (object, object_get_state),
]
# tuples of type and function that creates the instance of that type
GET_INSTANCE_DISPATCH_FUNCTIONS = [
    (dict, dict_get_instance),
    (list, list_get_instance),
    (tuple, tuple_get_instance),
    (slice, slice_get_instance),
    (FunctionType, function_get_instance),
    (partial, partial_get_instance),
    (type, type_get_instance),
    (object, object_get_instance),
]
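
Note on the dispatch lists: the commit titles mention a refactor to `singledispatch` and dynamic registration of dispatch functions, but the code that consumes `GET_STATE_DISPATCH_FUNCTIONS` lives in `_utils.py`, which is not shown here. The following is only a minimal sketch of how a list of `(type, function)` tuples could be registered with `functools.singledispatch`. The names `get_state` and `DISPATCH_FUNCTIONS` and the simplified handler bodies are assumptions for illustration, not the PR's implementation.

```python
from collections import OrderedDict
from functools import singledispatch


# Simplified stand-ins for the *_get_state functions in the diff (hypothetical bodies).
def dict_get_state(obj, dst):
    return {"__class__": "dict", "content": dict(obj)}


def list_get_state(obj, dst):
    return {"__class__": "list", "content": list(obj)}


def object_get_state(obj, dst):
    # Generic fallback: only record the type name.
    return {"__class__": type(obj).__name__}


# Same shape as GET_STATE_DISPATCH_FUNCTIONS: tuples of (type, function).
DISPATCH_FUNCTIONS = [
    (dict, dict_get_state),
    (list, list_get_state),
    (object, object_get_state),
]


@singledispatch
def get_state(obj, dst):
    # Replaced by the handler registered for `object` below.
    raise TypeError(f"Unsupported type: {type(obj)}")


# Register every handler dynamically instead of writing one decorator per type.
for type_, func in DISPATCH_FUNCTIONS:
    get_state.register(type_, func)


dict_state = get_state({"a": 1}, None)          # handled by dict_get_state
ordered_state = get_state(OrderedDict(), None)  # subclass of dict, same handler
float_state = get_state(3.5, None)              # falls back to object_get_state
```

With `singledispatch`, the handler is chosen by walking the argument type's method resolution order, so a subclass of `dict` reaches the `dict` handler and anything without a more specific match ends up at the `object` entry, regardless of the order of the list.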
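
Note on `object_get_instance`: its comment explains that the instance is created with `__new__` rather than by calling the class, so that required `__init__` arguments are not needed at load time. A minimal, self-contained sketch of that idea follows. The `Scaler` class and the two helper functions are hypothetical, and the class is passed in explicitly here, whereas the PR resolves it from the stored `__module__` and `__class__` entries via `gettype`.

```python
class Scaler:
    """Hypothetical class whose __init__ has a required argument."""

    def __init__(self, factor):
        self.factor = factor


def get_state(obj):
    # Record the type metadata and a copy of the instance attributes.
    return {
        "__class__": type(obj).__name__,
        "__module__": type(obj).__module__,
        "content": dict(vars(obj)),
    }


def get_instance(state, cls):
    # cls.__new__(cls) allocates the object without running __init__,
    # so no constructor arguments are needed.
    instance = cls.__new__(cls)
    # Restore the attributes directly from the saved state.
    instance.__dict__.update(state["content"])
    return instance


original = Scaler(2.0)
restored = get_instance(get_state(original), Scaler)
```

Calling `Scaler()` without arguments would raise a `TypeError`, which is the situation the `__new__` approach sidesteps. The trade-off is that any validation or setup done in `__init__` is skipped when the object is restored.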