-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow pickle when loading numpy array file #836
Conversation
Co-authored-by: tokaessm
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #836 +/- ##
==========================================
- Coverage 81.98% 81.91% -0.07%
==========================================
Files 159 159
Lines 10375 10375
==========================================
- Hits 8506 8499 -7
- Misses 1869 1876 +7
|
Hi @tomglk, thank you for contributing this. Are you able to share why you need to serialise/deserialise |
Hi @ascillitoe , We store strings, that seems to be the problem. This is a minimal example that breaks: data = pd.DataFrame(
columns=["c1", "c2", "c3"],
data=[
["42", 1, datetime.now().timestamp()],
["42", 2, datetime.now().timestamp()],
["42", 3, datetime.now().timestamp()],
]
)
detector = TabularDrift(
x_ref=data.to_numpy(),
p_val=0.05,
x_ref_preprocessed=True
)
save_detector(
detector,
"detector"
)
loaded_detector = load_detector(
"detector"
) Changing "42" to 42 fixes it. It also breaks when you remove the Edit: |
New commit does not set the value to true all the time (because that is unnecessary, if you do not have the problem). Instead, the value can now be controlled via parameter. Intended usage: loaded_detector = load_detector(
"detector",
enable_unsafe_loading=True
) This way users who actually have the problem do not have to hack the np.load()-function themselves. But they are forced to make a conscious and informed decision (CVE you mentioned is linked in docstrings). Tests: |
Merge branch 'master' of github.com:SeldonIO/alibi-detect
Hi @ascillitoe, I added some tests for load_detector(). Also demonstrating that the problem occurs and can be prevented with the new parameter. What do you think? Is there anything else that is missing or could the code be merged like this? |
@tomglk apologies for the late reply. We were initially hesitant about this change as it essentially means implicitly supporting data types in |
Hi, we had had problems loading a detector which we stored using save_detector.
When we want to load it using load_detector, we get the error
value error: 'Object arrays cannot be loaded when allow_pickle=False'
Currently we apply a hack of overwriting
np.load
with the parameterallow_pickle=True
in order to be able to use the loading mechanism. (https://stackoverflow.com/a/56243777)However, we think that it makes sense to set this param in the loading function itself.
If you have any questions, feel free to ask.
Any tips on how to solve this issue with other ways are also more than welcome. :)
Thanks, Tobi & Tom
Co-authored-by: tokaessm