UntrustedTypesFoundException raised for standard library usage in lightgbm's Booster #432

Closed
thesnapdragon opened this issue Jul 8, 2024 · 6 comments · Fixed by #433

Comments

@thesnapdragon

thesnapdragon commented Jul 8, 2024

Hello there,

after updating skops to 0.10.0 we got the following error:

...
  File "/app/sales_prediction/common/result_storage.py", line 74, in get_train_result
    return TrainResult(booster=skops.io.loads(model_data, trusted=["lightgbm.basic.Booster"]))
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/skops/io/_persist.py", line 191, in loads
    audit_tree(tree)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/skops/io/_audit.py", line 59, in audit_tree
    raise UntrustedTypesFoundException(unsafe)
skops.io.exceptions.UntrustedTypesFoundException: Untrusted types found in the file: ['collections.defaultdict'].

In the serialised object we only have a single lightgbm.basic.Booster object.

  1. Should the standard library be untrusted?
  2. In this case the defaultdict is an internal detail of lightgbm's Booster. When the Booster is marked as trusted, should the types it uses internally still be untrusted?
@adrinjalali
Member

It sounds reasonable for collections.defaultdict to be trusted by default. Could you please provide a minimal reproducer for me to test and fix?

@thesnapdragon
Author

thesnapdragon commented Jul 8, 2024

Thank you for the quick reply, here is a minimal example with lightgbm:

>>> import pandas as pd
>>> import lightgbm
>>> import skops.io

>>> booster = lightgbm.train(train_set=lightgbm.Dataset(pd.DataFrame({"results": [1]})), params={})
>>> serialised_data = skops.io.dumps(booster)
>>> skops.io.loads(serialised_data, trusted=["lightgbm.basic.Booster"])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/unicsovicsm/.pyenv/versions/sales-prediction-services/lib/python3.10/site-packages/skops/io/_persist.py", line 191, in loads
    audit_tree(tree)
  File "/Users/unicsovicsm/.pyenv/versions/sales-prediction-services/lib/python3.10/site-packages/skops/io/_audit.py", line 59, in audit_tree
    raise UntrustedTypesFoundException(unsafe)
skops.io.exceptions.UntrustedTypesFoundException: Untrusted types found in the file: ['collections.defaultdict'].

Just to check my understanding: are trusted objects still checked recursively for untrusted data?

@adrinjalali
Member

Thanks for the reproducer.

Just to check my understanding: are trusted objects still checked recursively for untrusted data?

Yes, trusting a "parent" type doesn't mean children included in that type are all trusted.
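
To illustrate the recursive check described above, here is a toy sketch in plain Python. It is not how skops implements its audit: `find_untrusted`, `DEFAULT_TRUSTED` and the stand-in `Booster` class are hypothetical names made up for this example, and the set of always-accepted primitives is an assumption. It only shows why trusting a parent type still leaves a nested `collections.defaultdict` flagged.

```python
from collections import defaultdict


def type_name(obj):
    """Return 'module.QualName' for the type of obj."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


# Types this toy auditor always accepts (an assumption of the sketch).
DEFAULT_TRUSTED = {
    "builtins.int",
    "builtins.float",
    "builtins.str",
    "builtins.dict",
    "builtins.list",
    "builtins.tuple",
    "builtins.set",
}


def find_untrusted(obj, trusted):
    """Walk obj and everything it contains; return sorted untrusted type names."""
    allowed = DEFAULT_TRUSTED | set(trusted)
    unsafe = set()
    seen = set()
    stack = [obj]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if type_name(node) not in allowed:
            unsafe.add(type_name(node))
        # Children are visited even when the parent itself is trusted.
        if isinstance(node, dict):
            stack.extend(node.keys())
            stack.extend(node.values())
        elif isinstance(node, (list, tuple, set, frozenset)):
            stack.extend(node)
        elif hasattr(node, "__dict__"):
            stack.extend(vars(node).values())
    return sorted(unsafe)


class Booster:
    """Hypothetical stand-in for an estimator that uses a defaultdict internally."""

    def __init__(self):
        self.params = {"num_leaves": 31}
        self.history = defaultdict(list)


booster = Booster()
parent_only = [type_name(booster)]

# Trusting only the parent still reports the nested defaultdict.
print(find_untrusted(booster, parent_only))
# Trusting the nested type as well leaves nothing to report.
print(find_untrusted(booster, parent_only + ["collections.defaultdict"]))
```

In this sketch the child types have to be listed explicitly, which mirrors the behaviour reported in this issue: passing only "lightgbm.basic.Booster" was not enough.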

@juhoinkinen

I encountered a bug(?) regarding defaultdict: the default value (the default_factory) is not being saved (or loaded).

import collections
import skops.io as sio


class MyClass:
    def __init__(self):
        self.default_dict = collections.defaultdict(set)
        self.default_dict['key-0'] = set(['foo'])


my_obj = MyClass()
print('my_obj: ', my_obj.default_dict)
sio.dump(my_obj, 'dump.skops')

my_obj_loaded = sio.load('dump.skops', trusted=['__main__.MyClass', 'collections.defaultdict'])

print('my_obj_loaded: ', my_obj_loaded.default_dict)  # There is no default value
my_obj_loaded.default_dict['key-1']  # So this line raises "KeyError: 'key-1'"

The output of this code is

my_obj:  defaultdict(<class 'set'>, {'key-0': {'foo'}})
my_obj_loaded:  defaultdict(None, {'key-0': {'foo'}})
Traceback (most recent call last):
  File "/home/local/jmminkin/tmp/poc.py", line 18, in <module>
    my_obj_loaded.default_dict['key-1']  # So this line raises "KeyError: 'key-1'"
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'key-1'

This also occurs when using list, tuple, int or a lambda as the default value, not only when using set.

I used skops.io 0.10.0 on Python 3.12.4.
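
A possible stopgap for the behaviour reported above, sketched with the standard library only: since the items survive the round trip but the factory comes back as None, the factory can be reassigned after loading. The `loaded` object below is a hand-built stand-in for the loaded dictionary shown in the output, and `restore_default_factory` is a hypothetical helper name, not part of skops.

```python
from collections import defaultdict


def restore_default_factory(mapping, factory):
    """Reattach a default factory to a defaultdict that lost it.

    default_factory is a writable attribute, so this mutates the mapping
    in place and keeps any existing references to it valid.
    """
    mapping.default_factory = factory
    return mapping


# Stand-in for the loaded object from the report: items kept, factory missing.
loaded = defaultdict(None, {"key-0": {"foo"}})

restored = restore_default_factory(loaded, set)

# Missing keys now get a fresh set instead of raising KeyError.
restored["key-1"].add("bar")
print(restored)
```

This only helps when the caller knows which factory the original object used; it does not recover the factory from the file.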

@adrinjalali
Member

One thing: I checked, and this is not really a regression. With the latest lightgbm, skops 0.9 also fails with the same error. But I'll see what we can do here.

@thesnapdragon
Author

thesnapdragon commented Jul 11, 2024

Yes, it seems the new version did not introduce the issue. With version 0.9.0 it could still work if you used trusted as a boolean flag:

skops.io.loads(serialised_data, trusted=True)

But this is no longer possible as of 0.10.0.

Anyway thank you for the quick fix!
