-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT: Update repo URL of sklearn nightly build, fix sparse matrix persistence issue #375
MNT: Update repo URL of sklearn nightly build, fix sparse matrix persistence issue #375
Conversation
I investigated further and the issue is not directly linked to sklearn, but to scipy. Sparse matrices in the scipy version used by sklearn nightly have an interesting behavior, where two attributes only exists after they are dynamically created by accessing another attribute: >>> from scipy import sparse
>>> x = sparse.csr_matrix((3, 4))
>>> x._has_canonical_format
AttributeError: 'csr_matrix' object has no attribute '_has_canonical_format'
>>> x._has_sorted_indices
AttributeError: 'csr_matrix' object has no attribute '_has_sorted_indices'
>>> x.has_canonical_format # this creates the missing attributes
True
>>> x._has_canonical_format
True
>>> x._has_sorted_indices
True (code) Our code fails because the original estimator will have those two attributes but after a save/load roundtrip, they no longer exist: >>> from skops import io as sio
>>> y = sio.loads(sio.dumps(x), trusted=True)
y._has_canonical_format
AttributeError: 'csr_matrix' object has no attribute '_has_canonical_format' The same error does not happen with >>> import pickle
>>> z = pickle.loads(pickle.dumps(x))
>>> z._has_canonical_format
True Therefore, I would assume that scipy devs won't fix the issue. @skops-dev/maintainers What's your opinion on what to do? We could special case these attributes in our tests. In practice, the fact that they "vanish" after a roundtrip is probably irrelevant, as they are private. Alternatively, we could try to figure out why those attributes vanish when using skops but not when using pickle. |
Hmm, I'm really stumped about this error. First of all, even though pickle does not have the problem, saving and loading with scipy (which skops uses under the hood) has the same issue: >>> from scipy.sparse import load_npz, save_npz
>>> from scipy import sparse
>>> x = sparse.csr_matrix((3, 4))
>>> x.has_canonical_format
True
>>> x._has_canonical_format
True
>>> save_npz('/tmp/sparse_matrix.npz', x)
>>> y = load_npz('/tmp/sparse_matrix.npz')
>>> y._has_canonical_format
AttributeError: 'csr_matrix' object has no attribute '_has_canonical_format' So in a way it's already broken in scipy, but probably it's not considered a problem there, since it's a private attribute. What surprises me, however, is that the issue only occurs now. The change in scipy is already 10 years old, so this is not caused by a new scipy version. So then I thought that there was a change in sklearn recently, but the corresponding attribute was also last touched a long time ago (5 years), therefore, this is also not an explanation. The most likely explanation has thus to be some change in Anyway, I'll push a solution that will ensure that the private attributes are present on the loaded object iff they were present on the original object. This is achieved by remembering if the attribute was there when storing the state. If it was, when loading the object, |
Okay, CI is green, ready for review @skops-dev/maintainers. Until this, or a different fix, is merged, CI will be red for everyone :/ |
This to me is similar to the flags issue we had with numpy, i.e. it doesn't affect the actual values in the matrix, and only gives an information about how it's stored. I think we can safely just ignore this attribute instead of trying to save it. |
So the suggestion would be to add a special case to the tests to explicitly remove those two arguments? |
yeah I'd just remove them. |
Instead of restoring the private attributes after loading a sparse matrix, just don't test for these attributes. They are not restored, but that shouldn't affect users.
e0c098a
to
cb7d100
Compare
@adrinjalali pls take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Its not related to this commit
during pytest i am recieving variable declare errors , modules not installed errors like |
In order to run hugging face related commands, you need to have your token present in The |
See this comment.