-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH
trusting scipy.ufuncs
#295
ENH
trusting scipy.ufuncs
#295
Conversation
…nd embedding the test within estimators_fitted.
ENH
trusting {numpy,scipy}.ufuncsENH
trusting scipy.ufuncs
@BenjaminBossan kindly pinging you 👆🏻 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your addition. Overall, this already looks quite good, but I have a few comments, so please take a look. As to your questions:
Q1: Is it sufficient to only trust ufuncs under scipy.special ?
At the very least, this is a good start. If we miss something, we can add it later. Did you find any ufuncs that are not included with this approach?
I think in general, we could ask if we only want to add ufuncs or if we might not trust other scipy functions by default, like some of the scipy.stats
functions. But let's leave that for later.
Q2: Does test_can_trust_ufuncs() make sense in addition to extending test_can_persist_fitted() ?
I can see how it might come across as a bit of redundant, but I still think it's better to have if than not to have it.
If you look at the other tests, they almost always test some kind of sklearn estimator that contains the object in question. So here, we could use a FunctionTransformer
to wrap the ufunc (similar to how test_can_persist_fitted
does it). However, it's probably not possible to fit that transformer generically, so the FunctionTransformer
would need to stay unfitted, which makes this extra step add little value.
Q3: Why wouldn't we include current public numpy ufuncs the same way we do for scipy, even when numpy 2.0 not there yet?
We can certainly explore this and check if it's feasible and if we're missing a lot. I would recommend to do that in a separate PR though.
skops/io/tests/test_persist.py
Outdated
def test_can_trust_ufuncs(ufunc): | ||
dumped = dumps(ufunc) | ||
untrusted_types = get_untrusted_types(data=dumped) | ||
assert not any(type_ in SCIPY_UFUNC_TYPE_NAMES for type_ in untrusted_types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not be sure here that untrusted_types
is always the empty list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. I changed it accordingly.
docs/changes.rst
Outdated
@@ -1,4 +1,4 @@ | |||
.. include:: _authors.rst | |||
ds.. include:: _authors.rst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this addition on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course not 😮 I even couldn't catch it on the github changes view... Thanks!
skops/io/_general.py
Outdated
@@ -212,7 +216,7 @@ def _get_function_name(self) -> str: | |||
) | |||
|
|||
def get_unsafe_set(self) -> set[str]: | |||
if self.trusted is True: | |||
if self.trusted is True or self._get_function_name() in self.trusted: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my personal preference, but operator precedence can sometimes be tricky.
if self.trusted is True or self._get_function_name() in self.trusted: | |
if (self.trusted is True) or (self._get_function_name() in self.trusted): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapted!
skops/io/tests/test_persist.py
Outdated
for ufunc_name in SCIPY_UFUNC_TYPE_NAMES: | ||
parts = ufunc_name.split(".") | ||
module_name = ".".join(parts[:-1]) | ||
ufunc_name = parts[-1] | ||
|
||
yield gettype(module_name=module_name, cls_or_func=ufunc_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for ufunc_name in SCIPY_UFUNC_TYPE_NAMES: | |
parts = ufunc_name.split(".") | |
module_name = ".".join(parts[:-1]) | |
ufunc_name = parts[-1] | |
yield gettype(module_name=module_name, cls_or_func=ufunc_name) | |
for full_name in SCIPY_UFUNC_TYPE_NAMES: | |
module_name, _, ufunc_name = full_name.rpartition(".") | |
yield gettype(module_name=module_name, cls_or_func=ufunc_name) |
Just a suggestion to make the code more concise ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapted!
Thanks for the review @BenjaminBossan and your fair answers! Just committed your suggestions/catches. To your question:
Nope. I think we are covering everything public under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks almost good to go. Just a small nit left.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks.
(Partially) closes #224
scipy.special
? As for the currentscipy
version, public ufuncs are only visible inscipy.special
. Nevertheless, for futurescipy
versions (and alsonumpy
), we can either recursively traversescipy
looking for public ufuncs or limit our search to the first submodules level. E.g.:test_can_trust_ufuncs()
make sense in addition to extendingtest_can_persist_fitted()
?numpy
ufuncs the same way we do forscipy
, even whennumpy
2.0 not there yet ?