Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH trusting scipy.ufuncs #295

Conversation

omar-araboghli
Copy link
Contributor

@omar-araboghli omar-araboghli commented Feb 6, 2023

(Partially) closes #224

  • Q1: Is it sufficient to only trust ufuncs under scipy.special ? As for the current scipy version, public ufuncs are only visible in scipy.special. Nevertheless, for future scipy versions (and also numpy), we can either recursively traverse scipy looking for public ufuncs or limit our search to the first submodules level. E.g.:
# pseudo-code: recursively check all submodules. Warning: this needs proper handling to avoid RecursionError
for submodule in get_submodules(scipy, max_level=-1):
    look_for_type(submodule, np.func)

# pseudo-code: first submodule level
for submodule in get_submodules(scipy, max_level=1):
    look_for_type(submodule, np.func)
  • Q2: Does test_can_trust_ufuncs() make sense in addition to extending test_can_persist_fitted() ?
  • Q3: Why wouldn't we include current public numpy ufuncs the same way we do for scipy, even when numpy 2.0 not there yet ?

@omar-araboghli omar-araboghli changed the title ENH trusting {numpy,scipy}.ufuncs ENH trusting scipy.ufuncs Feb 11, 2023
@omar-araboghli omar-araboghli marked this pull request as ready for review February 11, 2023 16:16
@omar-araboghli
Copy link
Contributor Author

@BenjaminBossan kindly pinging you 👆🏻

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your addition. Overall, this already looks quite good, but I have a few comments, so please take a look. As to your questions:

Q1: Is it sufficient to only trust ufuncs under scipy.special ?

At the very least, this is a good start. If we miss something, we can add it later. Did you find any ufuncs that are not included with this approach?

I think in general, we could ask if we only want to add ufuncs or if we might not trust other scipy functions by default, like some of the scipy.stats functions. But let's leave that for later.

Q2: Does test_can_trust_ufuncs() make sense in addition to extending test_can_persist_fitted() ?

I can see how it might come across as a bit of redundant, but I still think it's better to have if than not to have it.

If you look at the other tests, they almost always test some kind of sklearn estimator that contains the object in question. So here, we could use a FunctionTransformer to wrap the ufunc (similar to how test_can_persist_fitted does it). However, it's probably not possible to fit that transformer generically, so the FunctionTransformer would need to stay unfitted, which makes this extra step add little value.

Q3: Why wouldn't we include current public numpy ufuncs the same way we do for scipy, even when numpy 2.0 not there yet?

We can certainly explore this and check if it's feasible and if we're missing a lot. I would recommend to do that in a separate PR though.

def test_can_trust_ufuncs(ufunc):
dumped = dumps(ufunc)
untrusted_types = get_untrusted_types(data=dumped)
assert not any(type_ in SCIPY_UFUNC_TYPE_NAMES for type_ in untrusted_types)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not be sure here that untrusted_types is always the empty list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I changed it accordingly.

docs/changes.rst Outdated
@@ -1,4 +1,4 @@
.. include:: _authors.rst
ds.. include:: _authors.rst
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this addition on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course not 😮 I even couldn't catch it on the github changes view... Thanks!

@@ -212,7 +216,7 @@ def _get_function_name(self) -> str:
)

def get_unsafe_set(self) -> set[str]:
if self.trusted is True:
if self.trusted is True or self._get_function_name() in self.trusted:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my personal preference, but operator precedence can sometimes be tricky.

Suggested change
if self.trusted is True or self._get_function_name() in self.trusted:
if (self.trusted is True) or (self._get_function_name() in self.trusted):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adapted!

Comment on lines 225 to 230
for ufunc_name in SCIPY_UFUNC_TYPE_NAMES:
parts = ufunc_name.split(".")
module_name = ".".join(parts[:-1])
ufunc_name = parts[-1]

yield gettype(module_name=module_name, cls_or_func=ufunc_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for ufunc_name in SCIPY_UFUNC_TYPE_NAMES:
parts = ufunc_name.split(".")
module_name = ".".join(parts[:-1])
ufunc_name = parts[-1]
yield gettype(module_name=module_name, cls_or_func=ufunc_name)
for full_name in SCIPY_UFUNC_TYPE_NAMES:
module_name, _, ufunc_name = full_name.rpartition(".")
yield gettype(module_name=module_name, cls_or_func=ufunc_name)

Just a suggestion to make the code more concise ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adapted!

@omar-araboghli
Copy link
Contributor Author

Thanks for the review @BenjaminBossan and your fair answers! Just committed your suggestions/catches.

To your question:

Did you find any ufuncs that are not included with this approach?

Nope. I think we are covering everything public under scipy.special as we wanted!

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks almost good to go. Just a small nit left.

docs/changes.rst Outdated Show resolved Hide resolved
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks.

@BenjaminBossan BenjaminBossan merged commit 3e1f138 into skops-dev:main Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add {numpy, scipy}.ufuncs to the trusted list
2 participants