`ENH` trusting scipy.ufuncs #295

omar-araboghli · 2023-02-06T12:02:53Z

(Partially) closes #224

Q1: Is it sufficient to only trust ufuncs under scipy.special? As of the current scipy version, public ufuncs are only exposed in scipy.special. For future scipy (and numpy) versions, however, we could either recursively traverse scipy looking for public ufuncs or limit our search to the first level of submodules, e.g.:

# pseudo-code: recursively check all submodules. Warning: this needs proper handling to avoid a RecursionError
for submodule in get_submodules(scipy, max_level=-1):
    look_for_type(submodule, np.ufunc)

# pseudo-code: first submodule level only
for submodule in get_submodules(scipy, max_level=1):
    look_for_type(submodule, np.ufunc)
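A runnable version of both options might look like the sketch below. It relies only on numpy's public `np.ufunc` type plus the standard library's `pkgutil` and `importlib`; `find_public_ufuncs` is a hypothetical helper for illustration, not part of skops. Using an explicit stack instead of recursion sidesteps the RecursionError concern for the unlimited-depth case.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Optional

import numpy as np


def find_public_ufuncs(package: ModuleType, max_depth: Optional[int] = 1) -> list[str]:
    """Return sorted "module.name" strings for public np.ufunc objects.

    max_depth=0 scans only `package`, max_depth=1 also scans its direct
    submodules, and max_depth=None walks every public submodule. An explicit
    stack replaces recursion, so deep packages cannot hit RecursionError.
    """
    found: set[str] = set()
    seen: set[str] = set()
    stack = [(package, 0)]
    while stack:
        module, depth = stack.pop()
        if module.__name__ in seen:
            continue
        seen.add(module.__name__)

        # Collect public attributes that are ufunc instances.
        for attr in dir(module):
            if attr.startswith("_"):
                continue
            try:
                obj = getattr(module, attr)
            except Exception:  # lazy or deprecated attributes may raise on access
                continue
            if isinstance(obj, np.ufunc):
                found.add(f"{module.__name__}.{attr}")

        # Stop at the depth limit; plain modules (no __path__) have no submodules.
        if max_depth is not None and depth >= max_depth:
            continue
        if not hasattr(module, "__path__"):
            continue
        for info in pkgutil.iter_modules(module.__path__):
            if info.name.startswith("_"):
                continue  # skip private submodules
            try:
                sub = importlib.import_module(f"{module.__name__}.{info.name}")
            except Exception:  # optional dependencies may be missing
                continue
            stack.append((sub, depth + 1))
    return sorted(found)
```

For example, `find_public_ufuncs(scipy, max_depth=1)` should list names such as `scipy.special.erf`. Note that a ufunc re-exported from several modules is reported once per module.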

Q2: Does test_can_trust_ufuncs() make sense in addition to extending test_can_persist_fitted()?
Q3: Why wouldn't we include the current public numpy ufuncs the same way we do for scipy, even though numpy 2.0 is not out yet?


omar-araboghli · 2023-02-11T16:18:53Z

@BenjaminBossan kindly pinging you 👆🏻

BenjaminBossan

Thanks a lot for your addition. Overall, this already looks quite good, but I have a few comments, so please take a look. As to your questions:

Q1: Is it sufficient to only trust ufuncs under scipy.special?

At the very least, this is a good start. If we miss something, we can add it later. Did you find any ufuncs that are not included with this approach?

More generally, I think we could ask whether we only want to trust ufuncs, or whether we should also trust other scipy functions by default, like some of the scipy.stats functions. But let's leave that for later.

Q2: Does test_can_trust_ufuncs() make sense in addition to extending test_can_persist_fitted()?

I can see how it might come across as a bit redundant, but I still think it's better to have it than not to have it.

If you look at the other tests, they almost always test some kind of sklearn estimator that contains the object in question. So here, we could use a FunctionTransformer to wrap the ufunc (similar to how test_can_persist_fitted does it). However, it's probably not possible to fit that transformer generically, so the FunctionTransformer would need to stay unfitted, which makes this extra step add little value.

Q3: Why wouldn't we include the current public numpy ufuncs the same way we do for scipy, even though numpy 2.0 is not out yet?

We can certainly explore this and check whether it's feasible and whether we're missing a lot. I would recommend doing that in a separate PR, though.

BenjaminBossan · 2023-02-13T10:24:34Z

skops/io/tests/test_persist.py

+def test_can_trust_ufuncs(ufunc):
+    dumped = dumps(ufunc)
+    untrusted_types = get_untrusted_types(data=dumped)
+    assert not any(type_ in SCIPY_UFUNC_TYPE_NAMES for type_ in untrusted_types)


Can we not be sure here that untrusted_types is always the empty list?

That's true. I changed it accordingly.
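The difference between the two assertions can be shown without skops. In this hypothetical sketch, `untrusted_types` stands in for whatever `get_untrusted_types(data=dumped)` returns, and the two-element `SCIPY_UFUNC_TYPE_NAMES` is an illustrative stand-in for the real constant. The original check only fails when a scipy ufunc itself is reported, whereas comparing against the empty list also catches any unrelated type that slips through.

```python
# Hypothetical stand-ins: the real constant lists all trusted scipy ufuncs,
# and untrusted_types would come from get_untrusted_types().
SCIPY_UFUNC_TYPE_NAMES = ["scipy.special.erf", "scipy.special.expit"]


def original_assertion_passes(untrusted_types):
    # Fails only if one of the scipy ufuncs is reported as untrusted.
    return not any(type_ in SCIPY_UFUNC_TYPE_NAMES for type_ in untrusted_types)


def stricter_assertion_passes(untrusted_types):
    # Fails if anything at all is reported as untrusted.
    return untrusted_types == []


# An unrelated untrusted type passes the original check but not the stricter one.
leaked = ["builtins.eval"]
print(original_assertion_passes(leaked), stricter_assertion_passes(leaked))  # True False
```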

BenjaminBossan · 2023-02-13T10:24:56Z

docs/changes.rst

@@ -1,4 +1,4 @@
-.. include:: _authors.rst
+ds.. include:: _authors.rst


Is this addition on purpose?

Of course not 😮 I couldn't even spot it in the GitHub changes view... Thanks!

BenjaminBossan · 2023-02-13T10:27:39Z

skops/io/_general.py

@@ -212,7 +216,7 @@ def _get_function_name(self) -> str:
        )

    def get_unsafe_set(self) -> set[str]:
-        if self.trusted is True:
+        if self.trusted is True or self._get_function_name() in self.trusted:


Just my personal preference, but operator precedence can sometimes be tricky.

Suggested change

-        if self.trusted is True or self._get_function_name() in self.trusted:
+        if (self.trusted is True) or (self._get_function_name() in self.trusted):
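Since `is` and `in` are comparison operators and both bind more tightly than `or`, the two versions behave identically; the parentheses only make the grouping explicit. The sketch below models just this check with a hypothetical `FunctionNodeSketch` class (not skops' real `FunctionNode`), assuming `trusted` is either `True` or a list of fully qualified names.

```python
from typing import List, Set, Union


class FunctionNodeSketch:
    """Hypothetical stand-in for a node that persists a function by name."""

    def __init__(self, module_name: str, func_name: str,
                 trusted: Union[bool, List[str]] = False) -> None:
        self.module_name = module_name
        self.func_name = func_name
        # Normalize: True means "trust everything"; otherwise keep a list of names.
        self.trusted = True if trusted is True else list(trusted or [])

    def _get_function_name(self) -> str:
        return f"{self.module_name}.{self.func_name}"

    def get_unsafe_set(self) -> Set[str]:
        # `is` and `in` already bind tighter than `or`; the parentheses make it visible.
        if (self.trusted is True) or (self._get_function_name() in self.trusted):
            return set()
        return {self._get_function_name()}
```

With `trusted=["scipy.special.erf"]`, a node for `scipy.special.erf` reports an empty unsafe set, while the same node with the default `trusted=False` reports `{"scipy.special.erf"}`.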

BenjaminBossan · 2023-02-13T10:32:43Z

skops/io/tests/test_persist.py

+    for ufunc_name in SCIPY_UFUNC_TYPE_NAMES:
+        parts = ufunc_name.split(".")
+        module_name = ".".join(parts[:-1])
+        ufunc_name = parts[-1]
+
+        yield gettype(module_name=module_name, cls_or_func=ufunc_name)


Suggested change

-    for ufunc_name in SCIPY_UFUNC_TYPE_NAMES:
-        parts = ufunc_name.split(".")
-        module_name = ".".join(parts[:-1])
-        ufunc_name = parts[-1]
-
-        yield gettype(module_name=module_name, cls_or_func=ufunc_name)
+    for full_name in SCIPY_UFUNC_TYPE_NAMES:
+        module_name, _, ufunc_name = full_name.rpartition(".")
+        yield gettype(module_name=module_name, cls_or_func=ufunc_name)

Just a suggestion to make the code more concise ;)
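Both versions split a dotted name at its last dot, so the change preserves behavior, including for names without any dot. The sketch below checks that and pairs `rpartition` with `importlib` in a hypothetical `resolve` function, which stands in for skops' `gettype` here and is not its actual implementation.

```python
import importlib


def split_with_join(full_name):
    # Original version: split on every dot, then re-join all but the last part.
    parts = full_name.split(".")
    return ".".join(parts[:-1]), parts[-1]


def split_with_rpartition(full_name):
    # Suggested version: a single split at the last dot (the separator is discarded).
    module_name, _, attr_name = full_name.rpartition(".")
    return module_name, attr_name


def resolve(full_name):
    # Hypothetical stand-in for gettype(): import the module, then look up the attribute.
    module_name, attr_name = split_with_rpartition(full_name)
    return getattr(importlib.import_module(module_name), attr_name)


# Both splitting strategies agree, including for a name without any dot.
for name in ["scipy.special.erf", "os.path.join", "erf"]:
    assert split_with_join(name) == split_with_rpartition(name)
```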

omar-araboghli · 2023-02-13T17:06:51Z

Thanks for the review @BenjaminBossan and your fair answers! Just committed your suggestions/catches.

To your question:

Did you find any ufuncs that are not included with this approach?

Nope. I think we are covering everything public under scipy.special as we wanted!

BenjaminBossan

Thanks, this looks almost good to go. Just a small nit left.

docs/changes.rst


BenjaminBossan

Great work, thanks.

omar-araboghli added 4 commits February 6, 2023 12:18

skops-dev#224: getting {numpy,scipy}.ufuncs with different versions

895f84a

skops-dev#224: trusting scipy.special ufuncs; updating FunctionNode and embedding the test within estimators_fitted.

3abf300

Merge branch 'main' into numpy-scipy-unfuncs-to-trusted-list

937bba1

skops-dev#224: testing persisting all scipy_ufuncs; changelog updated

2423b8f

omar-araboghli changed the title ~~ENH trusting {numpy,scipy}.ufuncs~~ ENH trusting scipy.ufuncs Feb 11, 2023

omar-araboghli marked this pull request as ready for review February 11, 2023 16:16

BenjaminBossan requested changes Feb 13, 2023


skops-dev#224: applying code-review remarks.

853d67d

omar-araboghli requested a review from BenjaminBossan February 13, 2023 17:07

BenjaminBossan reviewed Feb 14, 2023


docs/changes.rst (outdated, resolved)

Update docs/changes.rst

ad89ae3

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

omar-araboghli requested a review from BenjaminBossan February 14, 2023 11:19

BenjaminBossan approved these changes Feb 14, 2023


BenjaminBossan merged commit 3e1f138 into skops-dev:main Feb 14, 2023
