MNT add python=3.11 support #246

adrinjalali · 2022-12-09T12:35:54Z

Fixes #245

adrinjalali · 2022-12-09T14:46:56Z

This raises this FutureWarning, which is odd, cause I can't reproduce it locally.

skops/hub_utils/tests/test_hf_hub.py:41: in <module>
    iris = load_iris(as_frame=True, return_X_y=False)
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/datasets/_base.py:636: in load_iris
    data, target, target_names, fdescr = load_csv_data(
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/datasets/_base.py:318: in load_csv_data
    with resources.open_text(data_module, data_file_name) as csv_file:
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/resources/_legacy.py:18: in wrapper
    warnings.warn(
E   DeprecationWarning: open_text is deprecated. Use files() instead. Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice.

It seems legit, but not sure why my local environment doesn't reproduce it.

Any ideas, would you have an idea @jeremiedbb @jjerphan? This fails on our nightly build.

jjerphan · 2022-12-09T15:48:38Z

Hi @adrinjalali, importlib.ressources.open_text is deprecated since python 3.11, hence the error.

Can you report your packages versions using sklearn.show_versions?

adrinjalali · 2022-12-09T15:51:13Z

I know @jjerphan , I have a py311 local env. But I'm not getting the warning locally.

Locally, I have:

System:
    python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]
executable: /home/adrin/miniforge3/envs/py311/bin/python
   machine: Linux-6.0.11-arch1-1-x86_64-with-glibc2.36

Python dependencies:
      sklearn: 1.3.dev0
          pip: 22.3.1
   setuptools: 65.5.1
        numpy: 1.23.5
        scipy: 1.10.0.dev0
       Cython: None
       pandas: 1.5.2
   matplotlib: None
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/adrin/miniforge3/envs/py311/lib/python3.11/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/adrin/miniforge3/envs/py311/lib/libopenblasp-r0.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/adrin/miniforge3/envs/py311/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

And on the CI, we have:

Python 3.11.0
pip 22.3.1 from /opt/hostedtoolcache/Python/3.11.0/x64/lib/python3.11/site-packages/pip (python 3.11)
Package                       Version
----------------------------- ---------
alabaster                     0.7.12
attrs                         22.1.0
Babel                         2.11.0
black                         22.6.0
certifi                       2022.12.7
charset-normalizer            2.1.1
click                         8.1.3
contourpy                     1.0.6
coverage                      6.5.0
cycler                        0.11.0
docutils                      0.17.1
filelock                      3.8.2
flake8                        6.0.0
flaky                         3.7.0
fonttools                     4.38.0
huggingface-hub               0.11.1
idna                          3.4
imagesize                     1.4.1
iniconfig                     1.1.1
isort                         5.10.1
Jinja2                        3.1.2
joblib                        1.2.0
kiwisolver                    1.4.4
MarkupSafe                    2.1.1
matplotlib                    3.6.2
mccabe                        0.7.0
mypy                          0.981
mypy-extensions               0.4.3
numpy                         1.23.5
numpydoc                      1.5.0
packaging                     22.0
pandas                        1.5.2
pathspec                      0.10.2
Pillow                        9.3.0
pip                           22.3.1
platformdirs                  2.6.0
pluggy                        1.0.0
pycodestyle                   2.10.0
pyflakes                      3.0.1
Pygments                      2.13.0
pyparsing                     3.0.9
pytest                        7.2.0
pytest-cov                    4.0.0
python-dateutil               2.8.2
pytz                          2022.6
PyYAML                        6.0
requests                      2.28.1
scikit-learn                  1.3.dev0
scipy                         1.9.3
setuptools                    65.5.0
six                           1.16.0
skops                         0.3.dev0
snowballstemmer               2.2.0
Sphinx                        5.3.0
sphinx-gallery                0.11.1
sphinx-issues                 3.0.1
sphinx-prompt                 1.5.0
sphinx-rtd-theme              1.1.1
sphinxcontrib-applehelp       1.0.2
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        2.0.0
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.5
tabulate                      0.9.0
threadpoolctl                 3.1.0
tqdm                          4.64.1
types-requests                2.28.11.5
types-urllib3                 1.26.25.4
typing_extensions             4.4.0
urllib3                       1.26.13

jjerphan · 2022-12-09T15:56:03Z

I've opened scikit-learn/scikit-learn#25157 accordingly.

jjerphan · 2022-12-09T16:33:18Z

Probably you already have those datasets cached on your disk, hence not running the deprecated functions and thus not having the deprecation warning which itself being promoted as an error.

- Remove Python 3.7 compatibility code (requires skops-dev#246) - Removed empty section in docs - Don't add empty metrics table by default - Remove option to select with a list of str (e.g. card.select(['Foo', 'Bar'])) is no longer possible) - Add option to chain selections using select method (e.g. card.select('Foo').select('Bar'))

BenjaminBossan · 2022-12-13T13:27:09Z

@adrinjalali This warnings filter worked for me locally:

"ignore:\\w+ is deprecated. Use files\\(\\) instead:DeprecationWarning"

The key seems to be this:

message is a string containing a regular expression that the start of the warning message must match

(docs)

This always trips me as well.

Description The proposed model card implementation would allow to dynamically add sections or overwrite them. This is not a complete implementation but already covers most of the features we already have and then some. On top of these features, it would be possible to add more features like creating a default Card with placeholders, just like the exisint template, or the possibility to delete existing sections or to retrieve the result of a certain section. Implementation The underlying data structure consists of a dict and a Section dataclass. All data is stored in a _data attribute with the type dict[str, Section]. The dataclass hold the section contents, i.e. the section title, the section content, and subsections, which again have the same type. It's thus recursive data structure. Section title and dict key are identical, which is mostly for convenience. With this refactor, there are no separate data containers anymore for eval results, template sections, extra sections, etc. They are all treated the same. IMHO, this greatly simplifies the code overall. The only complex function that's left is the one needed to traverse the tree holding the data, and even that is just 14 LOC. Demo To see how the new class can be used, take a look at the main function. The resulting Card can be seen here: https://huggingface.co/skops-ci/hf_hub_example-fcc0d6fe-d072-4f94-8fdb-6bf3bb917bca * [WIP] Further align new model card design Added a test that shows that the new card produces the same output as the old card (except for a few non-deterministic parts). This includes most of the idiosyncrasies of the old card we might want to change in the future (e.g. inconsistent capitalization, use of empty lines). Some of the more problematic behaviors of the old card class were, however, fixed (e.g. creating an empty metrics table when there are no metrics). The other tests have been reworked to use the new card features to make them more precise. Often, that means that instead of having a very weak test like "assert 'foo' in card.render()", it is now possible to select the exact section and check that it equals the expected output. This work is still unfinished, specifically it still lacks tests for the card repr and for the newly added features. * Make tests pass Some refactoring to clean up things, rework repr, make repr tests pass. * Add tests for new functionalities and docstrings * Adjust tests to work with older sklearn versions * Adjust examples to new card, more docs Also some better type annotations. * Continue fixing tests * Try fixing Windows error by specifying encoding * Adjust doctest: confusion matrix not stored in cwd * Increase test coverage * Try fixing test failure on Windows * Replace old by new Card implementation * Address reviewer comments - Remove noise from docstring example - Add the comma after model repr - Add docstrings to private methods * Add TODO notes for when Python 3.7 is dropped * Add Hub model card template, add template arg Users can now choose to use no template, skops template, hub template, or their own template. Using their own template disables a lot of prefilling (say, putting the model plot in the card) because we wouldn't know where to put it. Users will need to call card.add for the otherwise prefilled sections. * Make _add_single return the Section This can be useful, because otherwise it takes a bit of effort to retrieve the latest section. * Allow tables without rows to be added It's ugly, but there is no technical reason from prohibiting the addition of tables without rows. (Note, columns are still required). This allows us to use TableSection for formatting the metrics, instead of calling tabulate there directly. This is better, since we don't have 2 separate ways of creating metrics. * Error when calling add_metric w/ invalid template * Some amendments required after merging Most notably, dynamic model loading in Cards was broken. * A couple more changes: 1. Don't use Hub template for now 2. Document how to add new default templates in code 3. Nicer error message when using invalid template * Add entry to changes.rst * Fix small bug in generated code for loading * Address reviewer comments - Remove Python 3.7 compatibility code (requires #246) - Removed empty section in docs - Don't add empty metrics table by default - Remove option to select with a list of str (e.g. card.select(['Foo', 'Bar'])) is no longer possible) - Add option to chain selections using select method (e.g. card.select('Foo').select('Bar')) * Adjust getting started code in model card * Adjust docstrings * Allow adding default sections with custom template So far, if users had a custom template (including no template at all), they would lose the possibility to add some default sections: - metrics - model plot - hyperparamters - getting started code Now we expose methods to the users to add those sections with a simple call (no need to manually format the content and use card.add). For this to work, users have to indicate the desired section, since we would otherwise not know where to put the content. During the work on this, I also cleaned up the Card tests, which became messier and messier over time. Related tests are now all contained in a class, which makes it a little bit easier to see if for a certain method, all test cases have been covered.

…ops-dev#260)" This reverts commit 1b3c674.

BenjaminBossan · 2023-01-18T12:00:30Z

I tinkered a bit and found a way to make the tests pass with Python 3.11 (starting from main branch). The patch is shown below. The most notable change is with regard to allowing __reduce__ results with 2 elements and how to deal with __getstate__ returning None. I'm not 100% my solution is the best way, pls take a look.

diff --git a/skops/io/_general.py b/skops/io/_general.py
index 10ef9a0..9bc2254 100644
--- a/skops/io/_general.py
+++ b/skops/io/_general.py
@@ -401,10 +401,11 @@ class ObjectNode(Node):
             return instance
 
         attrs = self.children["attrs"].construct()
-        if hasattr(instance, "__setstate__"):
-            instance.__setstate__(attrs)
-        else:
-            instance.__dict__.update(attrs)
+        if attrs is not None:
+            if hasattr(instance, "__setstate__"):
+                instance.__setstate__(attrs)
+            else:
+                instance.__dict__.update(attrs)
 
         return instance
 
diff --git a/skops/io/_sklearn.py b/skops/io/_sklearn.py
index 9da5392..997cc28 100644
--- a/skops/io/_sklearn.py
+++ b/skops/io/_sklearn.py
@@ -71,6 +71,9 @@ def reduce_get_state(obj: Any, save_context: SaveContext) -> dict[str, Any]:
         # reduce includes what's needed for __getstate__ and we don't need to
         # call __getstate__ directly.
         attrs = reduce[2]
+    elif len(reduce) == 2:
+        # should be a tuple of class, args
+        _, attrs = reduce
     elif hasattr(obj, "__getstate__"):
         attrs = obj.__getstate__()
     elif hasattr(obj, "__dict__"):
@@ -78,7 +81,7 @@ def reduce_get_state(obj: Any, save_context: SaveContext) -> dict[str, Any]:
     else:
         attrs = {}
 
-    if not isinstance(attrs, dict):
+    if not isinstance(attrs, (dict, tuple)):
         raise UnsupportedTypeException(
             f"Objects of type {res['__class__']} not supported yet"
         )
@@ -119,8 +122,13 @@ class ReduceNode(Node):
 
         if hasattr(instance, "__setstate__"):
             instance.__setstate__(attrs)
-        else:
+        elif isinstance(attrs, dict):
             instance.__dict__.update(attrs)
+        else:
+            # we (probably) got tuple attrs but cannot setstate with them
+            raise UnsupportedTypeException(
+                f"Objects of type {constructor} are not supported yet"
+            )
 
         return instance
 
diff --git a/skops/io/tests/_utils.py b/skops/io/tests/_utils.py
index ead14b2..bd73670 100644
--- a/skops/io/tests/_utils.py
+++ b/skops/io/tests/_utils.py
@@ -67,10 +67,13 @@ def _assert_tuples_equal(val1, val2):
 
 
 def _assert_vals_equal(val1, val2):
-    if hasattr(val1, "__getstate__"):
+    if type(val1) == type:  # e.g. could be np.int64
+        assert val1 is val2
+    elif hasattr(val1, "__getstate__") and (val1.__getstate__() is not None):
         # This includes BaseEstimator since they implement __getstate__ and
         # that returns the parameters as well.
-        #
+        # Since Python 3.11, all objects have a __getstate__ but they return
+        # None by default, in which case this check is not performed.
         # Some objects return a tuple of parameters, others a dict.
         state1 = val1.__getstate__()
         state2 = val2.__getstate__()

BenjaminBossan · 2023-01-20T11:03:18Z

@adrinjalali I'm happy with the PR and would be ready to merge it. However, I'm a bit concerned about the uncovered lines of code. I think we should investigate before proceeding.

I checked out this branch and ran it locally with Python 3.10 and 3.11. Interestingly, I could throw out a lot of code from ReduceNode._construct, ending up with:

    def _construct(self):
        args = self.children["args"].construct()
        constructor = self.children["constructor"]
        instance = constructor(*args)
        attrs = self.children["attrs"].construct()
        if not attrs:
            # nothing more to do
            return instance

        # ALL THE CODE HERE WAS DELETED

        instance.__setstate__(attrs)
        return instance

Also, locally, this line was not covered, not sure why codecov doesn't highlight that (maybe it's covered for Python < 3.10?).

adrinjalali · 2023-01-20T14:07:20Z

The reason those lines exist is to handle cases where somebody manufactures a .skops file where they pass odd values there. Since in our tests we only deal with valid files, those lines are never covered.

BenjaminBossan · 2023-01-20T14:12:34Z

The reason those lines exist is to handle cases where somebody manufactures a .skops file where they pass odd values there. Since in our tests we only deal with valid files, those lines are never covered.

Okay, then at the very least we should add a test for those manufactured states, right? Then I can merge this and an issue should be created.

BenjaminBossan

LGTM

MNT drop python=3.7 support

aef0131

jjerphan mentioned this pull request Dec 9, 2022

MAINT handle deprecations from importlib.resources scikit-learn/scikit-learn#25157

Merged

BenjaminBossan mentioned this pull request Dec 9, 2022

Alternative Card implementation #203

Merged

adrinjalali added 5 commits December 12, 2022 11:38

add futurewarning to the list of ignored ones

8a64e42

Merge remote-tracking branch 'upstream/main' into drop-37

3203725

it's a DeprecationWarning

7fd557c

change message

00a3917

try again

b371b56

adrinjalali added 2 commits December 13, 2022 14:32

use Ben's suggestion

6d63c10

Merge remote-tracking branch 'upstream/main' into drop-37

830793f

adrinjalali mentioned this pull request Dec 14, 2022

MNT drop python=3.7 support #252

Merged

adrinjalali changed the title ~~MNT drop python=3.7 support~~ MNT add python=3.11 support Dec 14, 2022

adrinjalali added 4 commits January 13, 2023 15:25

Merge remote-tracking branch 'upstream/main' into drop-37

f89c8e6

catboost doesn't support 311

e0833e4

Revert "FIX: Persistence tests failing when using sklearn nightly (sk…

d38e072

…ops-dev#260)" This reverts commit 1b3c674.

check for None

c44b339

apply [almost] Benjamin's patch

e9fe32f

adrinjalali requested a review from BenjaminBossan January 19, 2023 16:44

BenjaminBossan approved these changes Jan 23, 2023

View reviewed changes

BenjaminBossan merged commit f9d443b into skops-dev:main Jan 23, 2023

BenjaminBossan mentioned this pull request Jan 23, 2023

Add persistence tests using "malicious" skops files #276

Open

adrinjalali deleted the drop-37 branch January 23, 2023 14:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MNT add python=3.11 support #246

MNT add python=3.11 support #246

adrinjalali commented Dec 9, 2022

adrinjalali commented Dec 9, 2022

jjerphan commented Dec 9, 2022

adrinjalali commented Dec 9, 2022

jjerphan commented Dec 9, 2022

jjerphan commented Dec 9, 2022 •

edited

Loading

BenjaminBossan commented Dec 13, 2022

BenjaminBossan commented Jan 18, 2023

BenjaminBossan commented Jan 20, 2023

adrinjalali commented Jan 20, 2023

BenjaminBossan commented Jan 20, 2023

BenjaminBossan left a comment

MNT add python=3.11 support #246

MNT add python=3.11 support #246

Conversation

adrinjalali commented Dec 9, 2022

adrinjalali commented Dec 9, 2022

jjerphan commented Dec 9, 2022

adrinjalali commented Dec 9, 2022

jjerphan commented Dec 9, 2022

jjerphan commented Dec 9, 2022 • edited Loading

BenjaminBossan commented Dec 13, 2022

BenjaminBossan commented Jan 18, 2023

BenjaminBossan commented Jan 20, 2023

adrinjalali commented Jan 20, 2023

BenjaminBossan commented Jan 20, 2023

BenjaminBossan left a comment

Choose a reason for hiding this comment

jjerphan commented Dec 9, 2022 •

edited

Loading