Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Add secure persistence docs #145

Merged
merged 20 commits into from
Sep 29, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,10 @@ download or load the model.
and how it should be used. The model card can then be stored as the
``README.md`` file on the Hugging Face Hub, with pre-populated metadata to
help Hub understand the model.
- ``skops.io``: Secure persistence of sklearn estimators and more without using
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved
``pickle``. Visit `the docs
<https://skops.readthedocs.io/en/latest/persistence.html>`_ for more
information.

Please refer to our `documentation <https://skops.readthedocs.io/en/latest/>`_
on using the library as user, which includes user guides on the above topics as
Expand Down
4 changes: 4 additions & 0 deletions docs/changes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,10 @@ v0.3
- Add ``private`` as an optional argument to :meth:`.hub_utils.push` to
optionally set the visibility status of a repo when pushing to the hub.
:pr:`130` by `Adrin Jalali`_.
- First release of the skops secure persistence feature (:pr:`128`) by `Adrin
Jalali`_ and `Benjamin Bossan`_. Visit :ref:`persistence` for more
information. This feature is not production ready yet but we're happy to
receive feedback from users.

v0.2
----
Expand Down
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ User Guide / API Reference
installation
hf_hub
model_card
persistence
modules/classes
changes

Expand Down
6 changes: 5 additions & 1 deletion docs/modules/classes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,15 @@ This is the class and function reference of skops.

:mod:`skops.hf_hub`: Hugging Face Hub Integration
=================================================

.. automodule:: skops.hub_utils
:members:

:mod:`skops.card`: Model Card Utilities
=======================================
.. automodule:: skops.card
:members:

:mod:`skops.io`: Secure persistence
===================================
.. automodule:: skops.io
:members:
80 changes: 80 additions & 0 deletions docs/persistence.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
.. _persistence:

Secure persistence with skops
=============================

.. warning::

This feature is very early in development, which means that the API is
unstable and that it is **not secure** for the moment. Therefore, the same
caution as for pickle should be applied: Don't load from sources that you
don't trust. In the future, more security will be added.
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved

Skops offers a way to save and load sklearn models without using :mod:`pickle`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this opening would be much stronger if you presented it as an issue with pickle + how skops addresses it. It may not be obvious to a user why they shouldn't use pickle especially if it is so common. Something like:

The pickle module is not secure, but with skops, you can securely save and load sklearn models without using pickle.

Pickle is the standard serialization format for sklearn and for Python in
general. It has the big advantage that it can be used for almost all Python code
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved
but this flexibility also means that it's inherently insecure. As the Python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be a little clearer if you also explain why its flexibility means it's insecure. Maybe something like:

but this flexibility also means that it's inherently insecure because [insert reason here].

docs say:

The pickle module is not secure. Only unpickle data you trust. It is
possible to construct malicious pickle data which will execute arbitrary
code during unpickling. Never unpickle data that could have come from an
untrusted source, or that could have been tampered with.
adrinjalali marked this conversation as resolved.
Show resolved Hide resolved

With the :func:`skops.io.save` and :func:`skops.io.load` functions, we aim at
providing a secure method to load and save sklearn code. In contrast to pickle,
these functions cannot be used to save arbitrary Python code, but they bypass
pickle and are thus more secure.
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved

Usage
-----

Using :func:`skops.io.save` and :func:`skops.io.load` is quite simple. Below is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally avoid using a "value" word like simple or easy since it is relative to a user's experience and can make the tone not as friendly for beginners. I think the code snippet below already does a great job of showing how simple it is without having us tell the user it is simple :)

an example:

.. code:: python

from sklearn.linear_model import LogisticRegression
from skops.io import load, save

clf = LogisticRegression(random_state=0, solver="liblinear")
clf.fit(X_train, y_train)
save(clf, "my-logistic-regression.skops")
# ...
loaded = load("my-logistic-regression.skops")
loaded.predict(X_test)

At the moment, we support the vast majority of sklearn estimators. This includes
complex use cases such as :class:`sklearn.pipeline.Pipeline`,
:class:`sklearn.model_selection.GridSearchCV`, classes using Cython code, such
as :class:`sklearn.tree.DecisionTreeClassifier`, etc. If you discover an sklearn
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved
estimator that does not work, please open an issue on the skops `GitHub page
<https://github.com/skops-dev/skops/issues>`_ and let us know.

In contrast to pickle, skops cannot persist arbitrary Python code. In
particular, this means that if you have custom functions (say, a custom function
to be used with :class:`sklearn.preprocessing.FunctionTransformer`), it will not
work. However, most ``numpy`` and ``scipy`` functions should work. Therefore,
you can actually save ``numpy.sqrt`` etc.
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved

Goals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'd maybe rename this section to just Roadmap and combine it with the section below.

-----

Currently, it is still possible to run insecure code when using skops
persistence. E.g., it's possible to load a save file that evaluates arbitrary
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved
code using :func:`eval`. We do, however, have concrete plans on how to mitigate
this. Please stay updated.
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved

On top of trying to support all of sklearn, we plan on making persistence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
On top of trying to support all of sklearn, we plan on making persistence
On top of trying to support persistence in sklearn, we plan on making persistence

extensible for other libraries. As a user, this means that if you trust a
certain library, you will be able to tell skops to load code from that library.
As a library author, there will be a clear path of what needs to be done to add
secure persistence to your library, such that skops can save and load code from
your library.

Roadmap
-------

To follow what features are currently planned, filter for the `"persistence"
label <https://github.com/skops-dev/skops/labels/persistence>`_ in our GitHub
issues.
47 changes: 47 additions & 0 deletions skops/io/_persist.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,29 @@


def save(obj, file):
"""Save an object using the skops persistence format.

Skops aims at providing a secure persistence feature that does not rely on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to put warnings only to load rather than save, but feel free to ignore if you think it doesn't make sense😅

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I went overboard with warnings, but I really wanted to make sure users don't miss the fact that we're still early in development and know that it's not secure yet. Hopefully, we can fill these gaps soon and then remove the warnings. Until then, I'd rather leave them in.

:mod:`pickle`, which is inherently insecure. For more information, please
visit the :ref:`persistence` documentation.

.. warning::

This feature is very early in development, which means that the API is
unstable and that it is **not secure** for the moment. Therefore, the
same caution as for pickle should be applied: Don't load from sources
that you don't trust. In the future, more security will be added.

Parameters
----------
obj
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved
The object to be saved
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved

file: str
The file name. A zip archive will automatically created, with the ".zip"
ending added to the given filename.
adrinjalali marked this conversation as resolved.
Show resolved Hide resolved

"""
with tempfile.TemporaryDirectory() as dst:
with open(Path(dst) / "schema.json", "w") as f:
state = get_state(obj, dst)
Expand All @@ -43,6 +66,30 @@ def save(obj, file):


def load(file):
"""Load an object saved with the skops persistence format.

Skops aims at providing a secure persistence feature that does not rely on
:mod:`pickle`, which is inherently insecure. For more information, please
visit the :ref:`persistence` documentation.

.. warning::

This feature is very early in development, which means that the API is
unstable and that it is **not secure** for the moment. Therefore, the
same caution as for pickle should be applied: Don't load from sources
that you don't trust. In the future, more security will be added.

Parameters
----------
file: str
The file name of the object to be loaded.

Returns
-------
instance
BenjaminBossan marked this conversation as resolved.
Show resolved Hide resolved
The loaded object.

"""
with ZipFile(file, "r") as input_zip:
schema = input_zip.read("schema.json")
instance = get_instance(json.loads(schema), input_zip)
Expand Down