Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Dealing with skops persistence protocol updates #322

Conversation

BenjaminBossan
Copy link
Collaborator

@BenjaminBossan BenjaminBossan commented Mar 16, 2023

This PR is a POC that is intended as a basis for discussing how we can deal with updating the skops persistence protocol.

This is a working implementation that shows one way to deal with updates in the skops persistence protocol in a backwards compatible manner that should, hopefully, scale.

As an example, the update to how functions are persisted is rolled into this PR to show that it works. Therefore, this PR supersedes #320.

Problem description

The problem in general occurs if we make a change to the state that is stored with an object. For instance, we could store an additional field. When we try to load old state without that key, there would be an error. Therefore, we have to somehow adjust our loading code to deal with this possibility.

Rejected solution

We want to avoid having a bunch of conditionals in our loading code that checks what protocol is used and then does this or that thing. This approach would become messy and error prone quite quickly, because we intend to support old protocols for a very long time.

Proposed solution

This proposal here would allow to write a clean implementation of the new deserialization code without regard for backwards compatibility. The old code, however, would not be deleted but instead moved to a different module and registered with its old protocol number. Then, while loading, we would check the protocol stored in the schema and use the old code if there is a match for it. If there is no match, we assume we can safely use the current protocol instead. For this to work, the key used for mapping the state to the loader is changed to be the tuple (loader, protocol), instead of just the loader, as has been the case so far. This is a hopefully transparent way to "dispatch" on both the loader and the protocol.

This is just a draft and I haven't tested it on a real example. If we agree that this is the way forward, I would expand on it and test it properly. But first we should agree that this is the way to go and check if there are no problems with this approach.

Description of the mechanism for changing the protocol

Below are the instructions with how to deal with a change in protocol:

Every time that a backwards incompatible change to the skops format is made for the first time within a release, the protocol should be bumped to the next higher number. The old version of the Node, which knows how to deal with the old state, should be preserved and registered. Let's give an example:

  • There is a BC breaking change in FunctionNode.
  • Since it's the first BC breaking change in the skops format in this release, bump skops.io._protocol.PROTOCOL from version X to Y.
  • Move the old FunctionNode code into skops/io/old/_general_vX.py, where 'X' is the old protocol.
  • Register the _general_vX.FunctionNode in NODE_TYPE_MAPPING inside of _persist.py (on top of registering the current version).
  • Write a test in test_persist_old.py that shows that the old state can still be loaded.

Now, if a user loads a FunctionNode state with version X using skops with version Y, the old code will be used instead of the new one. For all other node types, if there is no loader for version X, skops will automatically use version Y instead.

@skops-dev/maintainers RFC

This PR is a POC that is intended as a basis for discussing how we can
deal with updating the skops persistence protocol.

The problem in general occurs if we make a change to the state that is
stored with an object. For instance, we could store an additional field.
When we try to load old state without that key, there would be an error.
Therefore, we have to somehow adjust our loading code to deal with this
possibility.

First of all, we want to avoid having a bunch of conditionals in our
loading code that checks what protocol is used and then does this or
that thing. This approach would become messy and error prone quite
quickly, because we intend to support old protocols for a very long
time.

This proposal here would allow to write a clean implementation of the
new deserialization code without regard for backwards compatibility. The
old code, however, would not be deleted but instead moved to a different
module and registered with its old protocol number. Then, while loading,
we would check the protocol stored in the schema and use the old code if
there is a match for it. If there is no match, we assume we can safely
use the current protocol instead.

This is just a draft and I haven't tested it on a real example. If we
agree that this is the way forward, I would expand on it and test it
properly. But first we should agree that this is the way to go and check
if there are no problems with this approach.

Below are the instructions with how to deal with a change in protocol:

Every time that a backwards incompatible change to the skops format is made
for the first time within a release, the protocol should be bumped to the next
higher number. The old version of the Node, which knows how to deal with the
old state, should be preserved and registered. Let's give an example:

- There is a BC breaking change in FunctionNode.
- Since it's the first BC breaking change in the skops format in this release,
  bump skops.io._protocol.PROTOCOL (this file) from version X to Y.
- Move the old FunctionNode code into 'skops/io/old/_general_vX.py', where 'X'
  is the old protocol.
- Register the _general_vX.FunctionNode in NODE_TYPE_MAPPING inside of
  _persist.py.

Now, if a user loads a FunctionNode state with version X using skops with
version Y, the old code will be used instead of the new one. For all other
node types, if there is no loader for version X, skops will automatically use
version Y instead.
Not sure why pre-commit didn't catch that...
@BenjaminBossan BenjaminBossan changed the title POC: Proposal for how to deal with protocol update RFC: Proposal for how to deal with skops persistence protocol updates Mar 16, 2023
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than adding some docs in the _general_v0.py and some tests to check for old protocol, I think this looks perfect.

@BenjaminBossan
Copy link
Collaborator Author

Other than adding some docs in the _general_v0.py and some tests to check for old protocol, I think this looks perfect.

One way to go about that would be that I wrap #320 into this PR and have the old + new FunctionNode code as a test case. Otherwise, I would have to come up with a fake example. WDYT?

@adrinjalali
Copy link
Member

adding #320 here sounds good to me.

Taking the changes from skops-dev#320 to how functions are persisted and adding
them here. Therefore, this PR supersedes skops-dev#320.

That change is added here because it is the perfect test case for the
update route of the skops protocol.
@adrinjalali adrinjalali added this to the v0.6 milestone Mar 21, 2023
@BenjaminBossan BenjaminBossan marked this pull request as ready for review March 21, 2023 18:09
@BenjaminBossan
Copy link
Collaborator Author

@skops-dev/maintainers Despite the best efforts of codecov and rtd, I managed to get CI green. Ready for review.

@BenjaminBossan BenjaminBossan changed the title RFC: Proposal for how to deal with skops persistence protocol updates Dealing with skops persistence protocol updates Mar 22, 2023
@BenjaminBossan BenjaminBossan added the persistence Secure persistence feature label Mar 22, 2023
skops/hub_utils/_hf_hub.py Show resolved Hide resolved
skops/io/tests/_utils.py Show resolved Hide resolved
PROTOCOL = 0


class FunctionNode(Node):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should actually this this one, shouldn't we? we're not at the moment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this message is a bit cryptic :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I meant to write, "we should actually test this one, and we're not at the moment", as in, test this particular upgrade for this particular node type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay. I added a test, basically the same as for the old version, minus the downgrading. Strictly speaking, this was already tested indirectly through other tests, which is why there was no test to begin with, but it was easy enough to add.

Also, add "# pragma: no cover" where it makes sense.
Copy link
Collaborator Author

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied to your comments, please check again.

skops/hub_utils/_hf_hub.py Show resolved Hide resolved
PROTOCOL = 0


class FunctionNode(Node):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this message is a bit cryptic :P

skops/io/tests/_utils.py Show resolved Hide resolved
Test for old and new protocol version are basically identical.
@adrinjalali
Copy link
Member

I'm feeling much better about our persistence now :)

@adrinjalali adrinjalali changed the title Dealing with skops persistence protocol updates ENH Dealing with skops persistence protocol updates Mar 22, 2023
@adrinjalali adrinjalali merged commit c7f8992 into skops-dev:main Mar 22, 2023
@BenjaminBossan BenjaminBossan deleted the POC-updating-skops-persistence-protocol branch March 23, 2023 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
persistence Secure persistence feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants