Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate existing dataset or metadata accessor objects after a checkout is closed #41

Merged
merged 7 commits into from
May 17, 2019

Conversation

rlizzo
Copy link
Member

@rlizzo rlizzo commented May 7, 2019

These changes ensure that when a checkout (reader or writer) is closed, that any references to dataset or metadata objects are invalidated and will not perform any type of operation.

Description

Instead of explicitly validating that some sort of IS_ALIVE sentinal on every call to the dataset or metadata interactor objects, we use an invalidation scheme based on weakrefs. The control flow is as follows:

  1. A strong reference to a dataset or metadata instance is created and held in a "_private" attribute of the checkout class. In the new scheme, this is the ONLY strong reference which will be kept to the members.

  2. For public consumption, datasets and metadata methods are defined in the checkout object using the property protocol to remap the getter to the internal attributes. Because no setter is defined, these properties cannot be changed by the user.

  3. At this point, we do a "bait-and-switch", providing a weakref proxy to the requested object. By definition, these do not increment the reference count, but pass through all calls to the checkout's instance of the actual object.

  4. when the checkout.close() method is called, the checkout objects ._datasets and ._metadata attributes are deleted and we explicitly perform a garbage collection.

  5. Since weakrefs don't increment reference counts, when the attributes are deleted on close and gc is run, the proxy mapping no longer has a valid reference. It will raise a ReferenceError() if any operation is attempted on it.

  6. Validations performed in the checkout class ensure that the object is essentially dead to the user, and a new checkout must be created to interact with the repository again.

This is a performant and clean way of ensuring that users are not able to perform operations during a time they shouldn't be able to. Methods are implemented for both read and write checkouts and both function as expected and pass all tests.

Motivation and Context

Examples of the problem before this fix were first shown in Issue #36. Please refer to that posting for further background.

How Has This Been Tested?

Test cases have been added to the test suite.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Please review @hhsecond and @lantiga

@rlizzo rlizzo added the Bug: Priority 1 ANY chance of data/record corruption or loss. label May 7, 2019
@rlizzo rlizzo requested review from lantiga and hhsecond May 7, 2019 22:52
@rlizzo rlizzo added this to the v0.1.0 Release milestone May 7, 2019
Copy link
Member

@hhsecond hhsecond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found one simple bug, apart from that, PR looks good to me and the weakref idea was brilliant

src/hangar/dataset.py Outdated Show resolved Hide resolved
tests/test_checkout.py Show resolved Hide resolved
src/hangar/checkout.py Outdated Show resolved Hide resolved
@rlizzo
Copy link
Member Author

rlizzo commented May 17, 2019

All review comments have been addressed. merging this now.

@rlizzo rlizzo merged commit 2f90249 into tensorwerk:master May 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug: Priority 1 ANY chance of data/record corruption or loss. Resolved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants