Provide versioning implementation guidance #348

Closed
rosy1280 opened this issue May 10, 2019 · 8 comments · Fixed by #362

@rosy1280
Contributor

There seems to be confusion in the community about how one would implement versioning (see the multiple discussions about pessimistic locking). Can we provide recommendations on how to implement versioning, something like a reference implementation, maybe using Stanford as an example?

@rosy1280 rosy1280 added this to the 1.0 milestone May 10, 2019
@ahankinson
Contributor

Perhaps a change in title for this issue to "Provide versioning implementation guidance" would be clearer? I initially thought this was about spec versioning

@rosy1280 rosy1280 changed the title Provide versioning recommendations Provide versioning implementation guidance May 10, 2019
@julianmorley
Contributor

+1. There's still considerable confusion about the line between an access-based repository for daily work (cf. Fedora access patterns) and what preserving those assets long-term looks like.

@zimeon
Contributor

zimeon commented May 10, 2019

To me the key point is that when to version is a curation choice, and we might provide implementation advice on making that choice. From the spec point of view I think that versions "just are", but there might still be a link off to the guidelines for advice.

@ahankinson
Contributor

The sense I get is that it's the nuts-and-bolts of an OCFL client implementation that is the stumbling block: Where to store 'staged' files, how to deal with concurrent writes (use file-based semaphores, or implement gatekeeper software), and the order-of-operations for writing content in a versioning action.
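As an illustration of the "file-based semaphore" option mentioned above, here is a minimal sketch of an exclusive lock file. The function name `object_lock` and the idea of keeping the lock at a caller-chosen path are assumptions made for this example, not anything defined by OCFL. It relies on `O_CREAT | O_EXCL`, which is atomic on a local POSIX file system but may not be dependable on every network file system, and has no direct equivalent on object stores.

```python
import os
from contextlib import contextmanager


@contextmanager
def object_lock(lock_path):
    """Hold an exclusive lock file at lock_path for the duration of the block.

    lock_path is a hypothetical location chosen by the caller.
    """
    # O_CREAT | O_EXCL makes os.open fail with FileExistsError if the file
    # already exists, so only one writer can create the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the owner's PID to help a human diagnose a stale lock.
        os.write(fd, str(os.getpid()).encode())
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A second writer that enters `with object_lock(path):` while the lock is held gets a `FileExistsError` and can back off or report the conflict. A crashed client leaves the lock file behind, so some policy for stale locks would still be needed.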

@rosy1280
Contributor Author

i think all of the above is true. should we break this up into smaller parts?

@birkland
Contributor

To expand on questions related to the nuts and bolts:

  • Using a particular approach for writing a new version on {a direct fs, NFS, S3/cloud}, how does one detect and recover if the client fails at a given step?
  • If two un-coordinated clients happen to accidentally write a new version to an object, what happens? Do any approaches on {a direct fs, NFS, S3/cloud} allow data thought to be safely written, to be silently overwritten or corrupted? Can the spec help with this?
  • Some of the conversation also appeared to cover how an application manages the state of an object before it is committed to OCFL, e.g. in staging, possibly with multiple writers. (this may be out of scope even for implementation notes)

@neilsjefferies
Member

neilsjefferies commented May 22, 2019

These are implementation details, not part of the OCFL spec. So by all means we can add some stuff to the implementation notes - it's not like this is rocket science.

  1. If we want to update and a temp version directory exists then abort.
  2. Create temporary new version directory, do all operations there.
  3. Manifest still points to last valid version so all reads work OK.
  4. After all updates are done, rename the temporary directory to its final version directory name
  5. Create new root manifest with temp name (copy from version directory)
  6. Finally, lock the object and update the manifest by deleting old and renaming new - this should not take very long!
  7. Update the manifest checksum and unlock
  8. To create the new version you probably read the old version so it should be in your Varnish cache anyway - you do have one for reads don't you?
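Steps 1 to 7 above could be sketched roughly as follows for a local POSIX file system. This is only an illustration under stated assumptions: the function name `commit_version`, the file names `manifest.json` and `manifest.json.sha512`, the `stage` callback, and the choice of SHA-512 are all hypothetical. The "delete old and rename new" of step 6 is done here with `os.replace`, which performs both in one atomic call on the same file system. Locking around steps 6 and 7 is omitted for brevity.

```python
import hashlib
import os
import shutil


def commit_version(object_root, new_version, stage,
                   manifest_name="manifest.json", temp_name="deposit"):
    """Write a new version directory and then swap in the root manifest.

    stage(temp_dir) is a caller-supplied function (hypothetical) that writes
    the new content and a manifest file into temp_dir.
    """
    temp_dir = os.path.join(object_root, temp_name)
    version_dir = os.path.join(object_root, new_version)
    root_manifest = os.path.join(object_root, manifest_name)

    if os.path.exists(version_dir):
        raise FileExistsError(version_dir)

    # Steps 1-2: os.mkdir fails with FileExistsError if a temp directory
    # already exists, which aborts the update.
    os.mkdir(temp_dir)

    # Steps 2-3: all writes go to the temp directory; the root manifest is
    # untouched, so readers still see the last valid version.
    stage(temp_dir)

    # Step 4: rename the temp directory to the version directory.
    os.rename(temp_dir, version_dir)

    # Step 5: copy the version's manifest into the root under a temp name.
    temp_manifest = root_manifest + ".tmp"
    shutil.copyfile(os.path.join(version_dir, manifest_name), temp_manifest)

    # Step 6: replace the old root manifest with the new one.
    os.replace(temp_manifest, root_manifest)

    # Step 7: update the manifest checksum sidecar file.
    with open(root_manifest, "rb") as f:
        digest = hashlib.sha512(f.read()).hexdigest()
    with open(root_manifest + ".sha512", "w") as f:
        f.write(f"{digest} {manifest_name}\n")
    return version_dir
```

If the client dies before step 6, the root manifest still describes the previous version. If it dies between steps 6 and 7, the checksum no longer matches the manifest, which is the condition the clean-up procedure checks for.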

Clean up after failure

  1. Delete any temp directories - this automatically reverts to last good version
  2. If the manifest checksum fails then you will find a new version dir not in the manifest, goto 5 above
  3. You are rechecksumming everything anyway given what just happened?

...temp directory name.. I dunno... "deposit" is as good as any,
...or "deposit_<transaction etag>" to help clean up any other temp DB's you might have hanging around
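The clean-up steps could look something like the sketch below. It assumes temp directories are named `deposit` or `deposit_<something>` as suggested above, that every other directory in the object root is a version directory, and that the caller already knows which versions the root manifest lists. The function name `clean_up` and the `known_versions` parameter are hypothetical.

```python
import os
import shutil


def clean_up(object_root, known_versions, temp_prefix="deposit"):
    """Delete leftover temp directories and report unlisted version dirs.

    known_versions is the set of version names the root manifest lists,
    supplied by the caller (hypothetical parameter).
    Returns the names of version directories that are not in that set.
    """
    unlisted = []
    for name in sorted(os.listdir(object_root)):
        path = os.path.join(object_root, name)
        if not os.path.isdir(path):
            continue  # manifest and checksum files are left alone
        if name == temp_prefix or name.startswith(temp_prefix + "_"):
            # Clean-up step 1: dropping a temp directory reverts the object
            # to the last good version.
            shutil.rmtree(path)
        elif name not in known_versions:
            # Clean-up step 2: a version directory the manifest does not
            # mention; the caller can resume from step 5 of the write.
            unlisted.append(name)
    return unlisted
```

Any name returned here points at a version whose directory rename completed but whose manifest swap did not, so the write can be resumed from the manifest copy rather than redone.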

@awoods
Member

awoods commented May 29, 2019

Thanks, @neilsjefferies . This looks like a good start towards an "Implementation Notes" pull-request.

7 participants