Garbage collection #177
Comments
Can you explain why this is a migration blocker?
Just a check here: I thought there are no more uploads, it goes straight into blobs.
Because we do need GC sooner rather than later, in particular since large datasets are coming soon. It could be "later", but introducing it in production, where we cannot afford losing data, would be trickier IMHO than implementing it before we "migrate", while we are still testing the platform and do not care much if we lose any data, since we would probably still rebootstrap a few times.
Yes. But what happens if an upload is never completed or validated? Aren't we ending up with 1. stale
Design doc is at #560.
edit by @yarikoptic: the PR with the design doc incorrectly marked this issue as fixed; it was not
Now we do have a well-aged (10-month-old) design doc in https://github.com/dandi/dandi-archive/blob/master/doc/design/garbage-collection-1.md . It would be great to re-assess it and implement it, in particular in light of #1450, which might soon produce thousands of loose assets that would get replaced with ones with fresher metadata records.
This probably depends on #524.
I am adding this to the migration target since we had better iron it out before switching to dandi-api: bugs in GC can lead to data loss, thus IMHO we first need to make sure (e.g. with extensive unit testing and user testing) that it works reliably before deploying it.
Initial design sketch on garbage collection is present within https://github.com/dandi/dandi-api/pull/150/files#diff-c96d4444d1714a52d5d08dd92d94919393a7db8ded038aa84f02ba1075d2c25eR37 but I think it is worth removing it from that PR and starting a new dedicated one.
I see the following targets for GC
Additional aspects:
touch any blob or asset upon "being queried", since otherwise we might GC a blob in the middle of an asset being "minted" for an existing blob; much less likely, but I guess it could also happen for assets if GC of an asset is triggered while we are "modifying" it and creating a new asset
touch timestamping: this would allow us to avoid locking (but would be costly for the DB)
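
A minimal sketch of the touch-timestamp idea above, not the dandi-api implementation. It assumes a hypothetical in-memory `Blob` record with a `last_touched` field and a hypothetical `grace` period; in a real deployment these would be DB columns and a configured setting. A blob is collected only if no asset references it and it has not been touched within the grace period, so a blob that was just queried for "minting" a new asset survives without any locking.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Blob:
    # Hypothetical record; field names are illustrative assumptions.
    blob_id: str
    last_touched: datetime
    referenced_by: set[str] = field(default_factory=set)  # ids of assets using this blob


def touch(blob: Blob, now: datetime) -> None:
    """Record that the blob was just queried, so GC leaves it alone for a while."""
    blob.last_touched = now


def collect_garbage(blobs: dict[str, Blob], now: datetime, grace: timedelta) -> list[str]:
    """Remove blobs that are unreferenced AND not touched within `grace`.

    Returns the ids of the removed blobs, in iteration order.
    """
    removed = []
    for blob_id, blob in list(blobs.items()):
        orphaned = not blob.referenced_by
        stale = now - blob.last_touched > grace
        if orphaned and stale:
            del blobs[blob_id]
            removed.append(blob_id)
    return removed


if __name__ == "__main__":
    now = datetime(2021, 1, 10, tzinfo=timezone.utc)
    old = now - timedelta(days=5)
    blobs = {
        "a": Blob("a", old),               # orphaned and stale: collected
        "b": Blob("b", old, {"asset-1"}),  # still referenced: kept
        "c": Blob("c", old),               # orphaned, but touched below: kept
    }
    touch(blobs["c"], now)  # e.g. a client queried the blob to mint a new asset
    print(collect_garbage(blobs, now, grace=timedelta(days=1)))
```

The cost mentioned above shows up in `touch`: every query becomes a write to the timestamp, which is the trade-off against taking a lock during GC.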