updating metadata on all draft assets #1450
Comments
Are you expecting some new metadata to be extracted from nwb that can't be achieved through a metadata record version upgrade?
yes. our metadata extraction routines have changed several times, so the older assets are definitely out of sync with what the current routines produce. to start with, we can limit updates to assets whose metadata schema version is lower than the current one.
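As a rough illustration of that filtering, here is a minimal sketch using the dandi-cli Python client. The cutoff value is illustrative (not something settled in this thread), and it assumes each asset's raw metadata carries a `schemaVersion` field:

```python
from dandi.dandiapi import DandiAPIClient
from packaging.version import Version

CURRENT_SCHEMA = Version("0.6.0")  # illustrative cutoff, not the decided value

with DandiAPIClient("https://api.dandiarchive.org/api") as client:
    for dandiset in client.get_dandisets():
        draft = client.get_dandiset(dandiset.identifier, "draft", lazy=False)
        for asset in draft.get_assets():
            schema_version = asset.get_raw_metadata().get("schemaVersion")
            # Flag assets with no recorded schema version or an older one.
            if schema_version is None or Version(schema_version) < CURRENT_SCHEMA:
                print(f"{dandiset.identifier}\t{asset.path}\t{schema_version}")
```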
support for that was already added in dandi-cli. I do not think it is an issue to keep in dandi-archive, since I don't think code should be developed here for it - it's just a matter of running that command/script. but I can be wrong ;)
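For the record, the dandi-cli entry point being referred to lives, as far as I can tell, under `dandi service-scripts`; the flag here is from memory, so treat the invocation as approximate and check `dandi service-scripts reextract-metadata --help`:

```
dandi service-scripts reextract-metadata --when newer-schema-version <ASSET-OR-DANDISET-URL>
```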
I believe this issue was filed as a task reminder, not as something to change in the archive codebase. If this can be carried out with dandi-cli, that's fantastic. I think we still need some clarity on exactly what to do, how to do it, and who will do it. Can we get those things decided so we can keep this moving forward?
we may need to do this via the management console, re-extracting and updating the metadata for each draft dandiset/asset.
since this is an update operation, i don't want the process to conflict with any upload or other actions (e.g. checksum calculation) that may be going on. this may not be something the CLI can do. also, the CLI won't have admin privileges unless we use an admin token.
or just finally expose dandiset locking as an admin-only operation in the API, and adjust the already-coded implementation, since it seems we would need a similar mechanism for nearly any archive-wide operation that might interfere with the original owners' actions.
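To make that concrete, here is a purely hypothetical sketch of what an admin-only lock guard could look like from the client side. The `/lock/` and `/unlock/` endpoints and the helper name are invented for illustration; dandiset locking is not currently exposed in the API:

```python
from contextlib import contextmanager

import requests

API = "https://api.dandiarchive.org/api"

@contextmanager
def locked_dandiset(dandiset_id: str, token: str):
    """Hypothetical helper: hold an admin-only lock for the duration of an update."""
    headers = {"Authorization": f"token {token}"}
    # These endpoints are illustrative only; they do not exist today.
    requests.post(f"{API}/dandisets/{dandiset_id}/lock/", headers=headers).raise_for_status()
    try:
        yield
    finally:
        requests.post(f"{API}/dandisets/{dandiset_id}/unlock/", headers=headers).raise_for_status()

# Usage sketch: re-extract metadata while owners' uploads are held off.
# with locked_dandiset("000027", token=ADMIN_TOKEN):
#     update_assets(...)
```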
Running the numbers against the current draft assets in prod (ignoring zarrs), I believe this will involve processing ~700 GB. We might be able to bring this down if we can define which assets are considered "old" and which are okay.
how did you get that number? it shouldn't actually have to download all the data for each asset to do this. the updated code uses cached/remote streaming instead.
Summing the size of all assets in draft versions, excluding zarr assets (roughly as in the sketch below).
So it only needs to access a portion of the stored s3 file? I was using the
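A sketch of that size estimate, under the same assumptions as the snippet above (public API instance, dandi-cli client). Excluding zarrs by path suffix is an approximation of mine; a real run would check the asset type in the API record instead:

```python
from dandi.dandiapi import DandiAPIClient

total_bytes = 0
with DandiAPIClient("https://api.dandiarchive.org/api") as client:
    for dandiset in client.get_dandisets():
        draft = client.get_dandiset(dandiset.identifier, "draft", lazy=False)
        for asset in draft.get_assets():
            if not asset.path.endswith((".zarr", ".ngff")):  # rough zarr exclusion
                total_bytes += asset.size

print(f"~{total_bytes / 1e9:.0f} GB across non-zarr draft assets")
```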
as long as the asset uses the
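The streaming arrangement under discussion, sketched with `fsspec` and `h5py`; this mirrors the caching/range-read idea rather than dandi-cli's literal code path, and the URL is a placeholder:

```python
import fsspec
import h5py

# Placeholder URL for an NWB asset's blob on S3.
url = "https://dandiarchive.s3.amazonaws.com/blobs/xxx/file.nwb"

# simplecache:: keeps fetched blocks on local disk; h5py then issues range
# reads for just the headers/attributes it needs, not the whole file.
with fsspec.open(f"simplecache::{url}", mode="rb") as f:
    with h5py.File(f, "r") as nwb:
        print(dict(nwb.attrs))  # metadata access without a full download
```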
i'm starting this issue so that we can do this within the next two weeks.
this requires re-extracting metadata from all draft assets and uploading it to the draft version of the dataset. we won't touch published versions here. this is to ensure that all draft datasets are compliant with the current schema.
i think this is best done at an admin process level. let's start with all nwb datasets. it would involve running a script on assets from all draft versions to update them.
@jwodder - could you please provide a dandi api-based function to re-extract metadata from a remote nwb file (i.e. a dandi asset url)? this is not about getting the stored asset metadata, but re-extracting it from the nwb file itself.
cc/ @yarikoptic
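A hedged sketch of what such a function might look like, built from pieces I believe exist in dandi-cli (`nwb2asset` has moved between modules across versions, so the import may need adjusting). The download-then-extract approach here trades bandwidth for simplicity compared with the streaming sketch above:

```python
import tempfile
from pathlib import Path

from dandi.dandiapi import DandiAPIClient
from dandi.metadata import nwb2asset  # module location varies across dandi-cli versions

def reextract_draft_asset_metadata(dandiset_id: str, asset_path: str):
    """Download a draft NWB asset and re-run metadata extraction on it (sketch)."""
    with DandiAPIClient("https://api.dandiarchive.org/api") as client:
        draft = client.get_dandiset(dandiset_id, "draft")
        asset = draft.get_asset_by_path(asset_path)
        with tempfile.TemporaryDirectory() as tmpdir:
            local = Path(tmpdir) / Path(asset_path).name
            asset.download(local)
            # Re-extract from the NWB file itself, not from the stored record.
            return nwb2asset(local)
```

Pushing the result back to the draft record would then presumably go through `set_raw_metadata()` on the same remote asset.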