"Zarr already exists" Error while uploading #1034
I can confirm through the API that there is no Zarr in 000108 with a name of "sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr". @dandi/dandiarchive Under what other conditions would this error occur? |
@jwodder - there is one:
|
That's very strange. Is there a way it could be "hidden" from view in the regular archive? Looking at sample 10's directory through the ui only yields the .json files for the YO stain.
|
FTR: filed a dedicated issue about suboptimal rendering @slaytonmarx observed - dandi/dandi-archive#1135 |
@satra Oh, I hit the wrong endpoint. However, if I try to look up the Zarr in 000108's list of assets (on the theory that it might have been renamed), nothing comes up:

    from dandi.dandiapi import AssetType, DandiAPIClient

    with DandiAPIClient.for_dandi_instance("dandi") as client:
        d = client.get_dandiset("000108")
        for asset in d.get_assets():
            if asset.asset_type is AssetType.ZARR and asset.zarr == "e64e9cb7-9f2b-4563-b224-267e49a3a90f":
                print(asset.path)

@dandi/dandiarchive Can someone check the database to see what's going on? EDIT: Perhaps the Zarr was simply never associated with an asset, either due to being uploaded partially by an older version of dandi-cli or due to a bug in satra's s5cmd usage? |
@jwodder - could you prep a little script to sweep through all zarrs in the archive (note: do not make page_size too big - see dandi/dandi-archive#1137) and list all zarrs which do not have an associated asset in their dandisets (just in most likely we should just have it a part of the In the short term I guess we should just review those specific zarrs we would find not associated and maybe "manually" associate them to assets. |
@jwodder actually in that script, could you also please "find" the zarr in the dandiset not via its path but rather by zarr_id. And if zarr is present under a different name -- also alert. I am not sure if we are anyhow working out renaming of zarrs correctly. That duplicity of |
@yarikoptic This script:

    from dandi.dandiapi import AssetType, DandiAPIClient

    with DandiAPIClient.for_dandi_instance("dandi") as client:
        dandisets_with_zarrs = set()
        zarr_info = {}
        # Collect every Zarr known to the archive, keyed by Zarr ID
        for zinfo in client.paginate("/zarr/", page_size=25):
            zarr_info[zinfo["zarr_id"]] = zinfo
            dandisets_with_zarrs.add(zinfo["dandiset"])
        # Match Zarr assets to Zarrs by ID (not by path) and report mismatches
        for did in dandisets_with_zarrs:
            for asset in client.get_dandiset(did).get_assets():
                if asset.asset_type is AssetType.ZARR:
                    zinfo = zarr_info.pop(asset.zarr)
                    if zinfo["dandiset"] != did:
                        print(f"Zarr {asset.zarr}: Zarr is associated with Dandiset {zinfo['dandiset']} but asset belongs to {did}")
                    elif zinfo["name"] != asset.path:
                        print(f"Dandiset {did}: Zarr {asset.zarr}: Zarr name is {zinfo['name']!r}, but asset path is {asset.path!r}")
        # Whatever was never popped has no asset pointing at it
        for zinfo in zarr_info.values():
            print(f"Dandiset {zinfo['dandiset']}: Zarr {zinfo['zarr_id']} ({zinfo['name']}): Zarr does not have asset")

outputted the following:
|
coolio, thank you @jwodder . @satra @slaytonmarx - what is the "provenance" for these or what should we do with them? I see us either
|
it's possible asset creation broke or upload broke at some point. when the cli tries to upload it should allow updating the zarr blob and creating a new asset if necessary. this is exactly what @slaytonmarx is trying to do for some of these chunks. dandi upload <filename.ome.zarr> should check via zarr blob if it's there. if so it should update, and then also create an asset if an asset is missing. i feel there might be some order of operations assumptions that may be made. anything that is not associated with an asset should indeed be garbage collected eventually (but that's true of both regular blobs and zarrs). |
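The order of operations described in the comment above can be sketched as a small decision function. This is only an illustration, not dandi-cli's actual implementation: `plan_upload`, `existing_zarr_id`, `has_asset`, and the step names are hypothetical, and a real client would get these values from the archive API.

```python
from typing import List, Optional


def plan_upload(existing_zarr_id: Optional[str], has_asset: bool) -> List[str]:
    """Return the steps an uploader might take for one Zarr path.

    existing_zarr_id: ID of a Zarr already registered under this name in the
        Dandiset, or None if there is none (hypothetical lookup result).
    has_asset: whether an asset already points at that Zarr.
    """
    steps = []
    if existing_zarr_id is None:
        # Nothing registered yet: the Zarr has to be created first.
        steps.append("create zarr")
    # New or reused, the Zarr's contents are brought up to date.
    steps.append("update zarr entries")
    if existing_zarr_id is None or not has_asset:
        # A "loose" Zarr (no asset) gets an asset instead of causing an error.
        steps.append("create asset")
    return steps


print(plan_upload(None, False))        # fresh upload
print(plan_upload("zarr-123", False))  # loose Zarr, no asset yet
print(plan_upload("zarr-123", True))   # Zarr and asset both present
```

The point of the sketch is that an existing Zarr without an asset is treated as something to reuse rather than as a conflict.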
I'm a bit lost, is there anything immediately actionable on the API side? |
IIRC at some point (before #907, released in 0.36.0 this Feb) we could have ended up in such a scenario but should not now since asset would be created first for the zarr. Also, maybe we might end up in such a scenario if we just remove an asset with zarr, and then zarr would still be lurking behind (like blob does) until GC. For blob it doesn't matter since those are "identified" by their content. For zarr - situation is more ambiguous since (for reason I don't remember) it has that Since I do not have personal preference between the two ways out I suggested, let's proceed with the first which is what @satra also has in mind above. @jwodder could you please adjust code so that if "Zarr already exists" error is received, we
Overall policy would be of "reuse of loose zarrs". I guess if in the course of dandi/dandi-archive#1141 we remove/deprecate those |
Reuse "loose" Zarrs that conflict with uploaded path
🚀 Issue was released in |
I received the below error while trying to upload a file using the following command:
DANDI_DEVEL=1 dandi upload --validation skip --allow-any-path /mnt/beegfs/Lee/dandi/sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr
Full log attached below.
The file does not already exist within the dandi archive:
My dandi version is 41.0
Full Log:
20220629143352Z-715731.log