"Zarr already exists" Error while uploading #1034

Closed
slaytonmarx opened this issue Jun 29, 2022 · 13 comments · Fixed by #1035

@slaytonmarx

slaytonmarx commented Jun 29, 2022

I received the below error while trying to upload a file using the following command:

DANDI_DEVEL=1 dandi upload --validation skip --allow-any-path /mnt/beegfs/Lee/dandi/sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr

Full log attached below.

2022-06-29T10:34:15-0400 [ERROR   ] dandi 715731:140277145179904 Error 400 while sending POST request to https://api.dandiarchive.org/api/zarr/: ["Zarr already exists"]
2022-06-29T10:34:15-0400 [ERROR   ] dandi 715731:140277145179904 Error uploading /mnt/beegfs/Lee/dandi/sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr:
Traceback (most recent call last):
  File "/mnt/beegfs/satra/miniconda3/envs/dandi/lib/python3.8/site-packages/dandi/upload.py", line 232, in process_path
    for r in dfile.iter_upload(
  File "/mnt/beegfs/satra/miniconda3/envs/dandi/lib/python3.8/site-packages/dandi/files.py", line 838, in iter_upload
    r = client.post(
  File "/mnt/beegfs/satra/miniconda3/envs/dandi/lib/python3.8/site-packages/dandi/dandiapi.py", line 288, in post
    return self.request("POST", path, **kwargs)
  File "/mnt/beegfs/satra/miniconda3/envs/dandi/lib/python3.8/site-packages/dandi/dandiapi.py", line 254, in request
    raise requests.HTTPError(msg, response=result)
requests.exceptions.HTTPError: Error 400 while sending POST request to https://api.dandiarchive.org/api/zarr/
2022-06-29T10:34:16-0400 [DEBUG   ] pyout.interface 715731:140277145179904 Received result for ('size', 'errors', 'upload', 'status', 'message'): {'status': 'ERROR', 'message': 'Error 400 while sending POST request to https://api.dandiarchive.org/api/zarr/'}

The file does not already exist within the dandi archive:
image

My dandi version is 0.41.0

Full Log:
20220629143352Z-715731.log

@jwodder
Member

jwodder commented Jun 29, 2022

I can confirm through the API that there is no Zarr in 000108 with a name of "sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr".

@dandi/dandiarchive Under what other conditions would this error occur?

@satra
Member

satra commented Jun 29, 2022

@jwodder - there is one:

curl -X GET "https://api.dandiarchive.org/api/zarr/?dandiset=000108&name=sub-MITU01%2Fses-20220326h13m51s50%2Fmicr%2Fsub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr" -H  "accept: application/json"
{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "name": "sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr",
      "dandiset": "000108",
      "zarr_id": "e64e9cb7-9f2b-4563-b224-267e49a3a90f",
      "status": "Complete",
      "checksum": "5a3cdb1c9b65e077011e80bbd1003222-255--602198343",
      "upload_in_progress": true,
      "file_count": 255,
      "size": 602198343
    }
  ]
}
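
For anyone repeating this lookup from Python, here is a minimal sketch of how the query URL in the curl command can be assembled. The helper name `zarr_lookup_url` is hypothetical; only the endpoint and the `dandiset` and `name` query parameters come from the command above, and the request itself is left to whatever HTTP client is at hand.

```python
from urllib.parse import urlencode

# API root taken from the curl command above.
API_ROOT = "https://api.dandiarchive.org/api"


def zarr_lookup_url(dandiset_id: str, name: str) -> str:
    """Build the GET URL that lists Zarrs matching a Dandiset ID and Zarr name.

    Hypothetical helper for illustration; urlencode percent-encodes the
    slashes in the name (as %2F), matching the curl command.
    """
    query = urlencode({"dandiset": dandiset_id, "name": name})
    return f"{API_ROOT}/zarr/?{query}"


if __name__ == "__main__":
    print(
        zarr_lookup_url(
            "000108",
            "sub-MITU01/ses-20220326h13m51s50/micr/"
            "sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr",
        )
    )
```

A `count` of 1 in the response, as shown above, means a Zarr record with that name exists even if no asset points at it.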

@slaytonmarx
Author

That's very strange. Is there a way it could be "hidden" from view in the regular archive? Looking at sample 10's directory through the UI yields only the .json files for the YO stain.

https://dandiarchive.org/dandiset/000108/draft/files?location=sub-MITU01%2Fses-20220326h13m51s50%2Fmicr%2F

@yarikoptic
Member

FTR: I filed a dedicated issue about the suboptimal rendering that @slaytonmarx observed: dandi/dandi-archive#1135

@jwodder
Member

jwodder commented Jun 29, 2022

@satra Oh, I hit the wrong endpoint. However, if I try to look up the Zarr in 000108's list of assets (on the theory that it might have been renamed), nothing comes up:

from dandi.dandiapi import AssetType, DandiAPIClient

with DandiAPIClient.for_dandi_instance("dandi") as client:
    d = client.get_dandiset("000108")
    for asset in d.get_assets():
        if asset.asset_type is AssetType.ZARR and asset.zarr == "e64e9cb7-9f2b-4563-b224-267e49a3a90f":
            print(asset.path)

@dandi/dandiarchive Can someone check the database to see what's going on?

EDIT: Perhaps the Zarr was simply never associated with an asset, either due to being uploaded partially by an older version of dandi-cli or due to a bug in satra's s5cmd usage?

@yarikoptic
Member

@jwodder - could you prep a little script to sweep through all zarrs in the archive (note: do not make page_size too big, see dandi/dandi-archive#1137) and list all zarrs which do not have an associated asset in their dandisets (just in draft, since AFAIK we do not yet support publishing versioned dandisets with zarrs)?

Most likely we should just make it a part of the gc to prune zarr archives without associated assets, because I would guess that is what could happen if a zarr asset is removed from the archive. I will add a note to dandi/dandi-archive#177

In the short term I guess we should just review the specific unassociated zarrs we find and maybe "manually" associate them with assets.

@yarikoptic
Member

@jwodder actually, in that script, could you also please "find" the zarr in the dandiset not via its path but rather by zarr_id? And if the zarr is present under a different name, also alert. I am not sure whether we handle renaming of zarrs correctly. That duplication of the name in the zarr record and in the asset might also be biting us.

@jwodder
Member

jwodder commented Jun 29, 2022

@yarikoptic This script:

from dandi.dandiapi import AssetType, DandiAPIClient

with DandiAPIClient.for_dandi_instance("dandi") as client:
    dandisets_with_zarrs = set()
    zarr_info = {}
    # Collect every Zarr record in the archive, keyed by zarr_id
    # (page_size kept small; see dandi/dandi-archive#1137).
    for zinfo in client.paginate("/zarr/", page_size=25):
        zarr_info[zinfo["zarr_id"]] = zinfo
        dandisets_with_zarrs.add(zinfo["dandiset"])
    # Walk the assets of each Dandiset that owns a Zarr. Each Zarr that is
    # backed by an asset is popped from zarr_info and checked for mismatches.
    for did in dandisets_with_zarrs:
        for asset in client.get_dandiset(did).get_assets():
            if asset.asset_type is AssetType.ZARR:
                zinfo = zarr_info.pop(asset.zarr)
                if zinfo["dandiset"] != did:
                    print(f"Zarr {asset.zarr}: Zarr is associated with Dandiset {zinfo['dandiset']} but asset belongs to {did}")
                elif zinfo["name"] != asset.path:
                    print(f"Dandiset {did}: Zarr {asset.zarr}: Zarr name is {zinfo['name']!r}, but asset path is {asset.path!r}")
    # Whatever is left in zarr_info was never matched to an asset.
    for zinfo in zarr_info.values():
        print(f"Dandiset {zinfo['dandiset']}: Zarr {zinfo['zarr_id']} ({zinfo['name']}): Zarr does not have asset")

outputted the following:

Dandiset 000108: Zarr 66b0a418-ef47-4542-bdfb-ff3bf5cc8aba (sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-5_SPIM.ome.zarr): Zarr does not have asset
Dandiset 000108: Zarr 5f1fdae7-6ebc-4f36-a98e-9760078a77c2 (sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-2_SPIM.ome.zarr): Zarr does not have asset
Dandiset 000108: Zarr 9a06b493-bffd-4ebe-9851-5b7ce12f90c6 (sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-3_SPIM.ome.zarr): Zarr does not have asset
Dandiset 000108: Zarr e64e9cb7-9f2b-4563-b224-267e49a3a90f (sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-4_SPIM.ome.zarr): Zarr does not have asset
Dandiset 000108: Zarr e1641fdc-142e-443c-aec0-66cf2e289c12 (sub-MITU01/ses-20220326h13m51s50/micr/sub-MITU01_ses-20220326h13m51s50_sample-10_stain-YO_run-1_chunk-1_SPIM.ome.zarr): Zarr does not have asset
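
The comparison at the heart of the script above can also be isolated as a pure function, which makes it checkable without API access. This is only a sketch under assumptions: `find_loose_zarrs` and the `(dandiset_id, zarr_id, path)` tuple shape are hypothetical, and the Zarr records are assumed to be dicts with the `zarr_id`, `dandiset`, and `name` keys seen in the `/zarr/` response earlier in the thread.

```python
from typing import Dict, Iterable, List, Tuple


def find_loose_zarrs(
    zarrs: Iterable[Dict[str, str]],
    zarr_assets: Iterable[Tuple[str, str, str]],
) -> List[Dict[str, str]]:
    """Return the Zarr records that no asset refers to.

    zarrs: records shaped like the /zarr/ results (zarr_id, dandiset, name).
    zarr_assets: hypothetical (dandiset_id, zarr_id, path) tuples, one per
    Zarr-backed asset.
    """
    # Set of every zarr_id that at least one asset points at.
    referenced = {zarr_id for _, zarr_id, _ in zarr_assets}
    # Keep the input order; a Zarr is "loose" if nothing references it.
    return [z for z in zarrs if z["zarr_id"] not in referenced]


if __name__ == "__main__":
    records = [
        {"zarr_id": "z1", "dandiset": "000108", "name": "a.ome.zarr"},
        {"zarr_id": "z2", "dandiset": "000108", "name": "b.ome.zarr"},
    ]
    assets = [("000108", "z1", "a.ome.zarr")]
    for z in find_loose_zarrs(records, assets):
        print(f"Dandiset {z['dandiset']}: Zarr {z['zarr_id']} ({z['name']}): Zarr does not have asset")
```

Using a set lookup instead of popping from a dict also tolerates a Zarr ID that shows up on an asset but not in the collected records.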

@yarikoptic
Member

coolio, thank you @jwodder. @satra @slaytonmarx - what is the "provenance" of these, and what should we do with them? I see us either

  • associating them with an asset, after which I believe it should just work to "update" them
  • removing them and starting afresh

@satra
Member

satra commented Jun 29, 2022

it's possible asset creation broke or upload broke at some point. when the cli tries to upload it should allow updating the zarr blob and creating a new asset if necessary. this is exactly what @slaytonmarx is trying to do for some of these chunks.

dandi upload <filename.ome.zarr>

should check via zarr blob if it's there. if so it should update, and then also create an asset if an asset is missing.

i feel there might be some order of operations assumptions that may be made. anything that is not associated with an asset should indeed be garbage collected eventually (but that's true of both regular blobs and zarrs).

@jjnesbitt
Member

I'm a bit lost, is there anything immediately actionable on the API side?

@yarikoptic
Member

it's possible asset creation broke or upload broke at some point.

IIRC at some point (before #907, released in 0.36.0 this Feb) we could have ended up in such a scenario, but we should not now, since the asset would be created first for the zarr. Also, we might end up in such a scenario if we just remove an asset with a zarr: the zarr would then still be lurking behind (like a blob does) until GC. For a blob it doesn't matter, since those are "identified" by their content. For a zarr the situation is more ambiguous since (for a reason I don't remember) it has that name field, which really stores the path. @AlmightyYakob - can you grasp from the design docs/code (now that @dchiquito is gone) why we need name, and could we maybe get rid of it? Then the Asset -> zarr association would live solely at the level of the asset. I filed a dedicated dandi/dandi-archive#1141 for that. I do not think it is urgent, but we had better clarify the situation/design on this and remove the possibility of ending up in similar "tricky" situations.

Since I do not have a personal preference between the two ways out I suggested, let's proceed with the first, which is what @satra also has in mind above. @jwodder could you please adjust the code so that if a "Zarr already exists" error is received, we

  • log a warning that we found a "loose zarr" which was already associated with that path, and that it will be reused for a new asset
  • "find" zarr_id for that zarr, and associate it with that asset which we have created (I assume) at the beginning of upload

The overall policy would be "reuse of loose zarrs". I guess if in the course of dandi/dandi-archive#1141 we remove/deprecate those name and dandiset associations, such reuse would no longer be possible.
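
As an illustration of the two steps requested above, here is a minimal sketch of the "reuse a loose zarr" fallback. It is not the change that was merged for this issue: `ZarrExistsError`, `get_or_reuse_zarr_id`, and the `create_zarr` / `find_zarrs` callables are hypothetical stand-ins for the POST to `/zarr/` and the GET lookup by `dandiset` and `name`, so the control flow can be read and tested without a server.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("loose_zarr_sketch")


class ZarrExistsError(Exception):
    """Hypothetical stand-in for the HTTP 400 "Zarr already exists" response."""


def get_or_reuse_zarr_id(
    create_zarr: Callable[[str, str], Dict[str, str]],
    find_zarrs: Callable[[str, str], List[Dict[str, str]]],
    dandiset_id: str,
    name: str,
) -> str:
    """Create a Zarr record, or fall back to the existing one with the same name.

    create_zarr and find_zarrs are assumed wrappers around the API calls;
    both are injected so that this function holds only the policy.
    """
    try:
        return create_zarr(dandiset_id, name)["zarr_id"]
    except ZarrExistsError:
        # Look for the record that caused the conflict; require an exact,
        # unambiguous name match before reusing it.
        matches = [z for z in find_zarrs(dandiset_id, name) if z["name"] == name]
        if len(matches) != 1:
            raise  # nothing safe to reuse, so surface the original error
        zarr_id = matches[0]["zarr_id"]
        log.warning(
            "Found loose Zarr %s already registered for %r; reusing it",
            zarr_id,
            name,
        )
        return zarr_id
```

The returned ID would then be attached to the asset for that path, which is the association step described in the second bullet.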

yarikoptic added a commit that referenced this issue Jul 1, 2022
Reuse "loose" Zarrs that conflict with uploaded path
@github-actions

github-actions bot commented Jul 1, 2022

🚀 Issue was released in 0.42.0 🚀
