Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download: support for downloading multiple URLs at once; dandiset path is now included in the report #1231

Merged
merged 3 commits into from
Mar 1, 2023

Conversation

jwodder
Copy link
Member

@jwodder jwodder commented Feb 28, 2023

Closes #1217.

Note that I'm not sure about whether the aggregating iterator works correctly, as when I tried downloading multiple URLs at once, the aggregate stats were only shown once the program finished. At least one problem is that it.finished will be briefly true between URLs.

@jwodder jwodder added minor Increment the minor version when merged cmd-download labels Feb 28, 2023
@codecov
Copy link

codecov bot commented Feb 28, 2023

Codecov Report

Patch coverage: 93.51% and project coverage change: +0.06 🎉

Comparison is base (a5a98fa) 89.14% compared to head (1b2ad6c) 89.21%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1231      +/-   ##
==========================================
+ Coverage   89.14%   89.21%   +0.06%     
==========================================
  Files          74       74              
  Lines        9446     9597     +151     
==========================================
+ Hits         8421     8562     +141     
- Misses       1025     1035      +10     
Flag Coverage Δ
unittests 89.21% <93.51%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandi/cli/cmd_download.py 88.46% <ø> (ø)
dandi/download.py 88.19% <92.55%> (+0.46%) ⬆️
dandi/support/iterators.py 100.00% <100.00%> (ø)
dandi/tests/test_download.py 100.00% <100.00%> (ø)
dandi/due.py 48.00% <0.00%> (-20.00%) ⬇️
dandi/tests/test_validate.py 82.97% <0.00%> (-17.03%) ⬇️
dandi/dandiapi.py 86.75% <0.00%> (-0.78%) ⬇️
dandi/files/bases.py 78.67% <0.00%> (-0.41%) ⬇️
dandi/validate.py 97.61% <0.00%> (-0.26%) ⬇️
dandi/tests/fixtures.py 97.34% <0.00%> (-0.26%) ⬇️
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

)
try:
digests["sha256"] = d["dandi:sha2-256"]
except KeyError:

Check notice

Code scanning / CodeQL

Empty except

'except' clause does nothing but pass and there is no explanatory comment.
@yarikoptic
Copy link
Member

Note that I'm not sure about whether the aggregating iterator works correctly, as when I tried downloading multiple URLs at once, the aggregate stats were only shown once the program finished. At least one problem is that it.finished will be briefly true between URLs.

yeah, I am not sure yet either, might have a look tomorrow with a fresh'ier eye. But we have another conundrum to resolve I ran into right away. When pointed to multiple datasets, we have path within dandiset, and not showing to which dandiset it belongs. As a result we do not only somewhat incompletely inform user, but since path is the "identifier" for pyout, files with identical path (e.g. dandiset.yaml) overwrite within in the same row:

❯ dandi download DANDI:00002{7,9}
PATH                                             SIZE     DONE            DONE% CHECKSUM STATUS          MESSAGE   
dandiset.yaml                                                                            done            updated   
sub-RAT123/sub-RAT123.nwb                        18.8 kB  18.8 kB          100%    ok    done                      
sub-anm369964/sub-anm369964_behavior+ecephys.nwb 7.7 MB   7.7 MB           100%    ok    done                      
sub-anm369962/sub-anm369962_behavior+ecephys.nwb 6.6 MB   6.6 MB           100%    ok    done                      
sub-anm369963/sub-anm369963_behavior+ecephys.nwb 6.4 MB   6.4 MB           100%    ok    done                      
Summary:                                         20.7 MB  20.7 MB                        5 done          1 updated 
                                                 +18.8 kB 99.91%                                                   
                                                          ETA: 6 seconds                                           

I guess we are doomed to uniformize (regardless if multiple URLs are given or not) this to the paths as they are to appear on local drive, hence in this case with dandiset prefixed, i.e. smth like

❯ dandi download DANDI:00002{7,9}
PATH                                             SIZE     DONE            DONE% CHECKSUM STATUS          MESSAGE   
000027/dandiset.yaml                                                                            done            updated   
000027/sub-RAT123/sub-RAT123.nwb                        18.8 kB  18.8 kB          100%    ok    done                      
000029/dandiset.yaml                                                                            done            updated   
000029/sub-anm369964/sub-anm369964_behavior+ecephys.nwb 7.7 MB   7.7 MB           100%    ok    done                      
000029/sub-anm369962/sub-anm369962_behavior+ecephys.nwb 6.6 MB   6.6 MB           100%    ok    done                      
000029/sub-anm369963/sub-anm369963_behavior+ecephys.nwb 6.4 MB   6.4 MB           100%    ok    done          

@yarikoptic
Copy link
Member

BTW, while at it, if you feel you see how to "just" RF it to parallelize within out code instead of max_workers=jobs for pyout and rely on it being fed generators, and rather just parallelize, and feed pyout "serially" from the parallel threads - might simplify this beast and make it parallelizable not only via pyout.

@yarikoptic yarikoptic changed the title Support for downloading multiple URLs at once download: support for downloading multiple URLs at once; dandiset path is now included in the report Mar 1, 2023
@yarikoptic yarikoptic merged commit 2e117df into master Mar 1, 2023
@yarikoptic yarikoptic deleted the gh-1217 branch March 1, 2023 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmd-download minor Increment the minor version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enable downloading in parallel from asset URL list
2 participants