-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RF/ENH: download - reimplementation to support original and upcoming UI/API #134
Conversation
Codecov Report
@@ Coverage Diff @@
## master #134 +/- ##
==========================================
+ Coverage 78.11% 80.24% +2.13%
==========================================
Files 42 45 +3
Lines 3303 3817 +514
==========================================
+ Hits 2580 3063 +483
- Misses 723 754 +31
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@satra @mgrauer -- "Perfect is the enemy of good". I a bit "overengineered", but it should be a workable solution. Needs testing. For downloads of "released" dandisets dandi/dandi-publish#112 should be addressed first though. |
Thanks for the asciicast, that is some fun viewing 🍿 The UI looks nice, especially the way it updates multiple lines in parallel. I have a couple points of discussion I'd like to raise for the CLI commands, they don't necessarily need to be addressed on this PR, but at least to start thinking about them.
Could we change this to something like the following (I'm omitting output folder to simplify):
At first I was thinking that maybe |
@mgrauer - this is not an acceptable value for the dandi prefix presently in
having a persistent identifier for a draft seems a little weird. so i'm hesitant to add that there, but open to thoughts. instead we could provide the following option through the CLI
and the draft url would always be: |
also, we would never mint a doi for a dandiset which does not have a published version. i can also add a note to the redirector to specifically point to the draft URL for an unpublished dandiset. |
@jwodder I invite you to try this one as well. |
that is already supported out of the box since that is how argparse (which I hope $> dandi download DANDI:000007 --output-dir myother
2020-08-05 16:30:24,676 [ INFO] Traversing remote dandisets (000007) recursively
PATH SIZE DONE DONE% CHECKSUM STATUS MESSAGE
dandiset.yaml done updated
Summary: 0 Bytes 1 done 1 updated
<0.00%
^C
Aborted!
$> ls -l myother
total 4
drwx------ 2 yoh yoh 4096 Aug 5 16:30 000007/ |
I would also prefer here to use what is supported by/through Notes:
So I wondered if it could be Re "interactive" |
I would prefer to avoid "if". That would avoid surprises for people who decide to use dandi datasets in their scripts in CI etc, so they could always know what they are downloading. a. If we decide that it should be most recent published release, then without explicit version specification of "draft" - it should blow. b. If we decide that it should not blow -- it should always redirect to a |
…ANDI archive + publish development
…f tuple elements It became obvious after RFing to use a helper function
Common code from girder.py was moved into download.py etc. Found these changes uncomitted after a long period of time, committing before progressing forward with development
…e going through individual elements Probably overdone/ugly and should be easier/better with async. TODO REDO later
…rs work See referenced in the diff issues about "modified" etc. Also I removed block of code to fetch metadata for an asset if missing - API now returns asset records with metadata
as no concerns were expressed upon requested "manual" testing, and we just keep breeding conflicts, I will resolve conflicts and if all good would merge. |
* origin/master: (24 commits) Remove duplicate "requests" dependency get_instance(): Immediately return known instances that lack redirector URLs Raise helpful error if /server-info returns invalid version Make redirector report http://localhost:8081 as Girder URL Test uploading an unregistered dandiset Assert a certain code path is unreachable Test that lock_dandiset() is called when uploading Tests of lock_dandiset() Remove unused monkeypatches Configure flake8 to run on test code as well Detect nonexistent Dandisets when uploading ENH: upload - add locking of the dandiset Make pytest ignore DeprecationWarnings from responses Fetch all commits in test workflow Replace some occurrences of known_instances with get_instance() Test get_instance() CliVersionTooLowError → CliVersionTooOldError Move CLI version exceptions to dandi/exceptions.py Use semantic_version instead of packaging [GH-184] Add a get_instance() function ... Conflicts: dandi/download.py dandi/girder.py dandi/tests/test_download.py - conflicted with a linting change dandi/tests/test_utils.py - both added tests No conflicts but needed changes: dandi/utils.py -- needed to add dandiapi into returned dandi_instance
…"api" I left module and class still with dandiapi names
dandi/cli/cmd_download.py
Outdated
@@ -48,12 +55,16 @@ | |||
) | |||
@click.argument("url", nargs=-1) | |||
@map_to_click_exceptions | |||
def download(url, output_dir, existing, jobs=6, develop_debug=False): | |||
def download(url, output_dir, existing, jobs=6, format="pytout"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a situation where click would leave format
unfilled, and if there is, shouldn't the default be "pyout", not "pytout"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh... so we can avoid duplicating default values within kwargs
?! let met just strip then them away from download
and we could do to the rest in a separate PR. THANK YOU! done in 4798e49
dandi/consts.py
Outdated
# Chunk size when iterating a download (and upload) body. Taken from girder-cli | ||
# TODO: should we make them smaller for download than for upload? | ||
# ATM used only in download | ||
MAX_CHUNK_SIZE = os.environ.get("DANDI_MAX_CHUNK_SIZE", 1024 * 1024 * 8) # 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to cast this to an int
, as the value from os.environ
is a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d'oh, right,THANKS! done in 09b2196
dandi/dandiapi.py
Outdated
lgr = get_logger() | ||
|
||
|
||
class HTTPError(requests.HTTPError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just import HTTPError instead of making a do-nothing subclass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to aggregate non-standard exceptions our code might raise within this module. I guess I could have just gone with from requests import HTTPError
into this namespace indeed but then linters would not be happy (not used etc), so I somewhat like this explicit subclassing. Any disadvantage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no real disadvantage other than a tiny bit of overhead. I'd personally prefer to import HTTPError and add a noqa
comment to tell flake8 to be quiet.
dandi/dandiapi.py
Outdated
""" | ||
self._session = session if session else requests.Session() | ||
|
||
yield self._session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
try:
yield self._session
finally:
self._session.close()
self._session = None
so that the session gets closed on error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed needs a fix, thanks! but it should close only if session was not provided (should not close outside session). Fixed in f367313
…ffect) It did not rebase easily into merge commit so I decided to keep it as is
dandi/dandiapi.py
Outdated
) | ||
|
||
# If success, return the json object. Otherwise throw an exception. | ||
if result.status_code not in (200, 201): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using "if not result.ok
" instead, as that only regards 4xx and 5xx status codes as errors. There are other 2xx codes that aren't errors, plus the 3xx codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THANKS! done in a45b550
…download Which also removes a typoed default value - duplication is evil!
Thank you @jwodder for the review and suggestions! I have now pushed all the changes I have referenced. |
dandi/dandiapi.py
Outdated
return self.get(f"/dandisets/{dandiset_id}/versions/{version}/") | ||
|
||
def get_dandiset_assets( | ||
self, dandiset_id, version, location=None, page_size=None, include_metadata=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include_metadata
doesn't seem to be used, and there's no TODO item for it. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it would become "parametrized" for in the query for dandi/dandi-publish#78 resolution, but now it always includes it, so indeed there is no need for it. Removed in 515538e
# dandi.cli.formatters are used in cmd_ls to provide switchable | ||
pyout_style = pyouts.get_style(hide_if_missing=False) | ||
|
||
rec_fields = ("path", "size", "done", "done%", "checksum", "status", "message") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to be passing tuples of columns
to pyout, we should add != 0.6.0
to the dependency in setup.cfg
in order to avoid the version that doesn't support them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO: yield progress etc | ||
msg = {"done": downloaded} | ||
if size: | ||
if downloaded > size and not warned: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warned
never gets set to True
. Should that be added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, set it to True in 5abd195 ... hopefully we never run into this part of code.
dandi/support/iterators.py
Outdated
yield queue.get(timeout=0.001) | ||
except Empty: | ||
continue | ||
if self.reraise_immediately and self._exc is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to move this check to the top of the while
loop in case get
times out while an exception is raised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed! done in e52d728
dandi/dandiapi.py
Outdated
lgr = get_logger() | ||
|
||
|
||
class HTTPError(requests.HTTPError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no real disadvantage other than a tiny bit of overhead. I'd personally prefer to import HTTPError and add a noqa
comment to tell flake8 to be quiet.
Originally it was thought that may be resolution for dandi/dandi-publish#78 would make metadata return optional, but it is always returned, so no need for the option
I have argued with having our exception in |
woohoo -- still green, merging before it ripens. |
RFed downloaders. I thought to make it pretty but it ended up quite messy code. Default UI though looks familiar since uses pyout as for
ls
andupload
.Should be well tested before releasing!
tqdm
, Closes tqdm: errors in the tests with "AssertionError: attempt to release recursive lock not owned by thread" #111TODOs
upload
or anything else got broken (we have someupload
unittests going, so let's hope it is all good)TODOs after
>
- I must have broke it).