
Add "resume" functionality to download #189

Closed · wants to merge 12 commits

Conversation


@jwodder jwodder commented Aug 12, 2020

Note that this PR merges into the #134 branch, not into master. (no longer needed)

Closes #182


codecov bot commented Aug 12, 2020

Codecov Report

Merging #189 into master will decrease coverage by 10.85%.
The diff coverage is 57.89%.


@@             Coverage Diff             @@
##           master     #189       +/-   ##
===========================================
- Coverage   81.77%   70.91%   -10.86%     
===========================================
  Files          50       50               
  Lines        4186     4264       +78     
===========================================
- Hits         3423     3024      -399     
- Misses        763     1240      +477     
Flag Coverage Δ
#unittests 70.91% <57.89%> (-10.86%) ⬇️


Impacted Files Coverage Δ
dandi/download.py 80.54% <53.12%> (-2.80%) ⬇️
dandi/tests/fixtures.py 68.18% <58.53%> (-28.31%) ⬇️
dandi/girder.py 65.68% <61.11%> (-16.33%) ⬇️
dandi/dandiapi.py 82.17% <75.00%> (-0.48%) ⬇️
dandi/tests/test_girder.py 20.00% <0.00%> (-80.00%) ⬇️
dandi/tests/test_upload.py 21.12% <0.00%> (-78.88%) ⬇️
dandi/tests/test_register.py 28.57% <0.00%> (-71.43%) ⬇️
dandi/upload.py 3.23% <0.00%> (-70.86%) ⬇️
dandi/support/generatorify.py 0.00% <0.00%> (-64.37%) ⬇️
dandi/register.py 33.33% <0.00%> (-37.04%) ⬇️
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d091704...122b6dc.

@yarikoptic yarikoptic (Member) left a comment

Looks awesome!!! I left some "concerns" (which might not be valid) and minor recommendations.

Ideally we should also somehow test URLs from the "publish" API with interruption (also somewhere in the middle of the file) so we know that restarting from the middle works correctly.

We had better not commit such big "test" files directly into git here. Among the fixtures we already have some "fake data file" fixtures, such as simple2_nwb. We just need to add another one that fills up some data structure within the file to make it large. Then another session-wide fixture could upload it to our testing deployment so it can be reused in various tests. I wonder if that fixture could even use (and thus test) the publish API -- we "upload" to drafts, "publish" a version of that dandiset, and thus get URLs to both the draft and the published version... it might, though, require the "redirector" two-girder-URLs PR being merged to get that one running.
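A hedged sketch of how such fixtures might be wired together (large_dandiset, make_large_nwb, and upload_to_test_instance are hypothetical names for illustration, not existing dandi-cli fixtures):

    import pytest

    @pytest.fixture(scope="session")
    def large_dandiset(tmp_path_factory):
        # Generate a sizeable fake NWB file once per test session
        # (a pynwb generation sketch appears further below) ...
        path = tmp_path_factory.mktemp("large") / "large.nwb"
        make_large_nwb(path)  # hypothetical helper, sketched below
        # ... and upload it to the testing deployment so it can be
        # reused across tests (ideally exercising the publish API too).
        return upload_to_test_instance(path)  # hypothetical helper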

dandi/download.py (resolved thread)
    if downloaded:
        blob1 = writer.read(len(block))
        if blob1 == block:
            downloaded += len(block)
Member:

Since this succeeded and we went back two chunks, we would either need to re-feed a new digester here with the previously downloaded data, or maybe just avoid updating the digester below for the re-downloaded block? Without that, it would end up with an incorrect digest, I think -- or am I wrong?
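For illustration, a minimal sketch of the first option (re-feeding a fresh digester with the bytes already verified on disk), assuming a hashlib-style digester, which is not necessarily the digester dandi-cli actually uses:

    import hashlib

    def refeed_digester(path, downloaded, chunk_size=1 << 20):
        # Rebuild the digest from the bytes already on disk before
        # resuming, so the final digest covers the whole file.
        digester = hashlib.sha256()
        remaining = downloaded
        with open(path, "rb") as f:
            while remaining > 0:
                block = f.read(min(chunk_size, remaining))
                if not block:
                    break
                digester.update(block)
                remaining -= len(block)
        return digester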

Member Author:

Commit ba8767f should address this.

Member:

Thanks ... going through the code, I think that should indeed work! But the logic is a bit tricky (at least to me). So I think we just need a good number of tests on some file longer than 3 chunks: 3 runs with the download interrupted on the 1st chunk, then on the 2nd, then on the 3rd. And then some test that fails on multiple chunks (so maybe, if the file were 5 chunks, fail on the 3rd and the 5th), just to see that all aspects of the restart work out nicely.
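One way to express that test matrix, as a hedged sketch (the test and fixture names are hypothetical; the interruption shim itself is sketched later in this thread):

    import pytest

    # Interrupt on the 1st, 2nd, and 3rd chunk, then on two chunks of a
    # 5-chunk file (indices are 0-based), per the suggestion above.
    @pytest.mark.parametrize("fail_on_chunks", [(0,), (1,), (2,), (2, 4)])
    def test_resume_download(fail_on_chunks, large_dandiset, tmp_path):
        # download with a shim raising on each chunk in fail_on_chunks,
        # then assert the resulting file's digest is correct
        ...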

Member Author:

Do you have any suggestions on how to simulate download interruptions? The best thing I've been able to find for this is https://github.com/Shopify/toxiproxy, but it doesn't handle TLS certificates intelligently, and either way I've been unable to actually connect to its proxy in a minimal setup.

Member:

Isn't our docker instance run over straight HTTP without any encryption? I would just try to mock/shim the requests streaming "thing" to abort with some code upon reaching the desired chunk.

Member:

Hm... what then happens if, e.g., the connection is lost? If it's some other exception, let's raise that one?

Member Author:

As I said above, when I tried inducing a lost connection with a test script (by turning off my WiFi), there was no error. However, an SO post from six years ago says that such an event should raise a socket.timeout.

Even if we do determine the proper error to raise, there's still the matter of what to mock and how so that the error gets raised at the right point.

Member:

Hm... I have also failed so far (by disabling/enabling WiFi) to trigger requests to give up and raise some exception (but I didn't wait too long, I guess). I had not realized that requests tries so hard to maintain the stream. But I doubt it would go back and resume if the connection does get interrupted somehow (e.g., jumping from one network to another?).

As for what to mock: I would mock get_download_file_iter, which we have in both girder and dandiapi. The mock shim would obtain the original generator function but, instead of returning it, return one that yields from the original generator until it hits the chunk it wants to raise on (the implementation of get_download_file_iter in dandiapi is easier to grasp -- there is an explicit loop there). For now I would raise the socket.timeout that the SO post talks about.

Related to all this is #198, where "resume" would also be used.
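A minimal sketch of such a shim, assuming a pytest monkeypatch and a client whose get_download_file_iter(file_id, chunk_size) returns a downloader function yielding byte chunks (names and signatures are illustrative, not the exact dandi-cli API):

    import socket

    def shim_download_iter(monkeypatch, client, fail_on_chunk):
        orig = client.get_download_file_iter
        attempts = {"n": 0}

        def shimmed(*args, **kwargs):
            downloader = orig(*args, **kwargs)
            attempts["n"] += 1
            if attempts["n"] > 1:
                return downloader  # let the resumed attempt finish

            def interrupted(**dl_kwargs):
                # Yield from the original downloader until the chosen
                # chunk, then simulate a dropped connection.
                for i, chunk in enumerate(downloader(**dl_kwargs)):
                    if i == fail_on_chunk:
                        raise socket.timeout("simulated connection loss")
                    yield chunk

            return interrupted

        monkeypatch.setattr(client, "get_download_file_iter", shimmed)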

Member Author:

I still can't figure out what URL to pass to download(). The mock no longer interferes with http://localhost:8079/dandiset/{dandi_id}/draft, but now the request to that URL is returning 500, presumably because of the GIRDER_URL issue discovered in #186 that we didn't completely address because editing serve.py wasn't necessary at the time.

Member:

In principle you could test _download_file directly, but I guess that would need too much preparation; then again, you could pass a shimmed downloader without the mock...

And indeed, dandi/redirector#25 was never merged, so the redirector within docker-compose can't talk to girder, and such queries to the local redirector wouldn't work.

But if it is a released version, then URLs like https://github.com/dandi/dandi-cli/blob/master/dandi/download.py#L63 should work (just point to localhost), although I'm not sure if mapping to the API server via server-info would work out.

I will also try to dig deeper later when near a laptop.

dandi/girder.py (outdated, resolved thread)
    raise gcl.HttpError(
        text="Failing to download", url=url, method="GET", status=500
    )
    return orig_sendRestRequest(self, *args, **kwargs)
Member:

Are any of our test files big enough that we could trigger a re-download from somewhere past twice the chunk size?

Member Author:

No. Both of the files in dandi/tests/data/dandifiles are under a megabyte.

Member:

Let's make a fixture producing a sizeable file for local upload/download testing, as I outlined in the review summary?

Base automatically changed from enh-download-api to master August 12, 2020 20:08

jwodder commented Aug 12, 2020

@yarikoptic Is there any particular attribute/structure of an NWB file that should be "filled up"? Is there a way to just insert random bytes?

@yarikoptic (Member):

At first I thought to ask NWB experts what would be the best .nwb file for testing we could produce... but since it is a file we are to generate dynamically anyway, and we just need "size", let's just follow one of the tutorials at https://pynwb.readthedocs.io/en/latest/tutorials/index.html and indeed fill it up with some random junk (or 0s, which would be OK but might be suboptimal for testing download retries ;)).

So, for now, just follow e.g. this tutorial on extracellular recordings (which would also be useful for getting a better understanding of the structures within .nwb, helpful for later work with metadata etc.): https://pynwb.readthedocs.io/en/latest/tutorials/domain/ecephys.html#sphx-glr-tutorials-domain-ecephys-py -- it already has code to populate a "demo" NWB file with random data.
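In that tutorial's spirit, a hedged sketch of generating a junk-filled NWB file of roughly a chosen size with pynwb (the function name and target size are arbitrary; pynwb does not compress by default, so random float64 data should keep the file near its nominal size):

    from datetime import datetime

    import numpy as np
    from dateutil.tz import tzlocal
    from pynwb import NWBHDF5IO, NWBFile, TimeSeries

    def make_large_nwb(path, size_mb=50):
        nwbfile = NWBFile(
            session_description="large fake file for download tests",
            identifier="large-test",
            session_start_time=datetime.now(tzlocal()),
        )
        # float64 samples are 8 bytes each, so this lands near size_mb
        n = size_mb * 1024 * 1024 // 8
        nwbfile.add_acquisition(
            TimeSeries(name="noise", data=np.random.random(n), unit="n/a", rate=1000.0)
        )
        with NWBHDF5IO(str(path), "w") as io:
            io.write(nwbfile)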

@yarikoptic yarikoptic (Member) left a comment

Some fresh(er) feedback


dandi/download.py (outdated, resolved thread)
dandi/download.py (outdated, resolved thread)
@jwodder jwodder force-pushed the gh-182 branch 2 times, most recently from 177c709 to a1c371a Compare August 19, 2020 13:05

jwodder commented Aug 25, 2020

@yarikoptic Using the new redirector image after merging dandi/redirector#25, the URL http://localhost:8079/dandiset/{dandi_id}/draft successfully redirects to http://localhost:8086/#/dandiset/000004/draft, but then that causes the tests to fail on this line with 'NoneType' object has no attribute 'endswith'. This appears to be due to the local-docker-tests instance in known_instances having a None value for its api attribute.

@yarikoptic (Member):

but then that causes the tests to fail on this line with 'NoneType' object has no attribute 'endswith'

I think it should be safe to adjust it to if server and not server.endswith("/"):


jwodder commented Aug 25, 2020

@yarikoptic Now the tests fail on this line with 'NoneType' object has no attribute 'rstrip'.

@yarikoptic (Member):

I do not see any "proper" handling there, but would do something like:

    # TODO: workaround - if no server_url, assume "local-docker-tests"
    server_url = known_instances[
        known_instances_rev[server_url.rstrip("/")] if server_url else "local-docker-tests"
    ].girder


jwodder commented Aug 25, 2020

Current status: test_resume_download is hanging for upwards of ten minutes. Inserting logging & print statements shows that the hang occurs when the original GirderCli.get_download_file_iter(file_id, chunk_size) is called for the second time (specifically, at the call to self.sendRestRequest() inside the method). I am unable to reproduce this in a minimal test script.

@yarikoptic (Member):

Great! I will try to dig into it as well.

@yarikoptic (Member):

BTW, note that although we found that we could pass a Range header to girder, it does not necessarily support it internally... in my case the testing eventually fails (it was indeed stuck somewhere; I didn't dig into it yet) with:

dandi/tests/test_download.py:269: in test_resume_download
    download(large_dandiset, tmp_path)
dandi/download.py:308: in download
    for rec in gen_:
dandi/download.py:452: in download_generator
    for resp in _download_generator:
dandi/download.py:725: in _download_file
    for block in downloader(start_at=downloaded):
dandi/tests/test_download.py:258: in new_downloader
    assert start_at == expected_start
E   assert 0 == 25165824
E     -0
E     +25165824

Maybe we are doomed to implement this feature only for the "dandiapi" portion (not worth spending time on girder ATM, IMHO) and thus to wait until we are able to talk to django?


jwodder commented Aug 26, 2020

@yarikoptic Actually, when I was trying to create an MVCE for the hanging request, I found that Girder didn't seem to honor the "Range" header (despite Girder's response headers indicating that it should), but it did honor an "offset" query parameter mentioned in the interactive documentation, so I changed the GirderCli method to use that in eb7e0d5. The change didn't stop the hanging, though.
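For reference, a hedged sketch of what that GirderCli change might look like (built on girder_client's sendRestRequest; the actual code in eb7e0d5 may differ):

    def get_download_file_iter(self, file_id, chunk_size):
        def downloader(start_at=0):
            # Girder ignored our "Range" header in practice, but honors
            # an "offset" query parameter on the download endpoint.
            params = {"offset": start_at} if start_at else None
            resp = self.sendRestRequest(
                "get",
                f"file/{file_id}/download",
                parameters=params,
                stream=True,
                jsonResp=False,
            )
            for chunk in resp.iter_content(chunk_size=chunk_size):
                yield chunk

        return downloader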

@yarikoptic (Member):

I have just merged #232, so I guess we can progress on this one with a "large file upload/release" fixture and implement the integration testing.


jwodder commented Sep 8, 2020

@yarikoptic How exactly do I do a "release" or otherwise take advantage of the "publish" container? There's no command for that in dandi-cli at the moment.

@yarikoptic (Member):

Just go to the /swagger interface of the running instance; there should be a /publish endpoint -- something like /dandisets/{dandiset_id}/versions/{version}/publish (or the one with {version} among the options to the call; I don't remember). Then add an interface for it to the dandiapi.DandiAPIClient class.
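A hedged sketch of what such a client method could look like, assuming DandiAPIClient has a generic post helper that prefixes the API base URL (the endpoint path and default are guesses pending the swagger questions below):

    def publish_version(self, dandiset_id, version="draft"):
        # Hypothetical method on dandiapi.DandiAPIClient wrapping the
        # dandi-publish endpoint guessed in the comment above.
        return self.post(f"/dandisets/{dandiset_id}/versions/{version}/publish/")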


jwodder commented Sep 8, 2020

Assuming that by /swagger you mean the /api/v1 page of Girder, I see no such endpoint, not even when logged in as admin to an instance connected to dandi-publish using the latest versions of the Docker images.

EDIT: Found it on the dandi-publish container.


jwodder commented Sep 8, 2020

Questions about the endpoint:

  • Is dandiset__pk supposed to be the dandiset identifier, a Girder ID, or something else?
  • What exactly is the "name" parameter in the request body for?
  • What exactly is the "dandiset" parameter in the request body for, seeing as the dandiset is already specified in the request path?
  • Should the "version" parameter in the request body be set when making the request? Should the Python method require a "version" argument?

@yarikoptic (Member):

EDIT: Found it on the dandi-publish container.

just in case: http://publish.dandiarchive.org/swagger/

Is dandiset__pk supposed to be the dandiset identifier, a Girder ID, or something else?

Our dandiset identifier, like anything in the API (which should have no IDs from girder) -- e.g., 000001.

What exactly is the "name" parameter in the request body for?

Worth clearing this up with the @dandi/dandiarchive developers (@brianhelba?). It may be an annotation for the release (the name of the release).

What exactly is the "dandiset" parameter in the request body for, seeing as the dandiset is already specified in the request path?

Yeah, that is an odd one; not sure why it is mandatory, if asked for at all!? Maybe it tolerates an empty {}?

Should the "version" parameter in the request body be set when making the request? Should the Python method require a "version" argument?

It seems to have a default value assigned, so it should be a keyword argument. In tests we might want to set it explicitly, but it should match the pattern.


jwodder commented Sep 8, 2020

@yarikoptic It seems that the best way to perform a "publish" action at the moment is via the POST /dandi/{dandiset_id} endpoint in Girder, which makes the appropriate calls to dandi-publish. However, when I call this endpoint in the test fixture, it fails with:

girder_client.HttpError: HTTP error 400: POST http://localhost:8081/api/v1/dandi/000004
Response text: {"message": "Failed to sync 000004", "type": "rest"}

The logs for the Girder container don't contain anything informative, and the Django container logs contain:

[19:58:26] ERROR    Invalid HTTP_HOST header: 'django:8000'. You may need to add 'django' to ALLOWED_HOSTS.    exception.py:75
           WARNING  Bad Request: /api/dandisets/sync/                                                          log.py:222
[19:58:26] WARNING  "POST /api/dandisets/sync/?folder-id=5f57e25b71c98353d53b4783 HTTP/1.1" 400 75749          basehttp.py:157

I have filed an issue with dandi-publish to support setting ALLOWED_HOSTS via an environment variable: dandi/dandi-publish#148
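The change requested there would presumably amount to something like the following in dandi-publish's Django settings (a sketch assuming a plain settings module and a DJANGO_ALLOWED_HOSTS environment variable; the actual fix may differ):

    import os

    # Read extra allowed hosts (e.g. "django") from the environment
    ALLOWED_HOSTS = os.environ.get("DJANGO_ALLOWED_HOSTS", "localhost").split(",")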

@yarikoptic (Member):

It seems that the best way to perform a "publish" action at the moment is via the POST /dandi/{dandiset_id} endpoint in Girder, which makes the appropriate calls to dandi-publish.

IMHO we should avoid the detour via girder and (maybe picking up the idea from girder's call) provide an adequate record to the DANDI API's /publish/ call.


jwodder commented Sep 9, 2020

@yarikoptic Why should Girder be avoided? Based on conversations in Slack, reimplementing the Girder endpoint would involve two calls to dandi-publish, one or more calls to Girder to get a folder ID, and duplication of the Django credentials given to Girder.

@yarikoptic (Member):

OK, to get us going -- since the DANDI /publish/ API does not yet seem to be usable independently (without interactions with girder), although it could/should have been, let's invoke the girder endpoint for now.
But my thinking is that in the longer run we should exercise DANDI API endpoints and avoid girder endpoints, since the client should aim to talk to the DANDI API, not Girder. To that end I filed dandi/dandi-publish#149, and we will just refactor our code to talk to /publish/ directly after that issue is resolved.

@yarikoptic (Member):

Probably do not spend time messing with girder ATM. @satra is yet to announce it, but next week we will push strongly to get the unified API going, so all efforts will be spent on that; all "girder" will be hidden behind the API, and we will be able to just talk to the API to accomplish what we need.

@yarikoptic yarikoptic added the on-hold-waiting-for-API-service TEMP label for PRs to not worry about until new API service+backend appear label Sep 14, 2020
@yarikoptic (Member):

#247 AFAIK provided an alternative implementation.

@yarikoptic yarikoptic closed this Oct 6, 2020
Labels
on-hold-waiting-for-API-service TEMP label for PRs to not worry about until new API service+backend appear
Development

Successfully merging this pull request may close these issues.

download: add "resume" functionality