download: download into a temp file + rename / clean upon failed attempt #198

Closed
yarikoptic opened this issue Aug 13, 2020 · 7 comments · Fixed by #247
@yarikoptic (Member)

As of the 0.6.0 implementation, download writes directly into the target location.

In common scenarios it is better to download into a temporary location and then move the file into the target location upon successful completion. That avoids users ending up with partial downloads, i.e. broken .nwb files, without knowing that they are partial. A similar strategy is used e.g. by the Chrome browser and by girder-client, which we stopped using for downloads (we only use it to initiate the streamed download).

Unlike girder_client, we do not want to download into a TMPDIR folder, since the file could be large, /tmp could lack that much space, and the transfer into the target location could entail an additional delay. I think the best option would be to download into a temp file in the target folder, marked with some prefix or suffix. E.g. for the file a/blah.nwb, keep the original file (if present) until renaming over it, and download into a/blah.nwb-dandidownload, to be renamed to a/blah.nwb upon successful completion.
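A minimal sketch of that scheme, assuming a plain HTTP GET via requests (the function and file names are illustrative, not dandi's actual API):

```python
import os
import requests

def download_with_rename(url, target):
    """Download `url` next to `target` and rename it into place only on success."""
    tmp = target + "-dandidownload"          # temp file lives in the target folder
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(tmp, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    # Atomic on the same filesystem: the original a/blah.nwb (if any) stays
    # intact until this point, and readers never see a half-written file.
    os.replace(tmp, target)
```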

What should we do if the download is interrupted? We could (my preference is the second option):

  • remove the partial download: but that might not happen if the process is hard-killed or the power goes down, so even if implemented we would still need the logic below to deal with partial downloads found on the drive
  • keep the partial download: this could be beneficial, e.g. for establishing a resumed download, so I think we should proceed this way

At the beginning of a download, what to do if a partial download is found:

  • it could mean that a second dandi client is currently running for the same dandiset. I think the easiest would be for a download to establish a lock file, e.g. using the fasteners library, probably residing alongside as blah.nwb.lck (well, there are write_locked helpers, so maybe a separate lck file is not even needed)
  • easiest option -- just remove it and start anew: since we will be dealing with large files, this could be very wasteful/undesired
  • with "Add "resume" functionality to download" (#189) we would be able to resume the download. BUT before resuming, it would be necessary to first establish that the file on the remote end is still the same as in the original (interrupted) download. For that reason, while locking the file for the download, we should ideally store somewhere (maybe in the filename, as in -dandidownload-<CHECKSUM>) the checksum we obtain from the metadata. If no checksum was recorded, download from the start. If there was a checksum and it matches what is about to be downloaded, just roll back a chunk and try to resume; if resuming fails (because the re-read "check" chunk differs, as in #189), start from the beginning. See the sketch after this list.
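A sketch of that decision logic, assuming the checksum is recorded in the temp file name as proposed (the helper name and rollback size are made up for illustration):

```python
import glob
import os

ROLLBACK = 1 << 20  # re-read this much before the resume point as a sanity check

def plan_download(target, expected_checksum):
    """Return ("resume", path, offset) or ("fresh", path, 0) for `target`."""
    prefix = target + "-dandidownload"
    for tmp in glob.glob(glob.escape(prefix) + "*"):
        recorded = tmp[len(prefix):].lstrip("-") or None
        if recorded == expected_checksum:
            # Same remote file as the interrupted attempt: roll back one chunk
            # and resume; the rolled-back chunk gets re-verified on resume.
            offset = max(os.path.getsize(tmp) - ROLLBACK, 0)
            return "resume", tmp, offset
        # No checksum recorded, or it no longer matches: discard and start over.
        os.unlink(tmp)
    return "fresh", f"{prefix}-{expected_checksum}", 0
```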

But maybe, if we rely on the checksum, we should just download into some .dandi/tmp/<CHECKSUM> and lock there? What do you think @jwodder?

jwodder commented Aug 13, 2020

We could do what Safari and Firefox do: When downloading to a/blah.nwb, create a directory a/blah.nwb.download containing the output file plus a file of metadata about the download (checksum etc.). If we need a lockfile, we can put it in the directory as well. When the download is finished, move the file to its final location and delete the rest of the directory.
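A sketch of that scheme, assuming a metadata JSON holding just the expected digest (the layout and names are illustrative, not what dandi necessarily ended up with):

```python
import hashlib
import json
import os
import shutil

def download_via_dir(chunks, target, expected_digest):
    """Write chunks into `target + ".download"`/file, then move into place."""
    workdir = target + ".download"
    os.makedirs(workdir, exist_ok=True)
    with open(os.path.join(workdir, "metadata.json"), "w") as f:
        json.dump({"digest": expected_digest}, f)   # enough to validate a later resume
    partial = os.path.join(workdir, "file")
    digest = hashlib.sha256()
    with open(partial, "wb") as f:
        for chunk in chunks:                        # `chunks` yields bytes
            f.write(chunk)
            digest.update(chunk)
    if expected_digest and digest.hexdigest() != expected_digest:
        raise RuntimeError(f"checksum mismatch; keeping {workdir} for inspection")
    os.replace(partial, target)                     # move file to its final location
    shutil.rmtree(workdir)                          # drop metadata (and lockfile, if any)
```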

@yarikoptic (Member, Author)

Sounds good @jwodder !

@yarikoptic (Member, Author)

Please implement this one against current master. This logic is agnostic of the underlying service (API vs. girder), so we should be good even after going API all the way.

jwodder commented Sep 16, 2020

  • What should happen if the downloaded file already exists (in its final location, not in a special .download folder) at the start of a download?

  • What should happen if another process already has a lock on the lockfile?

@yarikoptic (Member, Author)

What should happen if the downloaded file already exists (in its final location, not in a special .download folder) at the start of a download?

Download into the .download folder; when ready to replace the existing file, unlink it and move the new download into its place.
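For what it's worth, os.replace already overwrites an existing destination on both POSIX and Windows, so the unlink-and-move step can be a single call (paths below are illustrative):

```python
import os

# Replaces an existing a/blah.nwb in place; on the same filesystem this is
# atomic on POSIX, so readers see either the old file or the new one.
os.replace("a/blah.nwb.download/file", "a/blah.nwb")
```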

What should happen if another process already has a lock on the lockfile?

Raise an exception. We already have LockingError, which we use in girder-related code; I think it is OK to reuse it for now, since the girder code will go away later.
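A sketch of that behaviour with fasteners, using a non-blocking acquire so a held lock fails fast (LockingError here stands in for the existing exception; the helper name is made up):

```python
import fasteners

class LockingError(RuntimeError):
    """Stand-in for the existing exception mentioned above."""

def acquire_download_lock(lock_path):
    lock = fasteners.InterProcessLock(lock_path)
    # blocking=False: if another dandi process holds the lock, give up
    # immediately instead of waiting for it to finish.
    if not lock.acquire(blocking=False):
        raise LockingError(f"another process is already downloading ({lock_path})")
    return lock  # caller is responsible for lock.release() when done
```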

jwodder commented Sep 23, 2020

Exactly what metadata should be stored in the directory and checked when resuming a download? Just the checksum for the complete file?

@yarikoptic (Member, Author)

Storing a checksum should be sufficient, but the PR for this issue does not necessarily have to address resumed downloads (although it could, if only for the API downloader; girder will be gone anyway).
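Assuming the metadata file holds just that checksum, the check before resuming could be as small as this (the file layout and field name are hypothetical):

```python
import json

def can_resume(metadata_path, current_digest):
    """True if the recorded checksum still matches the asset's current digest."""
    try:
        with open(metadata_path) as f:
            recorded = json.load(f).get("digest")
    except FileNotFoundError:
        return False
    return recorded is not None and recorded == current_digest
```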

yarikoptic added a commit that referenced this issue Oct 2, 2020
Download files to temporary directory containing metadata