Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup download utils #8273

Merged
merged 2 commits into from
Feb 15, 2024
Merged

cleanup download utils #8273

merged 2 commits into from
Feb 15, 2024

Conversation

pmeier
Copy link
Contributor

@pmeier pmeier commented Feb 14, 2024

Follow-up to #8237. We only had the _save_response_content as a separate function because it was used for the GDrive as well. Now, we can just inline it.

Copy link

pytorch-bot bot commented Feb 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8273

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit dabc313 with merge base f21c6bd (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmeier , LGTM but please check my comment below

for chunk in content:
# filter out keep-alive new chunks
if not chunk:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new changes are only equivalent to the old ones if this continue was meant to be a break. Was this meant to be a break? In other words, can there be an empty chunk and then a non-empty chunk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR they are equivalent since content now can only be one type or iterator for which the branch will never trigger.

These keep-alive chunks are b"". Content iterators coming from requests could actually contain them. However, the content iterator coming from our _urlretrieve function cannot.

The way iter works when you have two inputs is quite unintuitive. Basically waht iter(lambda: response.read(chunk_size), b"") does is it calls the lambda forever until that returns the sentinel b"". If that happens it raises a StopIteration. Meaning

for chunk in iter(lambda: response.read(chunk_size), b""):
    if not chunk:
        # this will never be reached

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer your question directly:

In other words, can there be an empty chunk and then a non-empty chunk?

No.

items = [b"foo", b"", b"bar"]
idx = 0


def fn():
    global idx

    item = items[idx]
    idx += 1
    return item


for chunk in iter(fn, b""):
    print(chunk)
b'foo'

@pmeier pmeier merged commit b1123cf into pytorch:main Feb 15, 2024
82 of 83 checks passed
@pmeier pmeier deleted the download-utils-cleanup branch February 15, 2024 15:37
facebook-github-bot pushed a commit that referenced this pull request Mar 20, 2024
Reviewed By: vmoens

Differential Revision: D55062790

fbshipit-source-id: d85114aa859d025a467642ee53197ff1d88b1d1d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants