HTTPS downloads for streaming datasets #1258

Merged
ravi-mosaicml merged 4 commits into mosaicml:dev from ravi/https_streaming
Jul 13, 2022

Conversation

@ravi-mosaicml ravi-mosaicml commented Jul 5, 2022

This PR allows streaming dataset URIs to be specified as HTTP/HTTPS URIs, which makes it possible to stream public datasets from anywhere. It also provides a workaround for boto/boto3#1200: the dataset URI can be given in https format rather than s3 format, which avoids having to configure unsigned credentials.
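To illustrate the idea, here is a minimal sketch of scheme-based dispatch, not the code in this PR. The function names (`pick_backend`, `s3_to_https`, `download_over_http`) are hypothetical, and the rewrite assumes a publicly readable object reachable through the standard virtual-hosted-style S3 URL.

```python
import shutil
import urllib.request
from urllib.parse import urlparse


def pick_backend(remote: str) -> str:
    """Return which (hypothetical) download backend a remote URI would use."""
    scheme = urlparse(remote).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "s3":
        return "s3"
    if scheme == "":
        return "local"
    raise ValueError(f"Unsupported URI scheme: {scheme!r}")


def s3_to_https(remote: str) -> str:
    """Rewrite s3://bucket/key as a virtual-hosted-style HTTPS URL.

    Assumption: the object is publicly readable, so no signing is needed.
    """
    parsed = urlparse(remote)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URI: {remote!r}")
    return f"https://{parsed.netloc}.s3.amazonaws.com{parsed.path}"


def download_over_http(url: str, local_path: str, timeout: float = 60.0) -> None:
    """Stream an HTTP(S) response body to a local file without any credentials."""
    if pick_backend(url) != "http":
        raise ValueError(f"Expected an http(s) URL, got: {url!r}")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(local_path, "wb") as out:
            # Copy in chunks so large shards are not held in memory.
            shutil.copyfileobj(response, out)
```

With this shape, a public bucket can be read by passing the `https://` form of the URI, so the boto3 client (and its unsigned-credentials configuration) is never involved.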

Closes https://mosaicml.atlassian.net/browse/CO-692

@ravi-mosaicml ravi-mosaicml marked this pull request as ready for review July 5, 2022 23:01
@knighton knighton commented Jul 6, 2022

Assuming tested

@ravi-mosaicml ravi-mosaicml commented

Still need to add test cases; will merge after those are in.

@abhi-mosaic abhi-mosaic left a comment

LGTM pending tests!

@ravi-mosaicml ravi-mosaicml requested a review from a team as a code owner July 13, 2022 15:29
@ravi-mosaicml ravi-mosaicml enabled auto-merge (squash) July 13, 2022 15:32
@ravi-mosaicml ravi-mosaicml merged commit fc74621 into mosaicml:dev Jul 13, 2022
@ravi-mosaicml ravi-mosaicml deleted the ravi/https_streaming branch July 21, 2022 18:12

3 participants