Updated streaming args for StreamingDataset subclasses #602

Merged
merged 4 commits into from
Sep 16, 2023


snarayan21
Contributor

Previously, the constructor for StreamingFinetuningDataset had not been updated with the current streaming args, while the build_finetuning_dataloader function that calls it already passed some of the updated args. This mismatch caused errors upon dataset instantiation. This PR fixes it.

Additionally, the new streaming args batching_method and sampling_granularity have been added to StreamingFinetuningDataset and StreamingTextDataset.
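The kind of mismatch described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the classes `BaseStreamingDataset`, `OldFinetuningDataset`, `FixedFinetuningDataset` and the helper `build_dataset` are hypothetical stand-ins, and the default values shown are assumptions. Only the argument names `batching_method` and `sampling_granularity` come from the description.

```python
class BaseStreamingDataset:
    """Hypothetical stand-in for the upstream base class (not the real API)."""

    def __init__(self, *, local, batch_size=None,
                 sampling_granularity=1, batching_method="random"):
        # Defaults here are assumptions made for the sketch.
        self.local = local
        self.batch_size = batch_size
        self.sampling_granularity = sampling_granularity
        self.batching_method = batching_method


class OldFinetuningDataset(BaseStreamingDataset):
    """Subclass whose constructor lags behind the base class."""

    def __init__(self, *, tokenizer, local, batch_size=None):
        super().__init__(local=local, batch_size=batch_size)
        self.tokenizer = tokenizer


class FixedFinetuningDataset(BaseStreamingDataset):
    """Subclass that accepts the newer args and forwards them to the base."""

    def __init__(self, *, tokenizer, local, batch_size=None,
                 sampling_granularity=1, batching_method="random"):
        super().__init__(local=local, batch_size=batch_size,
                         sampling_granularity=sampling_granularity,
                         batching_method=batching_method)
        self.tokenizer = tokenizer


def build_dataset(dataset_cls, cfg, tokenizer):
    """Hypothetical builder that always passes the newer streaming args."""
    return dataset_cls(
        tokenizer=tokenizer,
        local=cfg["local"],
        batch_size=cfg.get("batch_size"),
        sampling_granularity=cfg.get("sampling_granularity", 1),
        batching_method=cfg.get("batching_method", "random"),
    )


cfg = {"local": "/tmp/data", "batch_size": 8, "batching_method": "stratified"}

# The stale constructor rejects keyword arguments it does not know about.
try:
    build_dataset(OldFinetuningDataset, cfg, tokenizer=None)
    old_error = None
except TypeError as exc:
    old_error = exc

# The updated constructor accepts them and forwards them to the base class.
fixed = build_dataset(FixedFinetuningDataset, cfg, tokenizer=None)
```

In this sketch the failure surfaces as a `TypeError` at construction time, which matches the "errors upon dataset instantiation" described above only in spirit; the exact error in the real code may differ.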

boomanaiden154 and others added 3 commits September 15, 2023 12:10
Currently, when a StreamingFinetuningDataset is created using the
build_finetuning_dataloader method, a failure is returned as some
incorrect parameters are passed through to the constructor of
StreamingFinetuningDataset. This patch fixes the parameter mismatch and
adds test coverage for this case.
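One way to cover this case in a test is to compare the keyword arguments a builder passes against the constructor's signature. The sketch below is an assumption about how such a check could look, not the test added in this patch; `unexpected_kwargs`, `StaleDataset`, and `UpdatedDataset` are hypothetical names.

```python
import inspect


def unexpected_kwargs(cls, kwargs):
    """Return the keyword names that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return set()
    return set(kwargs) - set(params)


class StaleDataset:
    """Hypothetical constructor that predates the newer streaming args."""

    def __init__(self, local, batch_size=None):
        self.local = local
        self.batch_size = batch_size


class UpdatedDataset:
    """Hypothetical constructor that accepts the newer streaming args."""

    def __init__(self, local, batch_size=None,
                 sampling_granularity=1, batching_method="random"):
        self.local = local
        self.batch_size = batch_size
        self.sampling_granularity = sampling_granularity
        self.batching_method = batching_method


# Keyword arguments a builder function might pass (illustrative values).
builder_kwargs = {
    "local": "/tmp/data",
    "batch_size": 8,
    "sampling_granularity": 1,
    "batching_method": "random",
}

stale_missing = unexpected_kwargs(StaleDataset, builder_kwargs)
updated_missing = unexpected_kwargs(UpdatedDataset, builder_kwargs)
```

A signature check like this catches the mismatch without needing real data on disk, though it complements rather than replaces a test that actually instantiates the dataset.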
@dakinggg
Collaborator

dakinggg commented Sep 15, 2023

@snarayan21 do you need to update the pin in setup.py? This stuff only works with latest streaming, right?

EDIT: Sorry, just didn't notice the setup.py change on my first skim :)

Collaborator

@dakinggg dakinggg left a comment

Thanks @snarayan21! Going to run regression tests just to make sure the upgrade doesn't break something unexpected.

@boomanaiden154
Contributor

Thanks for the patch! Probably should've checked the constructors in Streaming a little bit more closely.

@dakinggg dakinggg merged commit 229ab4f into main Sep 16, 2023
9 checks passed
@dakinggg dakinggg deleted the fix-ft-streaming-modified branch October 11, 2023 21:30
3 participants