Support limit_mode_batches (int) for infinite dataloader #2787
Conversation
raise MisconfigurationException(
    'When using an infinite DataLoader (e.g. with an IterableDataset'
    f' or when DataLoader does not implement `__len__`) for `limit_{mode}_batches`,'
    f' `Trainer(limit_{mode}_batches)` must be `0.0`, `1.0` or `int`')
How can limit_mode_batches = 0.0 or 1.0 work for an infinite dataloader?
As far as I remember, a value of 0.0 should just run nothing and a value of 1.0 should run until the dataloader stops. If it is an actually infinite dataloader, then it would just never end, so perhaps this should be written differently?
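For context, a minimal sketch of the case under discussion (the class name InfiniteStream is made up for illustration): an IterableDataset that yields forever and defines no __len__, so a fractional limit like 0.5 has no well-defined batch count.

import itertools
from torch.utils.data import IterableDataset, DataLoader

class InfiniteStream(IterableDataset):
    def __iter__(self):
        # counts up forever; the wrapping DataLoader has no length
        return iter(itertools.count())

loader = DataLoader(InfiniteStream(), batch_size=4)
# len(loader) raises TypeError, so "50% of the batches" cannot be computed;
# only an absolute int limit (or the 0.0 / 1.0 special cases) makes sense.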
yeah, as of now I just moved some if-else statements around, and with 0.0 it was neither working before nor is it working now. I can change it as required. But just like you said, I agree it should work with both 0.0 and int, and otherwise raise this exception with an updated message here.
and not just for 1.0: it should raise this exception for every 0.0 < limit_mode_batches <= 1.0.
@awaelchli what do you suggest here? I'll update and add tests accordingly.
We have the same for val_check_interval=1.0 (but we don't print a warning), where we distinguish between 1.0 (float, 100% of the epoch) and 1 (int, batch count). I think it is reasonable to differentiate using the type, and the docs make it very clear with the examples. Regarding this exception message, I think it should be changed to "When using an IterableDataset (e.g. of infinite size ...". Or just not refer to an infinite dataset at all, as Ethan suggested.
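To make the int/float distinction concrete, here is a hedged sketch; resolve_limit is a hypothetical helper, not Lightning's actual implementation:

def resolve_limit(limit, num_batches):
    # int means an absolute batch count, float means a fraction of the epoch
    if isinstance(limit, int):
        return min(num_batches, limit)
    if isinstance(limit, float):
        return int(num_batches * limit)
    raise TypeError(f'unsupported limit type: {type(limit)}')

assert resolve_limit(1, 100) == 1      # 1 (int): run exactly one batch
assert resolve_limit(1.0, 100) == 100  # 1.0 (float): run 100% of the epoch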
and for non-iterable datasets, should 0.0 be supported with an infinite dataloader?
and separate exceptions for Iterable and infinite datasets? So for Iterable, 1.0 is supported, and for an infinite dataloader, 0.0 and int are supported?
can you clarify? what would that look like? I don't know of any infinite datasets that are not of type IterableDataset.
Neither do I, never seen one. To be honest, I didn't even know about IterableDataset before this point 😅. Well, I guess I'll revert the recent commits and change the exception message as suggested by Ethan.
Codecov Report
@@           Coverage Diff            @@
##           master    #2787    +/-   ##
========================================
+ Coverage      86%      90%      +4%
========================================
  Files          78       78
  Lines        7001     7001
========================================
+ Hits         6033     6317     +284
+ Misses        968      684     -284
Hello @rohitgr7! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-05 16:40:43 UTC
This pull request is now in conflict... :(
Should I add supporting docs here? https://pytorch-lightning.readthedocs.io/en/stable/sequences.html?highlight=IterableDataset#iterable-datasets
nice, also the tests 🚀
This pull request is now in conflict... :(
Awesome, looks good to me :) can clean up the last elif slightly
Great job! =)
if isinstance(self.limit_train_batches, int) or self.limit_train_batches == 0.0:
    self.num_training_batches = min(self.num_training_batches, int(self.limit_train_batches))
elif self.num_training_batches != float('inf'):
    self.num_training_batches = int(self.num_training_batches * self.limit_train_batches)
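As a standalone restatement of what that branch computes (the helper name resolve and the 100-batch loader are assumptions for this example, not the actual Trainer code):

def resolve(limit, num_batches=100):
    # mirrors the branch above for a finite loader of num_batches batches
    if isinstance(limit, int) or limit == 0.0:
        return min(num_batches, int(limit))
    elif num_batches != float('inf'):
        return int(num_batches * limit)
    return num_batches  # infinite loader with a float limit falls through

assert resolve(10) == 10    # int: cap at 10 batches
assert resolve(0.0) == 0    # float 0.0: run nothing
assert resolve(0.25) == 25  # float: 25% of the epoch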
can it also be 5.1, meaning 510%? cc: @PyTorchLightning/core-contributors
@Borda I don't think so. We iterate like this:
# run epoch
for batch_idx, (batch, is_last_batch) in self.profiler.profile_iterable(
        enumerate(_with_is_last(train_dataloader)), "get_train_batch"
):
    # stop epoch if we limited the number of training batches
    if batch_idx >= self.num_training_batches:
        break
so if the loader is exhausted before that, it triggers a StopIteration, meaning that the condition will never be True.
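A tiny standalone sketch of that behaviour, with a plain range standing in for the train dataloader:

# if the limit exceeds the loader's length, enumeration simply ends first
loader = range(10)         # stands in for a 10-batch train_dataloader
num_training_batches = 51  # e.g. limit_train_batches=5.1 on 10 batches

seen = 0
for batch_idx, batch in enumerate(loader):
    if batch_idx >= num_training_batches:  # never True here
        break
    seen += 1

assert seen == 10  # the epoch stops when the loader is exhausted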
still, it would be cleaner to have val = min(val, 1.0) there
we can do _check_batch_limits there to avoid float values > 1.0
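One possible shape for such a check (a hedged sketch; the real _check_batch_limits may differ) that rejects float limits above 1.0:

def check_batch_limit(name, value):
    # hypothetical validator: floats must be fractions in [0.0, 1.0]
    if isinstance(value, float) and not 0.0 <= value <= 1.0:
        raise ValueError(
            f'`{name}={value}` is invalid: float limits must be in'
            ' [0.0, 1.0]; pass an int for an absolute batch count.')
    return value

check_batch_limit('limit_train_batches', 0.5)    # ok
# check_batch_limit('limit_train_batches', 5.1)  # would raise ValueError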
Great job! =)
@Borda can we unmerge it? There is some irrelevant code that I just deleted locally and need to push. Or should I create a new PR with the updated changes?
yes, create a new PR and mention that it is a fix for this one; let's get it done ASAP
…)" This reverts commit de9c9f0.
What does this PR do?
Fixes #2649
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃