
Refactor dataloading #955

Merged - 4 commits merged into Lightning-AI:master on Feb 26, 2020

Conversation

@ethanwharris (Member) commented on Feb 26, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #953 Fixes #840 Fixes #698

  • Refactored calls to reset_train_dataloader and reset_val_dataloader to only happen when needed
  • Slightly changed the logic for disabling validation to ignore the validation dataloader length
  • Removed the default unpacking of the dataloader and adding of a RandomSampler - see comment in refactor len(datasets) call. #953
  • Changed the exception order so that a MisconfigurationException is raised when using an IterableDataset
  • Removed the dependency on DataLoader.dataset, following Handle abstract loader that doesn't have a dataset member #840
  • num_training_batches = float('inf') is now the default when the train dataloader doesn't have __len__ (in addition to when using an IterableDataset); see the sketch after this list
  • Added a test for training using an iterable
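
A minimal sketch of the new length default described above, assuming train_dataloader is the resolved training DataLoader; this is illustrative, not the literal code from this PR:

```python
try:
    # Loaders built on an IterableDataset (or anything else without
    # __len__) raise TypeError when len() is called on them.
    num_training_batches = len(train_dataloader)
except TypeError:
    # New default: treat an unknown-length loader as infinite.
    num_training_batches = float('inf')
```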

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@Borda added the bug (Something isn't working) and feature (Is an improvement or enhancement) labels on Feb 26, 2020
```diff
@@ -271,7 +271,7 @@ def is_function_implemented(self, m):
         pass

     @abstractmethod
-    def is_iterable_dataloader(self, dataloader):
+    def is_infinite_dataloader(self, dataloader):
```

A reviewer commented:

Does this need to be considered infinite? Normally, IterableDatasets are finite; we just don't know how long they are the first epoch. Another way of doing this would be to set it to infinite (or -1, or whatever placeholder value works best) and keep counting how many steps we did the first epoch. Once we start the second epoch, we can print a bar and all, since we know the length will not change.
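
A rough sketch of that counting idea (hypothetical, not part of this PR; max_epochs and train_dataloader are assumed to exist):

```python
num_batches = float('inf')  # placeholder: length unknown during epoch 1

for epoch in range(max_epochs):
    seen = 0
    for batch in train_dataloader:
        seen += 1
        # ... run the training step; once num_batches is finite
        # (from epoch 2 onward) a progress bar with a total can be shown
    num_batches = seen  # length is now known and will not change
```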

@ethanwharris (Member, Author) replied:

Yeah, thinking about it, that should totally just be has_len or similar. No problem with the idea of keeping a record of the number of steps - although I would probably opt for that to be done in a separate PR (like when we add IterableDataset support for val and test) - but I'm happy to add that now if desired.
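
A minimal version of that has_len idea might look like this (a sketch, not the eventual Lightning utility):

```python
def has_len(dataloader) -> bool:
    """Return True if the dataloader exposes a usable __len__."""
    try:
        len(dataloader)
        return True
    except TypeError:
        return False
```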

@williamFalcon (Contributor) commented:

@ethanwharris nice job.

I'll merge this, then fix the TPU stuff, then you can add what @Darktex suggested!
Running GPU tests now

@williamFalcon merged commit b2e9607 into Lightning-AI:master on Feb 26, 2020
@Borda added this to the 0.6.1 milestone on Feb 27, 2020
tullie pushed a commit to tullie/pytorch-lightning that referenced this pull request on Apr 3, 2020
* Refactor dataloading

* Refactor dataloading

* Refactor dataloading

* Add shuffle to test