
Call DataModule hooks implicitly in trainer #2755

Merged (21 commits) on Aug 2, 2020

Conversation

@nateraw (Contributor) commented Jul 29, 2020

What does this PR do?

You can now just pass a datamodule to trainer.fit/trainer.test and we'll call dm.setup and dm.prepare_data if you haven't already.

dm = MyDataModule()
model = CoolSystem()
trainer = Trainer()
trainer.fit(model, dm)

Fixes #2751 and #2742
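The implicit-hook behavior can be sketched in plain Python. This is an illustrative stand-in, not Lightning's actual internals: the class names, the `has_prepared_data`/`has_setup` flags, and the bodies are all hypothetical.

```python
class DataModule:
    """Minimal stand-in for a LightningDataModule (illustrative only)."""

    def __init__(self):
        self.has_prepared_data = False
        self.has_setup = False

    def prepare_data(self):
        # In Lightning this is where one-time work like downloads happens.
        self.has_prepared_data = True

    def setup(self, stage=None):
        # In Lightning this is where splits and transforms are built.
        self.has_setup = True


class Trainer:
    """Sketch of a trainer that calls datamodule hooks implicitly."""

    def fit(self, model, datamodule=None):
        if datamodule is not None:
            # Only call each hook if the user hasn't already done so.
            if not datamodule.has_prepared_data:
                datamodule.prepare_data()
            if not datamodule.has_setup:
                datamodule.setup(stage="fit")
        # ... actual training would run here ...


dm = DataModule()
Trainer().fit(model=None, datamodule=dm)
print(dm.has_prepared_data, dm.has_setup)  # → True True
```

If the user has already called `dm.prepare_data()` or `dm.setup()` themselves, the guards skip the second call, so the hooks run at most once either way.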

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team July 29, 2020 19:57
@nateraw nateraw linked an issue Jul 29, 2020 that may be closed by this pull request
codecov bot commented Jul 29, 2020

Codecov Report

Merging #2755 into master will increase coverage by 0%.
The diff coverage is 97%.

@@          Coverage Diff           @@
##           master   #2755   +/-   ##
======================================
  Coverage      91%     91%           
======================================
  Files          76      76           
  Lines        6787    6819   +32     
======================================
+ Hits         6150    6187   +37     
+ Misses        637     632    -5     

@nateraw nateraw added the ready PRs ready to be merged label Jul 29, 2020
@Borda Borda removed the ready PRs ready to be merged label Jul 29, 2020
@mergify mergify bot requested a review from a team July 29, 2020 21:51
@Borda Borda added bug Something isn't working ci Continuous Integration ready PRs ready to be merged labels Jul 29, 2020
@williamFalcon (Contributor) commented:

needs setup with step arg

@nateraw nateraw removed the ready PRs ready to be merged label Jul 30, 2020
docs/source/datamodules.rst (resolved)
@@ -155,14 +234,14 @@ def prepare_data(self):
     """

     @abstractmethod
-    def setup(self, *args, **kwargs):
+    def setup(self, stage: Optional[str] = None):
@nateraw (Contributor, Author) commented:
is def setup(self, stage: Optional[str] = None): the way we want this to look?

@nateraw (Contributor, Author) commented:
Added this to allow people to set None as the default value of stage, which lets them call dm.setup() with no argument. Done this way, it acts as if you've run both the fit- and test-related setups, so the trainer won't be confused when you call trainer.test after trainer.fit.

I documented this usage heavily in the docs.
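The stage=None semantics described above can be sketched as follows. The flag names and class are hypothetical placeholders, not Lightning's real attributes:

```python
class StageSketch:
    """Illustrates stage=None marking both fit and test as set up."""

    def __init__(self):
        self._has_setup_fit = False
        self._has_setup_test = False

    def setup(self, stage=None):
        # stage=None means "set up everything": both phases are marked done,
        # so trainer.test() after trainer.fit() won't trigger another setup.
        if stage in (None, "fit"):
            self._has_setup_fit = True
        if stage in (None, "test"):
            self._has_setup_test = True


dm = StageSketch()
dm.setup()  # no stage given: both phases are considered set up
print(dm._has_setup_fit, dm._has_setup_test)  # → True True
```

Passing an explicit stage ("fit" or "test") marks only that phase, which is what the trainer does when it calls the hook itself.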

pytorch_lightning/trainer/trainer.py (resolved)
@nateraw (Contributor, Author) commented Jul 30, 2020

needs setup with step arg

@williamFalcon All done and updated docs.


# If datamodule.prepare_data() has not been called yet, call it
if self.is_overridden('prepare_data', datamodule) and not datamodule.has_prepared_data:
datamodule.prepare_data()
A reviewer (Contributor) commented:
This is the wrong place to call this. It needs to happen when we call the prepare_data hook; otherwise the timing will be wrong.

@mergify mergify bot requested a review from a team July 30, 2020 22:15
@nateraw nateraw changed the title Call DataModule hooks implicitly in trainer [WIP] Call DataModule hooks implicitly in trainer Jul 30, 2020
# self.dims = tuple(self.mnist_test[0][0].shape)

def train_dataloader(self):
return DataLoader(self.mnist_train, batch_size=32)
A reviewer (Contributor) commented:
maybe add batch_size as a param?
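The suggestion could look like the following sketch. The class name is hypothetical and the dataset is a plain-list placeholder so the example is self-contained; a real implementation would store an actual dataset and return a torch DataLoader:

```python
class MNISTDataModuleSketch:
    """Sketch of taking batch_size as a constructor parameter."""

    def __init__(self, batch_size=32):
        self.batch_size = batch_size
        self.mnist_train = list(range(100))  # placeholder for the real dataset

    def train_dataloader(self):
        # A real implementation would return
        # DataLoader(self.mnist_train, batch_size=self.batch_size);
        # here we just chunk the placeholder list the same way.
        bs = self.batch_size
        return [self.mnist_train[i:i + bs]
                for i in range(0, len(self.mnist_train), bs)]


dm = MNISTDataModuleSketch(batch_size=25)
print(len(dm.train_dataloader()))  # → 4 batches of 25
```

Keeping batch_size as a parameter (with 32 as the default) preserves the original behavior while letting users tune it per experiment.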

@mergify mergify bot requested a review from a team July 30, 2020 23:51
@ananyahjha93 (Contributor) left a comment:

get pickle test cases to pass

@mergify mergify bot requested a review from a team July 30, 2020 23:53
pep8speaks commented Jul 31, 2020

Hello @nateraw! Thanks for updating this PR.

Line 53:9: E116 unexpected indentation (comment)
Line 58:9: E116 unexpected indentation (comment)
Line 63:13: E116 unexpected indentation (comment)

Comment last updated at 2020-08-02 00:01:39 UTC

@mergify mergify bot requested a review from a team July 31, 2020 22:14
mergify bot commented Aug 1, 2020

This pull request is now in conflict... :(

@williamFalcon williamFalcon changed the title [WIP] Call DataModule hooks implicitly in trainer Call DataModule hooks implicitly in trainer Aug 2, 2020
@williamFalcon williamFalcon merged commit 036bcea into Lightning-AI:master Aug 2, 2020
@Borda (Member) left a comment:

Why were most of the tests in the docs converted to samples without testing?

@@ -11,33 +11,63 @@ Data preparation in PyTorch follows 5 steps:
A DataModule is simply a collection of a train_dataloader, val_dataloader(s), and test_dataloader(s), along with the matching transforms and the data processing/download steps required.

.. code-block:: python
A reviewer (Member) commented:

Why not test code?

A reviewer (Contributor) commented:

There are a ton of unnecessary doctests.

A reviewer (Member) commented:

What do you mean by "unnecessary": that there is no need for examples? Otherwise, all examples should be tested so they stay aligned with the actual code base...
cc: @awaelchli

A reviewer (Member) commented:

Agree, this code block would be perfect for a doctest, and it is as simple as adding .. testcode. Even if we don't make any assertions here, Python will parse the code and run the import statements. It would help us keep the docs up-to-date with the API.

A reviewer (Member) commented:

Yeah, we don't need anything extra, but it checks the syntax and eventually the number of passed arguments or kwargs.
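For reference, the conversion being discussed is as small as swapping the directive: Sphinx's sphinx.ext.doctest extension provides .. testcode::, whose body is executed by the doctest builder even when no output is asserted. A sketch (the class body here is a hypothetical placeholder, not a snippet from the PR):

```rst
.. testcode::

    from pytorch_lightning import LightningDataModule

    class MyDataModule(LightningDataModule):
        def setup(self, stage=None):
            pass
```

Because the snippet is imported and run, a renamed class or changed signature fails the docs build instead of silently going stale.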

Labels: bug (Something isn't working), ci (Continuous Integration)
7 participants