
Add useful errors when model is not configured correctly #1199

Merged: 21 commits merged into Lightning-AI:master on Apr 2, 2020

Conversation

@SkafteNicki (Member) commented Mar 20, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #1171 and #508

This PR adds checks that ensure the user's model is correctly configured before training. This includes:

  • Checking that training_step has been overridden
  • Checking that training_dataloader has been overridden
  • Warning if configure_optimizers has not been overridden, telling the user that the program is using the default optimizer (Adam with lr=0.0001)
  • Error if a validation_dataloader is overridden but no validation_step is defined (and vice versa)
  • Error if a test_dataloader is overridden but no test_step is defined (and vice versa)
  • Warning if validation_dataloader and validation_step are overridden but validation_epoch_end is not
  • Warning if test_dataloader and test_step are overridden but test_epoch_end is not

The most fundamental change is the requirement of validation_step and test_step when their respective dataloaders are defined. This will probably not be backward compatible with some users' code.
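For illustration, here is a minimal, self-contained sketch of how such override checks can be implemented. Everything below (is_overridden, check_model_configuration, the exception class, and the method names) is a hypothetical reconstruction, not the exact code merged in this PR:

```python
import warnings


class MisconfigurationException(Exception):
    """Raised when the model is not configured for use with the Trainer."""


def is_overridden(method_name, model, base_class):
    # A method counts as overridden when the model's class resolves the
    # attribute to a different function object than the base class does.
    return getattr(type(model), method_name, None) is not getattr(base_class, method_name, None)


def check_model_configuration(model, base_class):
    # Hard requirement: training cannot run without a training_step.
    if not is_overridden('training_step', model, base_class):
        raise MisconfigurationException(
            'No `training_step()` defined; the Trainer cannot train without one.')
    # Paired requirement: a validation dataloader without a validation_step
    # is an error (and vice versa).
    if is_overridden('validation_dataloader', model, base_class) and \
            not is_overridden('validation_step', model, base_class):
        raise MisconfigurationException(
            'You defined `validation_dataloader` but no `validation_step`.')
    # Soft requirement: a missing configure_optimizers only warns.
    if not is_overridden('configure_optimizers', model, base_class):
        warnings.warn('`configure_optimizers` not overridden; '
                      'falling back to the default optimizer.')
```

The identity comparison works because a subclass that does not override a method resolves the attribute to the very same function object as the base class.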

@pep8speaks commented Mar 20, 2020

Hello @SkafteNicki! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-02 14:38:44 UTC

@SkafteNicki SkafteNicki marked this pull request as ready for review March 23, 2020 14:03
@SkafteNicki SkafteNicki changed the title add check_model_configuration method Add useful errors when model is not configured correctly Mar 24, 2020
@mergify[bot] (Contributor) commented Mar 24, 2020

This pull request is now in conflict... :(

@Borda Borda added the feature Is an improvement or enhancement label Mar 25, 2020
@Borda Borda added this to the 0.7.2 milestone Mar 25, 2020
@Borda Borda requested review from tullie and a team March 25, 2020 08:52
@Borda (Member) left a comment

I like this warning/check grouping VERY MUCH, just please have a look at my formatting proposals
(I made a suggestion only on the first one, but it applies to all :])

Review suggestions on pytorch_lightning/trainer/trainer.py (outdated, resolved)
@Borda (Member) commented Mar 25, 2020

@neggert I guess you were already solving the DDP pickle issue, right?
tests/test_gpu_models.py::test_ddp_all_dataloaders_passed_to_fit reports:

obj = <SpawnProcess(SpawnProcess-9, initial)>
file = <_io.BytesIO object at 0x7fa73c5d26d0>, protocol = None

def dump(obj, file, protocol=None):
    '''Replacement for pickle.dump() using ForkingPickler.'''
>   ForkingPickler(file, protocol).dump(obj)
E   TypeError: can't pickle code objects

@SkafteNicki may you pls resolve the conflict in the changelog?

@SkafteNicki (Member, Author)
@Borda I have updated the code with your recommendations and resolved the conflict. Only the pickle issue seems to be blocking a merge.

@jeremyjordan (Contributor) left a comment

this is great @SkafteNicki, thanks for your work on this! Can you add tests to ensure that the exceptions are raised when the model is misconfigured?

@SkafteNicki (Member, Author)
@jeremyjordan I have added the test that you requested

@ethanwharris (Member) left a comment

This looks good :) - minor weirdness in the CHANGELOG (see comment)

@SkafteNicki @Borda @PyTorchLightning/core-contributors Wondering if the required methods should be made proper abstract methods in LightningModule so that IDEs will prompt to implement them?
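
For context, the abstract-method idea above would look roughly like the following hypothetical sketch (this is not how LightningModule is actually defined):

```python
from abc import ABC, abstractmethod

import torch


class AbstractLightningModule(torch.nn.Module, ABC):
    # An abstract training_step makes IDEs prompt subclasses to implement
    # it, and instantiating a subclass without it raises a TypeError.
    @abstractmethod
    def training_step(self, batch, batch_idx):
        raise NotImplementedError
```

One trade-off: an abstract method would prevent instantiating a module that omits it, which works against the goal raised later in this thread of letting a LightningModule behave like a plain torch.nn.Module.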

Review suggestion on CHANGELOG.md (outdated, resolved)
@jeremyjordan (Contributor)

some related work going on in #1279

@mergify[bot] (Contributor) commented Mar 29, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team March 30, 2020 22:33
@mergify[bot] (Contributor) commented Mar 30, 2020

This pull request is now in conflict... :(

@awaelchli (Member)
Awesome!

@Borda (Member) commented Mar 31, 2020

@SkafteNicki nice job, could you pls resolve conflicts...

@SkafteNicki (Member, Author)
@Borda conflicts are solved now, some (unrelated) tests are still failing...

@Borda (Member) commented Mar 31, 2020

@SkafteNicki I have restarted the CI on GitHub, but the GPU tests are failing because of

def dump(obj, file, protocol=None):
    '''Replacement for pickle.dump() using ForkingPickler.'''
>   ForkingPickler(file, protocol).dump(obj)
E   TypeError: can't pickle code objects

I thought we were fixing it... @neggert?

@mergify[bot] (Contributor) commented Mar 31, 2020

This pull request is now in conflict... :(

@williamFalcon (Contributor) left a comment

Awesome, apparently we duplicated effort with #1317.

We still need to allow a LightningModule to function like a regular PyTorch module. So, instead of raising errors, we need to give warnings.

Want to merge my changes into this PR instead?

The main other thing in my PR is that configure_optimizers no longer returns a default Adam.

@Borda (Member) commented Mar 31, 2020

Could we merge this one and then Will's as a follow-up?

@SkafteNicki (Member, Author)

As the checks implemented in this PR are only called within Trainer.fit(), I think they should indeed throw an error instead of a warning. This means the user can freely use a LightningModule like any other PyTorch module until they try to make use of Lightning's trainer functionality, at which point we enforce some structure.

If you want to merge #1317 into this, that's fine by me. I will update the configure_optimizers error/warning afterwards.
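
A toy illustration of that design point, with hypothetical names: the check lives inside fit(), so nothing changes for a module used outside the trainer.

```python
import torch


class ToyTrainer:
    """Hypothetical trainer whose configuration checks run only in fit()."""

    def fit(self, model):
        # Structure is enforced here, not in the module itself.
        if not callable(getattr(model, 'training_step', None)):
            raise RuntimeError('`fit()` requires the model to define `training_step`.')
        # ... the training loop would follow here ...


# Used as a plain PyTorch module, no check ever fires:
model = torch.nn.Linear(4, 2)
out = model(torch.randn(1, 4))

# Only engaging the trainer triggers the configuration check:
try:
    ToyTrainer().fit(model)
except RuntimeError as err:
    print(err)
```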

@mergify[bot] (Contributor) commented Apr 2, 2020

This pull request is now in conflict... :(

@williamFalcon (Contributor)

@SkafteNicki awesome!

@williamFalcon williamFalcon merged commit 2912239 into Lightning-AI:master Apr 2, 2020
@Borda (Member) commented Apr 2, 2020

@williamFalcon @SkafteNicki there is still a bug, and now it is in master:

def dump(obj, file, protocol=None):
    '''Replacement for pickle.dump() using ForkingPickler.'''
>   ForkingPickler(file, protocol).dump(obj)
E   TypeError: can't pickle code objects

http://35.192.60.23/PyTorchLightning/pytorch-lightning/954/1/2

@williamFalcon (Contributor)

probably that code call

@neggert (Contributor) commented Apr 2, 2020

Yeah, test_ddp_all_dataloaders_passed_to_fit is failing. I'd imagine it's this line:

self.__code__ = self.__call__.__code__

It seems that code objects aren't pickleable, and we need everything to be pickleable for DDP to work.
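
The underlying limitation is easy to reproduce outside Lightning, since CPython's pickle has never supported code objects (a tiny, hypothetical repro):

```python
import pickle


def f():
    pass


try:
    pickle.dumps(f.__code__)  # function code objects cannot be serialized
except TypeError as err:
    print(err)  # e.g. "cannot pickle code objects"
```

Storing a reference to __code__ as an instance attribute therefore makes the whole object graph un-pickleable, which breaks DDP's spawn-based process start.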

alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request on Apr 3, 2020:
Add useful errors when model is not configured correctly (Lightning-AI#1199)

* add check_model_configuration method

* trying to fix errors

* trying to fix tests

* added test_epoch_end to lightning template

* fix tests

* fix new test after rebase

* fix spelling

* added more checks

* updated formating

* added tests

* fixed CHANGELOG

* Apply suggestions from code review

* move test to new module

* change check on configure_optimizers

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@SkafteNicki SkafteNicki deleted the add_check_model_config branch April 21, 2020 13:52
@tbachlechner
It appears that training_dataloader does not exist; I think it should read train_dataloader? Another question is naming-convention consistency with training_step.

@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021
Labels
feature Is an improvement or enhancement
Development

Successfully merging this pull request may close these issues.

Check that model is configured correctly
10 participants