support num_sanity_val_steps=-1 #2246

Merged: 19 commits into master on Jul 23, 2020

Conversation

@awaelchli (Member) commented Jun 18, 2020

What does this PR do?

Fixes #1715
Trainer(num_sanity_val_steps=-1) is now possible and will run all val dataloaders in full.
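
For illustration, a minimal usage sketch of the new flag (the flag itself is what this PR adds; everything else is standard Trainer usage):

from pytorch_lightning import Trainer

# -1 means: run every batch of every val dataloader as a sanity check
# before training starts, instead of a fixed number of batches
trainer = Trainer(num_sanity_val_steps=-1)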

TODO:

  • Is limit_val_batches supposed to influence sanity val steps? (Resolved: clarified by William.)

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask that you create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team June 18, 2020 21:53
@awaelchli awaelchli added the feature Is an improvement or enhancement label Jun 18, 2020
@codecov (bot) commented Jun 20, 2020

Codecov Report

Merging #2246 into master will decrease coverage by 0%.
The diff coverage is 92%.

@@          Coverage Diff           @@
##           master   #2246   +/-   ##
======================================
- Coverage      92%     92%   -0%     
======================================
  Files          74      74           
  Lines        6316    6322    +6     
======================================
+ Hits         5786    5791    +5     
- Misses        530     531    +1     

@pep8speaks (bot) commented Jun 22, 2020

Hello @awaelchli! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-23 10:48:02 UTC

@awaelchli (Member, Author)

@williamFalcon is limit_val_batches=0 supposed to influence num_sanity_val_steps, i.e., should
Trainer(num_sanity_val_steps=5, limit_val_batches=0) run 5 checks or none? Currently it runs none.

@awaelchli awaelchli changed the title support sanity_val_step=-1 support num_sanity_val_steps=-1 Jun 22, 2020
@awaelchli awaelchli marked this pull request as ready for review June 22, 2020 23:18
@mergify (bot) commented Jun 23, 2020

This pull request is now in conflict... :(

@Borda (Member) left a comment

LGTM 🚀

Review thread on tests/trainer/test_trainer.py (outdated, resolved)
@Borda Borda added the ready PRs ready to be merged label Jun 26, 2020
@Borda Borda requested review from MattPainter01, neggert and a team June 26, 2020 23:05
@Borda (Member) commented Jun 26, 2020

> is limit_val_batches supposed to influence sanity val steps?

it depends if you shuffle them... @williamFalcon?

@Borda Borda removed the ready PRs ready to be merged label Jun 26, 2020
@mergify (bot) commented Jun 27, 2020

This pull request is now in conflict... :(

@awaelchli (Member, Author) commented Jun 29, 2020

@williamFalcon Ok, I hope I understood you correctly. I updated the PR. Here are some examples of how it currently behaves:

# runs exactly 5 sanity val steps
Trainer(num_sanity_val_steps=5)

# runs exactly 5 sanity val steps
Trainer(num_sanity_val_steps=5, limit_val_batches=3)

# runs 0 sanity val steps
Trainer(num_sanity_val_steps=5, limit_val_batches=0)

# runs 0 sanity val steps (but runs the usual fast_dev_run check batches)
Trainer(num_sanity_val_steps=5, limit_val_batches=3, fast_dev_run=True)

We can also substitute 5 with -1 of course.
Does this make sense?

@williamFalcon (Contributor) commented Jun 29, 2020

almost perfect!

fast_dev_run always runs 1 of each and completes.

@awaelchli (Member, Author) commented Jun 29, 2020

Yes, exactly; I did not change that. I just did not conceptually count it as a "sanity check", because in the code it follows a different path (in fact, fast_dev_run sets num_sanity_val_steps=0 in the Trainer init). So:

# runs 0 sanity steps, 1 val step, and 1 train step
Trainer(num_sanity_val_steps=5, fast_dev_run=True)
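
A minimal sketch of that precedence, assuming the override happens in the Trainer constructor (hypothetical, simplified code; the real initialization is more involved):

# Hypothetical sketch, not the actual Trainer implementation.
class Trainer:
    def __init__(self, num_sanity_val_steps: int = 2, fast_dev_run: bool = False):
        self.fast_dev_run = fast_dev_run
        if fast_dev_run:
            # fast_dev_run bypasses the sanity check entirely and runs
            # 1 train batch + 1 val batch through its own code path
            self.num_sanity_val_steps = 0
        elif num_sanity_val_steps == -1:
            # -1 means: run all batches of every val dataloader
            self.num_sanity_val_steps = float("inf")
        else:
            self.num_sanity_val_steps = num_sanity_val_steps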

@mergify (bot) commented Jul 1, 2020

This pull request is now in conflict... :(

@awaelchli awaelchli requested a review from Borda July 11, 2020 22:45
@awaelchli (Member, Author)

@williamFalcon kindly requesting your review :)

@mergify (bot) commented Jul 14, 2020

This pull request is now in conflict... :(

@williamFalcon (Contributor)

@awaelchli this looks great. Can you rebase so we can merge?

@@ -302,7 +302,7 @@ def init_test_tqdm(self) -> tqdm:
     def on_sanity_check_start(self, trainer, pl_module):
         super().on_sanity_check_start(trainer, pl_module)
         self.val_progress_bar = self.init_sanity_tqdm()
-        self.val_progress_bar.total = trainer.num_sanity_val_steps * len(trainer.val_dataloaders)
+        self.val_progress_bar.total = convert_inf(trainer.num_sanity_val_steps * len(trainer.val_dataloaders))
Contributor: what is this?

Member (Author): tqdm does not understand float("inf"), so we have to convert it to None in case num_sanity_val_steps is inf or the dataloader has infinite length.

Contributor: right, but where is this imported?

Contributor: oh I see... it's just kind of stuck at the bottom lol
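
For context, a minimal sketch of what such a convert_inf helper could look like (an assumption based on the discussion above; the actual utility in the codebase may differ):

import math
from typing import Optional, Union

def convert_inf(x: Optional[Union[int, float]]) -> Optional[Union[int, float]]:
    """ tqdm cannot render an infinite total; None tells it the length is unknown. """
    if x is None or math.isinf(x):
        return None
    return x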

@mergify mergify bot requested a review from a team July 23, 2020 00:32
@awaelchli (Member, Author)

@williamFalcon rebase done.

@property
def disable_validation(self) -> bool:
""" Check if validation is disabled during training. """
disable_validation = not (self.is_overridden('validation_step') and self.limit_val_batches > 0) \
Member: since every term here is negated, would it be easier to write enable_validation and negate it (and vice versa for this state)?

Contributor: done
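
A minimal sketch of the suggested refactor (a hypothetical, simplified stand-in for the real Trainer class):

# Hypothetical sketch; the actual Trainer computes these conditions
# from the LightningModule and its own flags.
class Trainer:
    def __init__(self, has_validation_step: bool, limit_val_batches: float):
        self.has_validation_step = has_validation_step
        self.limit_val_batches = limit_val_batches

    @property
    def enable_validation(self) -> bool:
        """ Positive condition: validation_step is overridden and val batches are allowed. """
        return self.has_validation_step and self.limit_val_batches > 0

    @property
    def disable_validation(self) -> bool:
        """ Negate the positive condition once, instead of negating each term. """
        return not self.enable_validation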

@mergify mergify bot requested a review from a team July 23, 2020 09:39
@Borda Borda self-requested a review July 23, 2020 09:39
@williamFalcon (Contributor)

@awaelchli why does this drop coverage so much? maybe the logical part was wrong?

@awaelchli (Member, Author) commented Jul 23, 2020

@williamFalcon it's because the coverage upload failed with an HTTP error.

Labels: feature (Is an improvement or enhancement)

Linked issue (closed by this PR): Run full validation epoch before training (#1715)

4 participants