
[WIP] Fix the progress bar for the sanity check #2892

Conversation

manipopopo
Contributor

@manipopopo manipopopo commented Aug 9, 2020

The original progress bar always shows trainer.num_sanity_val_steps as val_progress_bar.total, even if the length of the validation DataLoader is less than trainer.num_sanity_val_steps.

pytorch_lightning.trainer.data_loading._has_len is changed to a public function, has_len, which is called by pytorch_lightning.callbacks.progress.ProgressBar.

Importing pytorch_lightning.trainer.data_loading from pytorch_lightning.callbacks.progress would lead to circular imports, so maybe we should move pytorch_lightning.trainer.data_loading._has_len somewhere else.

Alternatively, we could store the sizes of the validation (and train) DataLoaders as members of Trainer, which pytorch_lightning.callbacks.progress.ProgressBar could then read.
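The fix described above can be sketched in isolation. This is a minimal sketch with plain lists standing in for DataLoaders; `has_len` here is a simplified stand-in for the Lightning helper, and `sanity_total` is a name made up for this example, not part of the PR:

```python
def has_len(dataloader) -> bool:
    # Simplified stand-in: the real helper also guards against
    # iterable-style datasets that raise when len() is called.
    try:
        len(dataloader)
        return True
    except TypeError:
        return False


def sanity_total(num_sanity_val_steps, val_dataloaders):
    # Cap the bar's total at each loader's actual length, so a short
    # DataLoader no longer inflates the displayed total.
    return sum(
        min(num_sanity_val_steps, len(d) if has_len(d) else float('inf'))
        for d in val_dataloaders
    )


# Two loaders of length 3 and 10, with num_sanity_val_steps=5:
print(sanity_total(5, [[0] * 3, [0] * 10]))  # 3 + 5 = 8
```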

What does this PR do?

Fixes #2891

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

@pep8speaks

pep8speaks commented Aug 9, 2020

Hello @manipopopo! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-14 17:35:58 UTC

@mergify mergify bot requested a review from a team August 9, 2020 10:18
@ananyahjha93 ananyahjha93 self-requested a review August 9, 2020 11:08
@codecov

codecov bot commented Aug 9, 2020

Codecov Report

Merging #2892 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #2892   +/-   ##
======================================
  Coverage      85%     85%           
======================================
  Files          82      82           
  Lines        7719    7719           
======================================
  Hits         6550    6550           
  Misses       1169    1169           

@awaelchli awaelchli self-requested a review August 10, 2020 16:28
@Borda Borda added the bug Something isn't working label Aug 11, 2020
pytorch_lightning/callbacks/progress.py (resolved review thread)
@mergify mergify bot requested a review from a team August 11, 2020 09:22
tests/callbacks/test_progress_bar.py (resolved review thread)
@mergify mergify bot requested a review from a team August 11, 2020 13:34
@mergify
Contributor

mergify bot commented Aug 11, 2020

This pull request is now in conflict... :(

tests/callbacks/test_progress_bar.py (resolved review thread)
tests/utilities/test_data.py (resolved review thread)
@mergify mergify bot requested a review from a team August 11, 2020 14:12
Member

@awaelchli awaelchli left a comment


good fix!

@mergify mergify bot requested a review from a team August 11, 2020 14:29
Comment on lines +297 to +299
self.val_progress_bar.total = sum(
min(trainer.num_sanity_val_steps, len(d) if has_len(d) else float('inf')) for d in trainer.val_dataloaders
)
Contributor

@awaelchli shouldn't num_sanity_val_steps be independent of limit_val_batches (when it is a float)?

Contributor

if num_sanity_val_steps=2, len(val_dataloader)=10 and limit_val_batches=0.1, should it run for 2 val_steps or 1?
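The ambiguity in that question can be put in numbers. This is a hypothetical helper written for illustration, not Lightning code; it shows what the "capped" reading would compute:

```python
def capped_sanity_steps(num_sanity_val_steps: int,
                        loader_len: int,
                        limit_val_batches: float) -> int:
    # If limit_val_batches (a float fraction) also applied during the
    # sanity check, the effective budget would be capped at the
    # limited loader length.
    limited = int(loader_len * limit_val_batches)
    return min(num_sanity_val_steps, limited)


# Independent reading: simply 2 steps. Capped reading:
print(capped_sanity_steps(2, 10, 0.1))  # 1
```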

Member

@awaelchli awaelchli Aug 11, 2020

Is this relevant here? I thought this PR is just about displaying the num_sanity steps that the trainer returns.
If limit_val_batches is used, it should just truncate the sanity steps if needed, no? That should happen in the trainer, I think.

Contributor

@rohitgr7 rohitgr7 Aug 11, 2020

Yeah, it still has some issues with limit_val_batches. I think a better fix would be to set num_sanity_val_steps up as a list in the Trainer itself rather than computing it here; then we can simply sum it to get the total sanity val steps.
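The proposal above could look roughly like this. The class and attribute names are hypothetical (this is not Lightning's actual Trainer); it is only a sketch of "store a per-loader list, let the progress bar sum it":

```python
class Trainer:
    # Toy stand-in for the real Trainer, illustrating the idea only.
    def __init__(self, num_sanity_val_steps: int = 2):
        self._requested_sanity_steps = num_sanity_val_steps
        self.val_dataloaders = []

    @property
    def num_sanity_val_steps(self):
        # Per-loader list, each entry capped at that loader's length.
        return [
            min(self._requested_sanity_steps, len(d))
            for d in self.val_dataloaders
        ]


trainer = Trainer(num_sanity_val_steps=5)
trainer.val_dataloaders = [[0] * 3, [0] * 10]
# The progress bar then only needs to sum the precomputed list:
print(sum(trainer.num_sanity_val_steps))  # 8
```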

Contributor Author

Does that mean…?

Contributor

I suggest that in the case of num_sanity_val_steps == -1, it should be affected by limit_val_batches too.

Member

@rohitgr7 I like your suggestion. It's true: the trainer should compute these properties, and the progress bars should only read them (and maybe sum them).

Contributor Author

Should I open another PR or keep this one going? And should we reuse num_sanity_val_steps itself to store these values? (#2891 (comment))

Contributor

I am already working on it :)

Contributor Author

Thank you.

@mergify mergify bot requested a review from a team August 11, 2020 15:38
@Borda Borda added this to the 0.9.0 milestone Aug 11, 2020
@mergify
Contributor

mergify bot commented Aug 11, 2020

This pull request is now in conflict... :(

@ananyahjha93 ananyahjha93 force-pushed the fix_sanity_check_progress_bar_total branch from 7f8751a to ed25a6b Compare August 13, 2020 21:16
pytorch_lightning/callbacks/progress.py (resolved review thread)
@mergify
Contributor

mergify bot commented Aug 13, 2020

Great job! =)


The original progress bar will always show trainer.num_sanity_val_steps even if the length of the validation DataLoader is less than trainer.num_sanity_val_steps.

The pytorch_lightning.trainer.data_loading._has_len is changed to a public function has_len, which is called by pytorch_lightning/callbacks/progress.py
@ananyahjha93 ananyahjha93 force-pushed the fix_sanity_check_progress_bar_total branch from e691bcc to 117c1fe Compare August 13, 2020 23:56
@mergify
Contributor

mergify bot commented Aug 14, 2020

This pull request is now in conflict... :(

@justusschock justusschock self-requested a review August 14, 2020 06:44
@@ -293,7 +294,9 @@ def init_test_tqdm(self) -> tqdm:
     def on_sanity_check_start(self, trainer, pl_module):
         super().on_sanity_check_start(trainer, pl_module)
         self.val_progress_bar = self.init_sanity_tqdm()
-        self.val_progress_bar.total = convert_inf(trainer.num_sanity_val_steps * len(trainer.val_dataloaders))
+        self.val_progress_bar.total = sum(
+            min(trainer.num_sanity_val_steps, len(d) if has_len(d) else float('inf')) for d in trainer.val_dataloaders
+        )
Member

This is quite a common case; can't we add a function for it, like

def len_or_default(to_be_checked: Any, default_length: float = float('inf')):
    if has_len(to_be_checked):
        return len(to_be_checked)
    return default_length

This may be overhead now, but we need similar things quite often.
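The suggested helper behaves like this when the default is float('inf') (note that int('inf') would raise ValueError, so float is required); this sketch uses a minimal `has_len` stand-in for the Lightning helper:

```python
from typing import Any


def has_len(obj: Any) -> bool:
    # Minimal stand-in for the Lightning helper.
    try:
        len(obj)
        return True
    except TypeError:
        return False


def len_or_default(to_be_checked: Any, default_length: float = float('inf')) -> float:
    # Return len() when it is defined, otherwise a caller-chosen default.
    if has_len(to_be_checked):
        return len(to_be_checked)
    return default_length


print(len_or_default([1, 2, 3]))       # 3
print(len_or_default(iter(range(3))))  # inf
```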

Contributor

I believe this is repeated code. The same thing is already done in reset_val_dataloader. All we need here is to sum num_sanity_val_steps once #2917 is fixed.

Member

Agree with both of you. Should we block this PR on #2917, or the other way around? Does it matter which one goes first?

Contributor

I suggest blocking this one. Once I get answers to the questions I asked there, I'll fix that one tonight, and then we can complete this one :)

@mergify
Contributor

mergify bot commented Aug 14, 2020

Great job! =)

@rohitgr7 rohitgr7 changed the title Fix the progress bar for the sanity check [WIP] [Blocked by 2917] Fix the progress bar for the sanity check Aug 14, 2020
@mergify
Contributor

mergify bot commented Aug 16, 2020

This pull request is now in conflict... :(

@mergify
Contributor

mergify bot commented Aug 21, 2020

This pull request is now in conflict... :(

@rohitgr7 rohitgr7 changed the title [WIP] [Blocked by 2917] Fix the progress bar for the sanity check [WIP] Fix the progress bar for the sanity check Aug 21, 2020
@rohitgr7
Contributor

@manipopopo I think now we can finish this :)

@manipopopo
Contributor Author

Hi @rohitgr7 , it seems that #2917 has fixed the issue. Should we close this PR?

@ananyahjha93
Contributor

@manipopopo closing this then.
@rohitgr7, just make sure #2891 is really fixed.

rohitgr7 added a commit that referenced this pull request Aug 26, 2020
@rohitgr7 rohitgr7 mentioned this pull request Aug 26, 2020
7 tasks
ananyahjha93 pushed a commit that referenced this pull request Aug 27, 2020
* Follow up of #2892

* typo

* iterabledataset
Labels
bug Something isn't working
Successfully merging this pull request may close these issues.

The total number of batches shown by the progress bar of the sanity check is wrong
9 participants