
fix Misstep #2478

Closed
wants to merge 101 commits into from

Conversation

@ameliatqy commented Jul 3, 2020

What does this PR do?

Fixes #2455. This used to be #2475 but I messed up on rebasing and to be safe, created a new pull request. This is a Draft PR.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Yes, I am new to contributing to open-source and actually find it quite fun. May be time to consider a new hobby...

@ameliatqy ameliatqy mentioned this pull request Jul 3, 2020
@mergify mergify bot requested a review from a team July 3, 2020 00:20
@@ -504,6 +505,9 @@ def run_training_epoch(self):
# epoch end hook
self.run_on_epoch_end_hook(model)

# increment global step by one to progress to the next epoch
self.global_step += 1
@awaelchli (Member) commented Jul 3, 2020

I think you need to do self.increment_accumulated_grad_global_step() here, otherwise the total_train_batch_idx does not get updated. I think that's why the tests are failing.
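As a rough illustration of the suggested change (a minimal sketch, not the actual Trainer code; the helper's body here is an assumption based on this comment):

```python
# Sketch of the relevant part of run_training_epoch. The body of
# increment_accumulated_grad_global_step is assumed from the review
# comment: it advances total_train_batch_idx together with global_step.

class TrainerLoopSketch:
    def __init__(self):
        self.global_step = 0
        self.total_train_batch_idx = 0

    def increment_accumulated_grad_global_step(self):
        # Advance both counters together, so total_train_batch_idx
        # stays in sync with global_step (the point of this comment).
        self.global_step += 1
        self.total_train_batch_idx += 1

    def run_on_epoch_end_hook(self, model):
        pass  # placeholder for the real epoch-end hook

    def run_training_epoch_end(self, model=None):
        # epoch end hook
        self.run_on_epoch_end_hook(model)
        # instead of `self.global_step += 1`, call the helper:
        self.increment_accumulated_grad_global_step()

trainer = TrainerLoopSketch()
trainer.run_training_epoch_end()
print(trainer.global_step, trainer.total_train_batch_idx)  # 1 1
```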

@ameliatqy (Author) replied:

I made the change and it works as expected. So I have changed it.

However, the tests are still failing. Actually, I think I am confused about the test. I ran the local test on my pull request and got this total summary result at the end:
[screenshot: test summary from the local run on this branch]

I also ran the test over the original Pytorch Lightning master branch and also got this total summary result at the end:
[screenshot: test summary from the run on the master branch]

Am I making a mistake somewhere? Should I have based my branch off something other than the master branch?

@mergify mergify bot requested a review from a team July 3, 2020 06:14
Comment on lines +105 to +110
# check if dictionary keys are unique
agg_keys = set([key for met in self._metrics_to_agg for key in met.keys()])
num_keys = sum([len(met) for met in self._metrics_to_agg])

# exclude 'epoch' because it is a metric automatically added in by log_metrics and will count as a
# duplicate. If you want to get rid of this, I would suggest you should get rid of `scalar_metrics[
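Read as a self-contained sketch, the check in this hunk works as follows (the data and the 'epoch' exclusion are illustrative, based on the truncated comment above):

```python
# Keys are unique across all pending metric dicts iff the number of
# distinct keys equals the total key count. 'epoch' is excluded because
# log_metrics adds it to every dict, so it would always look duplicated.
# The metric values below are made up for illustration.

metrics_to_agg = [
    {"loss": 0.5, "epoch": 3},
    {"val_acc": 0.9, "epoch": 3},
]

agg_keys = {key for met in metrics_to_agg for key in met if key != "epoch"}
num_keys = sum(len(met) - int("epoch" in met) for met in metrics_to_agg)

keys_are_unique = len(agg_keys) == num_keys
print(keys_are_unique)  # True here: only 'epoch' repeats, and it is excluded
```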
A core team member commented:

Are the changes here really related to the global step issue?
If possible, could we make a separate PR for this? We only want to fix one thing at a time.

@ameliatqy (Author) replied:

I can see it being a separate issue, but this piece of code is necessary for the fix to work. Otherwise, when we fix the step issue (i.e. combining the last training batch, training_epoch_end, and validation_epoch_end metrics in the same step), merging the three sets of metrics fails, because the original merge_dict() in _reduce_agg_metrics() requires all metric dictionaries to have the same keys, which is usually not the case for these three.

I don't mind doing a separate pull request for this though - but it will probably result in most tests failing for this pull request, which we will probably have to make sure to fix in the new pull request.
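A minimal sketch of the failure mode described above (`merge_dict_strict` is a hypothetical stand-in for a reducer that assumes uniform keys, not Lightning's actual merge_dict; the metric names and values are made up):

```python
# Illustrates why reducing train-batch, training_epoch_end and
# validation_epoch_end metrics in one step can fail when the reducer
# assumes every dict carries the same keys.

def merge_dict_strict(dicts):
    """Reduce by averaging per key, assuming all dicts share keys."""
    keys = dicts[0].keys()
    return {k: sum(d[k] for d in dicts) / len(dicts) for k in keys}

batch_metrics = {"loss": 0.50, "epoch": 3}
train_epoch_end = {"train_acc": 0.91, "epoch": 3}
val_epoch_end = {"val_acc": 0.88, "epoch": 3}

try:
    merge_dict_strict([batch_metrics, train_epoch_end, val_epoch_end])
except KeyError as err:
    # Fails: 'loss' is missing from the two epoch-end dicts.
    print(f"merge failed on key {err}")
```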

@mergify mergify bot requested a review from a team July 3, 2020 06:17
@Borda Borda added the bug Something isn't working label Jul 3, 2020
@ameliatqy ameliatqy marked this pull request as ready for review July 9, 2020 23:46
mergify bot commented Jul 20, 2020

This pull request is now in conflict... :(

@Borda Borda added this to the 0.9.0 milestone Aug 6, 2020
@Borda Borda changed the title [Bugfix] Misstep fix Misstep Aug 11, 2020
@Borda (Member) commented Aug 11, 2020

@ameliatqy I have rebased it to master, can we finish it in this release?

pep8speaks commented Aug 11, 2020

Hello @ameliatqy! Thanks for updating this PR.

Line 551:13: E303 too many blank lines (2)

Comment last updated at 2020-08-18 02:14:57 UTC

codecov bot commented Aug 11, 2020

Codecov Report

Merging #2478 into master will decrease coverage by 0%.
The diff coverage is 79%.

@@          Coverage Diff           @@
##           master   #2478   +/-   ##
======================================
- Coverage      90%     90%   -0%     
======================================
  Files          81      81           
  Lines        7672    7689   +17     
======================================
+ Hits         6906    6918   +12     
- Misses        766     771    +5     

@ameliatqy (Author) commented:

> @ameliatqy I have rebased it to master, can we finish it in this release?

I have worked on the pull request and the checks have all passed. How do I get it approved? I see that I need three approving reviews - am I responsible for finding people to review it or will the reviewers be assigned by someone else?

@Borda Borda requested a review from awaelchli August 11, 2020 23:07
@awaelchli (Member) commented Aug 16, 2020

> @ameliatqy I have rebased it to master, can we finish it in this release?
>
> I have worked on the pull request and the checks have all passed. How do I get it approved? I see that I need three approving reviews - am I responsible for finding people to review it or will the reviewers be assigned by someone else?

Sorry for this late response.
Anyone in the community can review, but only the reviews of the core team count towards a merge decision.

I have two questions:

  1. Does the issue with global step still exist on master?
  2. Why do we need a decrement in the global_step? I would prefer a solution where this is not needed. Also, there was recently a fix in accumulate batches for the last batch (Fix accumulate_grad_batches for last batch, #2853), so I have the feeling it relates here.

mergify bot commented Aug 16, 2020

This pull request is now in conflict... :(

@williamFalcon (Contributor) commented:
please rebase master since this code is super old haha

@ameliatqy (Author) replied:

> please rebase master since this code is super old haha

Interesting, I thought I rebased my branch today XD I just rebased it again - does it look right? Rebasing is obviously not my strong point XD

mergify bot commented Aug 20, 2020

This pull request is now in conflict... :(

@edenlightning edenlightning modified the milestones: 0.9.0, 0.9.x Aug 20, 2020
@Borda (Member) commented Sep 18, 2020

@ameliatqy how about this one, mind rebase and resolve conflicts? 🐰

@ameliatqy (Author) commented Sep 23, 2020

> @ameliatqy how about this one, mind rebase and resolve conflicts? 🐰

Sure, I'd be happy to. But to fully resolve this pull request, could someone answer my question in #2478 (comment)? I can't finish up the pull request without an answer to that question.

@@ -73,8 +74,6 @@ def log_metrics(self, metrics, grad_norm_dic, step=None):
# log actual metrics
if self.is_global_zero and self.logger is not None:
self.logger.agg_and_log_metrics(scalar_metrics, step=step)
self.logger.save()
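For context, a toy sketch of the agg-and-log contract touched by this hunk (ToyLogger and its buffering behavior are assumptions for illustration, not the real Lightning logger API):

```python
# agg_and_log_metrics is expected to buffer metrics per step and only
# flush when the step advances; save() flushes whatever remains. This
# toy logger mimics that contract to show why removing an extra save()
# can matter: a premature flush would split metrics logged at one step.

class ToyLogger:
    def __init__(self):
        self._agg_step = None
        self._agg = []
        self.logged = []  # list of (step, merged_metrics)

    def agg_and_log_metrics(self, metrics, step):
        if self._agg_step is not None and step != self._agg_step:
            self._flush()
        self._agg_step = step
        self._agg.append(metrics)

    def _flush(self):
        merged = {}
        for m in self._agg:
            merged.update(m)
        self.logged.append((self._agg_step, merged))
        self._agg = []

    def save(self):
        self._flush()

logger = ToyLogger()
logger.agg_and_log_metrics({"loss": 0.5}, step=10)
logger.agg_and_log_metrics({"val_acc": 0.9}, step=10)  # same step: buffered
logger.agg_and_log_metrics({"loss": 0.4}, step=11)     # new step: flush 10
logger.save()
print(logger.logged)
# [(10, {'loss': 0.5, 'val_acc': 0.9}), (11, {'loss': 0.4})]
```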
A contributor commented:

How is this related to #3398 (comment)?
@SkafteNicki Maybe you know something about this also?

@edenlightning edenlightning modified the milestones: 0.9.x, 1.0 Oct 4, 2020
@awaelchli (Member) commented:

This is now fixed on master. There is no offset anymore and training epoch end logs at the correct step.

@awaelchli awaelchli closed this Oct 4, 2020
@Borda Borda modified the milestones: 1.0, 0.10.0 Oct 7, 2020
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

training_epoch_end log output gets combined with next epoch training
7 participants