
Callback Metric Dict getting overwritten by Log and Progress Bar Dict #1800

Conversation

@olineumann (Contributor) commented May 12, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)

  • Did you read the contributor guideline, Pull Request section?

  • Did you make sure to update the docs?

  • Did you write any new necessary tests?

  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #1727.

Dict values passed to the progress bar or log overwrite callback values. See the example in the issue.

There are several options to solve it. This PR simply stops adding progress bar and log values to the callback dict. Tests pass on my machine.

But this will affect user code, e.g. when a log metric was used as the early stopping metric.

PR review

Opinions, other solutions, recommendations, ... are welcome! Help with updating the docs is also appreciated.

Did you have fun?

🙃

mergify bot requested a review from a team May 12, 2020 14:31

mergify bot commented May 12, 2020

This pull request is now in conflict... :(

@awaelchli (Member) commented:

@olineumann Thanks for the PR. Is it correct that the bug only exists for training_epoch_end, not for the valid/test_epoch_end? In that case, could you check that your change brings it in line with validation_step/epoch_end?

But this will affect user code, e.g. when a log metric was used as the early stopping metric.

I think the consensus is that we want to do early stopping only on validation metrics, and no longer on training metrics as is currently the case. #1458 is dealing with this.

@awaelchli added the "bug" (Something isn't working) label May 12, 2020
@olineumann (Contributor, Author) commented May 12, 2020

@awaelchli No, it affects train, validation, and test epoch end. See my changes to validation_epoch_end of the base model in the tests.

@@ -43,5 +43,7 @@ def _mean(res, key):
     val_acc_mean /= len(outputs)

     metrics_dict = {'val_loss': val_loss_mean.item(), 'val_acc': val_acc_mean.item()}
-    results = {'progress_bar': metrics_dict, 'log': metrics_dict}
-    return results
+    result = metrics_dict.copy()
Review comment (Member):

why the copy here?

Reply from olineumann (Contributor, Author):

Without copying the metric dict, result and the metric dict reference the same object, so adding the metric dict to result['progress_bar'] would also change metric_dict. Then, when adding the metric dict to result['log'], result['log']['progress_bar'] would exist and cause test errors on my machine.

First I reused the metric_dict by

metric_dict['progress_bar'] = metric_dict
metric_dict['log'] = metric_dict
return metric_dict

But this is wrong and leads to the same error.
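
For illustration, a minimal standalone sketch of the aliasing problem described above and how copying avoids it (illustrative values only, not code from this PR):

# Wrong: result and metrics_dict are the same object, so the dict ends up
# holding a reference to itself under 'progress_bar' and 'log'.
metrics_dict = {'val_loss': 0.25, 'val_acc': 0.9}
result = metrics_dict
result['progress_bar'] = metrics_dict
result['log'] = metrics_dict
assert result['log'] is result              # self-referencing dict

# Correct: copy first, then attach the original metrics under the sub-keys.
metrics_dict = {'val_loss': 0.25, 'val_acc': 0.9}
result = metrics_dict.copy()
result['progress_bar'] = metrics_dict
result['log'] = metrics_dict
assert 'progress_bar' not in result['log']  # no unexpected nested keys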

mergify bot requested a review from a team May 12, 2020 21:12
@@ -168,10 +168,6 @@ def process_output(self, output, train=False):
# ---------------
hiddens = output.get('hiddens')

# use every metric passed in as a candidate for callback
callback_metrics.update(progress_bar_metrics)
Review comment (Contributor):

Why do we need to remove this? Without it, log metrics and progress bar metrics won't be candidates for the callbacks.

Reply from olineumann (Contributor, Author):

In #1727, @kessido had the issue that a progress bar or log metric overwrites the callback metric in the top-level dict. An example was also given by @kessido, see COLAB.

I don't know if this needs to be fixed; that's why I asked for more opinions in the issue. Only @awaelchli responded and said he thinks this should also be fixed.

Because no one had started a PR, I did, to initiate a discussion. I have several ideas on how this could be fixed and mentioned some in the issue above, but this was the easiest and quickest solution. I didn't want to spend too much effort on a solution that might then be discarded.
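
For readers following along, a minimal standalone sketch of the overwrite reported in #1727 (hypothetical key and values, not code from the PR or the issue):

# A step returns the same key at the top level (intended callback value) and
# inside 'progress_bar'; the unconditional update() lets the nested value win.
output = {
    'some_metric': 1.0,                      # intended callback value
    'progress_bar': {'some_metric': 2.0},    # value meant only for the progress bar
}
callback_metrics = {'some_metric': output['some_metric']}
callback_metrics.update(output['progress_bar'])  # old behaviour
print(callback_metrics['some_metric'])           # 2.0 -- callback value overwritten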

Review comment:

@williamFalcon @olineumann In the current update, when that line is removed and we use the Result object, we cannot save model checkpoints with a filename like {val_loss}; it results in epoch=1-val_loss=0, because val_loss cannot be resolved since the filename parameters are based on callback_metrics. Is there another way to assign callback_metrics when using a Result/TrainResult/EvalResult object?
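
To make that failure mode concrete, a rough plain-Python sketch (not the actual ModelCheckpoint code) of how a filename template falls back to 0 when 'val_loss' is missing from callback_metrics:

template = 'epoch={epoch}-val_loss={val_loss}'
callback_metrics = {'epoch': 1}  # 'val_loss' never made it into callback_metrics
filename = template.format(
    epoch=callback_metrics.get('epoch', 0),
    val_loss=callback_metrics.get('val_loss', 0),
)
print(filename)  # epoch=1-val_loss=0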

mergify bot requested a review from a team May 17, 2020 13:04

@Borda force-pushed the issue/callback_metric_overwritten branch from 4e24924 to dde55a8 on May 26, 2020 17:32
codecov bot commented May 26, 2020

Codecov Report

Merging #1800 into master will decrease coverage by 3%.
The diff coverage is 100%.

@@           Coverage Diff            @@
##           master   #1800     +/-   ##
========================================
- Coverage      89%     86%     -3%     
========================================
  Files          79      78      -1     
  Lines        7302    4919   -2383     
========================================
- Hits         6514    4231   -2283     
+ Misses        788     688    -100     


@Borda added the "waiting on author" (Waiting on user action, correction, or update) label Jun 8, 2020
@Borda (Member) commented Jun 11, 2020

@olineumann mind checking the last comments? It would be great to get this done 🐰

@olineumann (Contributor, Author) commented:

@olineumann mind checking the last comments? It would be great to get this done 🐰

Hey Borda,

thanks for replying.

I responded to the last comments in the code reviews. I'm still not sure what the best way to solve the problem would be, because the current fix would affect many users, which I think would lead to many issues from users complaining that their logging or early stopping no longer works.

I could implement it so that metric values from the progress bar or logging are only written to the top-level dict if the key doesn't exist already. That wouldn't affect as many users, I think.

I had hoped there would be more opinions on this. I could implement the solution above, rebase, and push so it can be merged.

@olineumann force-pushed the issue/callback_metric_overwritten branch from dde55a8 to 1c6bfaa on June 11, 2020 10:40
pep8speaks commented Jun 11, 2020

Hello @olineumann! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-06 12:01:29 UTC

@olineumann (Contributor, Author) commented:

@Borda I just rebased onto master, implemented and pushed the solution, and all tests pass 🍻.

Now the logging and progress bar metric values are only written to the top-level callback metric dict if the key doesn't already exist. The logging values are written first, then the progress bar values (so logging metric values have higher priority if both contain the same key). This shouldn't affect other users' code as long as they don't use the same key in different metric dicts.
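
A rough sketch of the merge order described above (merge_callback_metrics is a hypothetical helper for illustration, not the actual PR diff):

def merge_callback_metrics(callback_metrics, log_metrics, progress_bar_metrics):
    # Existing callback metrics win; then log metrics; then progress bar metrics.
    for source in (log_metrics, progress_bar_metrics):
        for key, value in source.items():
            callback_metrics.setdefault(key, value)  # only fill in missing keys
    return callback_metrics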

@Borda (Member) commented Jun 11, 2020

Just thinking that it may also be solved by #1989, what do you think?

@olineumann (Contributor, Author) commented Jun 11, 2020

Just thinking that it may also be solved by #1989, what do you think?

I hadn't seen that PR before. Currently I don't have much time to follow pytorch_lightning... but it looks like a nice new feature!

I think that when the new way of passing a Result() object is used, the problem is already solved. But that PR isn't done yet, so the old way will still be used and, as far as I understand, should still be supported. So I think this PR could be merged into master to fix #1727 (which wouldn't be fixed by #1989 unless the user switches to the new result object).


@Borda (Member) commented Aug 6, 2020

@olineumann how is it going? Can we finish it soon?

@Borda force-pushed the issue/callback_metric_overwritten branch from f14fb59 to a1c0da6 on August 6, 2020 12:01
@williamFalcon (Contributor) commented:

this was solved in the structured results refactors

Labels: bug (Something isn't working), waiting on author (Waiting on user action, correction, or update)

Successfully merging this pull request may close these issues.

Progress bar \ log dict items added to outputs in training_epoch_end
6 participants