
Structured results (train loop only. val loop separate PR) (PR 2/5) #2615

Merged (174 commits into master, Jul 20, 2020)

Conversation

@williamFalcon (Contributor) commented Jul 15, 2020

Adds support for cleaner returns between steps.

Expressive syntax

Before

def training_step(...):
    return {'loss': loss, 'log': {'a': 1, 'b': 1}, 'progress_bar': {'a': 1}}

Now:

def training_step(...):
    result = TrainResult(minimize=some_value)
    result.log('a', 1, prog_bar=True)
    result.log('b', 1)

    return result

No need for multiple reduction steps

Before

def training_step(...):
    return {'a': 1}

def training_epoch_end(self, outputs):
    avg = torch.tensor([x['a'] for x in outputs]).mean()
    return {'val_loss': avg}

Now:

def training_step(...):
    result = TrainResult(early_stop_on=loss)
    result.log('a', 1, on_epoch=True, reduce_fx=torch.mean)

    return result
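
For reference, a slightly fuller sketch combining the two patterns above (the compute_loss/accuracy helpers and metric names are illustrative, not part of this PR, and the TrainResult import path is assumed):

import torch
from pytorch_lightning import TrainResult  # assumed import path; added in core/step_result.py by this PR

def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper
    acc = self.accuracy(batch)       # hypothetical helper

    # minimize: the value to backprop on; early_stop_on / checkpoint_on: values fed to the callbacks
    result = TrainResult(minimize=loss, early_stop_on=loss, checkpoint_on=loss)

    # logged every step and shown in the progress bar
    result.log('train_loss', loss, prog_bar=True)

    # accumulated across the epoch and reduced with torch.mean at epoch end
    result.log('train_acc', acc, on_epoch=True, reduce_fx=torch.mean)
    return result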

@pep8speaks commented Jul 15, 2020

Hello @williamFalcon! Thanks for updating this PR.

Line 114:9: E121 continuation line under-indented for hanging indent
Line 114:9: E125 continuation line with same indent as next logical line

Line 592:120: E501 line too long (138 > 119 characters)

Comment last updated at 2020-07-20 22:42:50 UTC

@codecov (bot) commented Jul 15, 2020

Codecov Report

Merging #2615 into master will decrease coverage by 0%.
The diff coverage is 85%.

@@           Coverage Diff           @@
##           master   #2615    +/-   ##
=======================================
- Coverage      91%     91%    -0%     
=======================================
  Files          70      71     +1     
  Lines        5778    5918   +140     
=======================================
+ Hits         5270    5388   +118     
- Misses        508     530    +22     

@jeremyjordan (Contributor)

Personally, I would prefer that the early_stop_on and checkpoint_on are configured via the callback initializations. That way, the callback fully encapsulates all of the concerns for the behavior.

@williamFalcon (Contributor, Author) commented Jul 17, 2020

The advantages of doing it this way:

1) you have fine-grained control over what to early stop or checkpoint on (not the magic "val_loss" keyword that confuses everyone);
2) it allows new behavior, like changing it dynamically during training, which has been requested as well;
3) it enables these callbacks to be used in the training OR eval loop instead of just eval.

Whereas with the current callbacks you are stuck with what you put in at the start, and to modify what to early stop or checkpoint on you have to re-init the callbacks, change the monitored keyword, and remember to include those magic keys in the return dict.
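
As a hypothetical illustration of point 2) (the step helper and the epoch threshold here are made up, not code from this PR; the TrainResult import path is assumed), the value the callbacks condition on can change while training runs:

from pytorch_lightning import TrainResult  # assumed import path

def training_step(self, batch, batch_idx):
    loss, uncertainty = self.step(batch)  # hypothetical helper returning two metrics

    # after a warm-up phase, early stop on a different quantity than the raw loss
    monitor = loss if self.current_epoch < 10 else uncertainty

    result = TrainResult(minimize=loss, early_stop_on=monitor)
    result.log('train_loss', loss)
    return result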

@justusschock (Member) commented Jul 17, 2020

@williamFalcon I see what you mean, but I'd still go with the callback. I think we should just pass the structured result to the callback and then do it as we currently do.

I think it's really important to allow this option, since this way it is much easier to change behaviour.

@williamFalcon (Contributor, Author)

@justusschock that's what I was thinking. The result would feed directly to the callback, so that you still maintain control via the callback.

But just to clarify: you like the option of specifying the particular value to condition on in the result?

I guess I may not have explained it correctly. The callbacks are still there and should be used, but the way you condition what to use for a particular callback is specified in the structured result...
So it's as if the callback didn't have a "monitor" arg, but instead you did it through the result. The reason is that otherwise it's a pain to decide what to use for early stopping or checkpointing, and you have no flexibility. Also, everyone is confused by "val_loss" and thinks it's a reserved word or something.

@justusschock (Member)

Then I propose the following: let the user use both ways. Default the argument here to None. If the argument here is None, you use the one provided by the callback; otherwise you use the one here.

I really want to specify what I monitor from my callback, since I want it to be decoupled from my model entirely.

@williamFalcon (Contributor, Author)

Perfect, I like that a lot. I'll make them None then. In my personal research each model uses different things to checkpoint or early stop on, so I personally prefer to couple it, haha. But this approach enables both :)
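
A rough sketch of the resolution order being agreed on here (purely illustrative, not the actual trainer/callback code in this PR):

def resolve_monitor(result, callback_monitor):
    # Hypothetical helper: prefer the value attached to the structured result,
    # i.e. TrainResult(early_stop_on=...); if that was left as None, fall back to
    # looking up the metric named by the callback's own `monitor` argument.
    if result.early_stop_on is not None:
        return result.early_stop_on
    return result.get(callback_monitor)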

@awaelchli (Member) left a review

Just some typos.

Review comments (outdated, resolved) on:
- pytorch_lightning/core/hooks.py (3 threads)
- pytorch_lightning/core/step_result.py
- pytorch_lightning/overrides/data_parallel.py
- pytorch_lightning/trainer/callback_hook.py (3 threads)
@williamFalcon changed the title from "Structured results (train loop only. val loop separate PR) (PR 1/4)" to "Structured results (train loop only. val loop separate PR) (PR 2/5)" on Jul 20, 2020
@williamFalcon merged commit 6d10ac2 into master on Jul 20, 2020
@williamFalcon deleted the st branch on Jul 22, 2020
@sooheon commented Aug 5, 2020

To be clear, checkpoint_on and early_stop_on will feed the values directly to the appropriate callbacks, so if they are set the user can completely ignore the monitor arg in those callbacks?

@williamFalcon (Contributor, Author)

Exactly!
Actually, mind adding a PR that explains this in the docs?
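
For example, the intended usage would look roughly like this (a sketch; the Trainer arguments shown are the 0.8-era API and defaults may differ in the released version):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# No `monitor` arg needed on the callbacks: the values to condition on come from
# TrainResult(early_stop_on=..., checkpoint_on=...) returned by training_step.
trainer = Trainer(
    early_stop_callback=EarlyStopping(),
    checkpoint_callback=ModelCheckpoint(),
)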

@sooheon commented Aug 5, 2020

Sure. Can you explain the function of the minimize arg as well before I do so?

Labels: feature (is an improvement or enhancement)
7 participants