
Optimizer Frequencies logic, and new configure_optimizers #1269

Merged
merged 16 commits into Lightning-AI:master on Mar 31, 2020

Conversation

asafmanor
Contributor

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #594

Description

This PR implements the "optimizer frequencies" logic, where each optimizer is used for a number of consecutive steps equal to its associated frequency before the next optimizer is called.
This is a much-needed feature for training networks such as Wasserstein GANs, where the critic is updated several times for every generator update.
The API follows @williamFalcon's suggested API and allows LightningModule.configure_optimizers()
to return a dictionary, or multiple dictionaries, containing the new frequency key (see the sketch below).
Backward compatibility is maintained.
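
For illustration, a minimal sketch (not taken from the PR itself) of what configure_optimizers() inside a LightningModule could return with the new frequency key, for a WGAN-style setup; self.critic, self.generator and the learning rates are placeholders:

import torch

def configure_optimizers(self):
    # Sketch only: 'self.critic' / 'self.generator' and the learning rates
    # are hypothetical attributes, not part of this PR.
    n_critic = 5  # critic steps per generator step (WGAN-style)
    dis_opt = torch.optim.Adam(self.critic.parameters(), lr=1e-4)
    gen_opt = torch.optim.Adam(self.generator.parameters(), lr=1e-4)
    return (
        {'optimizer': dis_opt, 'frequency': n_critic},
        {'optimizer': gen_opt, 'frequency': 1},
    )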

Documentation:

  • The LightningModule.configure_optimizers() method.
  • The ValueError raised by the Trainer.init_optimizers() method.

Tests:

  • Tests were added in tests.models.test_gpu to assert return types of init_optimizers.

TODO:

  • A test is required to assert that the optimizers are called in the right order and at the configured frequencies (a sketch of the expected behaviour follows this list).
    I've started implementing such a test in test_optimizers but ran into questions regarding the method.
    In single-GPU training on a private project of mine, this has proven to work flawlessly.
  • I have yet to test the case where the batch is split into multiple splits.
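
As a plain-Python sketch of the behaviour such a test would assert (illustrative only, independent of the actual Trainer internals): each optimizer index is active for a number of consecutive batches equal to its frequency, and the pattern repeats.

from itertools import cycle

def optimizer_schedule(frequencies, num_batches):
    # Expand e.g. (5, 1) into the repeating pattern [0, 0, 0, 0, 0, 1] and
    # report which optimizer index is active at each training batch.
    pattern = [idx for idx, freq in enumerate(frequencies) for _ in range(freq)]
    steps = cycle(pattern)
    return [next(steps) for _ in range(num_batches)]

# With frequencies (5, 1): five critic batches, then one generator batch, repeating.
assert optimizer_schedule((5, 1), 12) == [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]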

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Yep 🥇

@pep8speaks

pep8speaks commented Mar 28, 2020

Hello @asafmanor! Thanks for updating this PR.

Line 769:111: E501 line too long (112 > 110 characters)
Line 786:111: E501 line too long (113 > 110 characters)
Line 788:111: E501 line too long (115 > 110 characters)

Comment last updated at 2020-03-31 13:40:10 UTC

@codecov

codecov bot commented Mar 28, 2020

Codecov Report

Merging #1269 into master will decrease coverage by 0%.
The diff coverage is 80%.

@@          Coverage Diff           @@
##           master   #1269   +/-   ##
======================================
- Coverage      92%     92%   -0%     
======================================
  Files          62      62           
  Lines        3210    3235   +25     
======================================
+ Hits         2949    2968   +19     
- Misses        261     267    +6     

@Borda Borda requested review from a team March 28, 2020 22:50
@Borda Borda added the feature (Is an improvement or enhancement) label Mar 28, 2020
@Borda Borda added this to the 0.7.2 milestone Mar 28, 2020
@Borda
Member

Borda commented Mar 28, 2020

@PyTorchLightning/core-contributors ^^ pls check...

(resolved review comments on pytorch_lightning/trainer/trainer.py, pytorch_lightning/trainer/distrib_parts.py, and tests/base/utils.py)
Member

@Borda Borda left a comment

pls, could you check my questions?

(resolved review comments on pytorch_lightning/trainer/trainer.py, pytorch_lightning/trainer/training_loop.py, and tests/models/test_gpu.py)
@asafmanor asafmanor requested a review from Borda March 29, 2020 09:15
@Borda
Member

Borda commented Mar 29, 2020

just now it turned out there is another PR refactoring optimizers, see #1279

@Borda Borda requested review from ethanwharris and a team March 29, 2020 12:36
@Borda
Member

Borda commented Mar 29, 2020

@ethanwharris pls be aware also of this work and decide which shall be merged first to reduce conflicts... if you agree we can change the destination branch, so this can be merged into yours (#1279) or that one into here...

Member

@ethanwharris ethanwharris left a comment

This looks great :)

@Borda I'm happy for this to be merged first and then I can rebase my changes from #1279 on top

@Borda Borda changed the title from "Optimizer Frequencies logic, and new configure_optimizers() API." to "Optimizer Frequencies logic, and new configure_optimizers" Mar 29, 2020
@mergify mergify bot requested a review from a team March 30, 2020 22:33
Contributor

@jeremyjordan jeremyjordan left a comment

nice work on this. i like the added option of returning dicts, makes the code more explicit when reading 👍

(resolved review comment on tests/models/test_gpu.py)
@mergify mergify bot requested a review from a team March 31, 2020 07:16
Member

@Borda Borda left a comment

LGTM 🚀

(resolved review comments on pytorch_lightning/trainer/trainer.py and pytorch_lightning/trainer/training_loop.py)
@Borda Borda added the ready (PRs ready to be merged) label Mar 31, 2020
@Borda Borda changed the title from "Optimizer Frequencies logic, and new configure_optimizers" to "Optimizer Frequencies logic, and new configure_optimizers [wip]" Mar 31, 2020
@Borda
Member

Borda commented Mar 31, 2020

@asafmanor GREAT work, may you pls just add a note to the changelog so it can go...
when you are done remove "[wip]" from the PR name, Thx

@mergify
Contributor

mergify bot commented Mar 31, 2020

This pull request is now in conflict... :(

(resolved review comment on CHANGELOG.md)
@Borda Borda changed the title from "Optimizer Frequencies logic, and new configure_optimizers [wip]" to "Optimizer Frequencies logic, and new configure_optimizers" Mar 31, 2020
@mergify
Contributor

mergify bot commented Mar 31, 2020

This pull request is now in conflict... :(

(resolved review comment on CHANGELOG.md)
@mergify mergify bot merged commit aca8c7e into Lightning-AI:master Mar 31, 2020
@mergify
Contributor

mergify bot commented Mar 31, 2020

Great job! =)

n_critic = 5
return (
    {'optimizer': dis_opt, 'frequency': n_critic},
    {'optimizer': gen_opt, 'frequency': 1}
)
Contributor

shouldn't this also have an example for scheduler?

{'optimizer': dis_opt, 'frequency': n_critic, 'lr_scheduler': Scheduler()}
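
For reference, a hedged sketch of such a combined example, reusing dis_opt, gen_opt and n_critic from the snippet above; the StepLR scheduler is only illustrative:

dis_sched = torch.optim.lr_scheduler.StepLR(dis_opt, step_size=10)
return (
    # 'lr_scheduler' attaches a scheduler to this particular optimizer,
    # alongside its 'frequency'.
    {'optimizer': dis_opt, 'frequency': n_critic, 'lr_scheduler': dis_sched},
    {'optimizer': gen_opt, 'frequency': 1}
)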

Contributor

@Borda @asafmanor?
Also, amazing job :)

@williamFalcon
Contributor

@asafmanor what about saving/loading checkpoints? did we handle storing this information for resuming training?

@asafmanor
Contributor Author

I use hparams to save the frequencies.
I can easily add the optimizer frequencies to the checkpoint.
Speaking of that, I've noticed that load_from_checkpoint() does not resume optimizer states.
Is there a different method for that? Should I implement one for resuming all of the information stored in the checkpoint?
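
For illustration, a minimal sketch of the "use hparams to save the frequencies" approach (the n_critic attribute and the _build_optimizers helper are hypothetical): the frequency is read from self.hparams, so it is stored and restored together with the other hyperparameters whenever they are written to a checkpoint.

def configure_optimizers(self):
    # 'n_critic' comes from self.hparams (e.g. argparse.Namespace(n_critic=5)),
    # so it travels with the checkpointed hyperparameters.
    dis_opt, gen_opt = self._build_optimizers()  # hypothetical helper
    return (
        {'optimizer': dis_opt, 'frequency': self.hparams.n_critic},
        {'optimizer': gen_opt, 'frequency': 1},
    )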

@Borda
Member

Borda commented Mar 31, 2020

We can do a follow-up PR if needed, but this was blocking others so we needed to get it done...

alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020
Optimizer Frequencies logic, and new configure_optimizers (Lightning-AI#1269)

* init_optimizers accepts Dict, Sequence[Dict]
and returns optimizer_frequencies.
optimizer_frequencies was added as a member of Trainer.

* Optimizer frequencies logic implemented in training_loop.
Description added to configure_optimizers in LightningModule

* optimizer frequencies tests added to test_gpu

* Fixed formatting for merging PR Lightning-AI#1269

* Apply suggestions from code review

* Apply suggestions from code review

Co-Authored-By: Asaf Manor <32155911+asafmanor@users.noreply.github.com>

* Update trainer.py

* Moving get_optimizers_iterable() outside.

* Update note

* Apply suggestions from code review

* formatting

* formatting

* Update CHANGELOG.md

* formatting

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@Borda Borda modified the milestones: 0.7.2, v0.7.x Apr 18, 2021
Labels
feature (Is an improvement or enhancement), ready (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GAN example: Only one backward() call?
7 participants