Support for Stoch Wt Avg (SWA) closes #321 #320

Draft · wants to merge 5 commits into base: master
Conversation

@pchalasani pchalasani commented Nov 27, 2022

Stochastic Weight Averaging (SWA) is (quoting/paraphrasing from their page):

a simple procedure that improves generalization in deep learning over Stochastic Gradient Descent (SGD) at no additional cost, and can be used as a drop-in replacement for any other optimizer in PyTorch. SWA has a wide range of applications and features, [...] including [...] improve the stability of training as well as the final average rewards of policy-gradient methods in deep reinforcement learning.

See the PyTorch SWA page for more.

Description

A relatively simple change in exp_manager.py: it allows an additional key "swa" in policy_kwargs, e.g.

hyperparams["policy_kwargs"]["swa"] = {
   "swa_start": 5,
   "swa_freq": 3,
   "swa_lr": 0.05,
}
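For intuition, here is a minimal, framework-free sketch of the averaging rule these keys control (a plain-Python stand-in, not the PR's actual implementation): starting at step swa_start, every swa_freq steps the current weights are folded into a running average.

```python
def swa_average(weight_history, swa_start=5, swa_freq=3):
    """Running average of the weights captured every `swa_freq` steps
    once step >= swa_start. `weight_history[t]` is the weight vector
    (a list of floats) after training step t."""
    avg, n = None, 0
    for step, w in enumerate(weight_history):
        if step >= swa_start and (step - swa_start) % swa_freq == 0:
            n += 1
            if avg is None:
                avg = list(w)
            else:
                # incremental mean: avg += (w - avg) / n
                avg = [a + (x - a) / n for a, x in zip(avg, w)]
    return avg

# Toy example: a single scalar "weight" drifting upward over 12 steps;
# with swa_start=5, swa_freq=3 the snapshots at steps 5, 8, 11 are averaged.
history = [[float(t)] for t in range(12)]
print(swa_average(history))  # -> [8.0]
```

In the real change, this averaging is done by the SWA optimizer wrapper rather than by hand; the sketch only illustrates what swa_start and swa_freq mean.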

Motivation and Context

SWA might help improve stability and reduce sensitivity to random seeds in some DRL applications.

Closes #321

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

@pchalasani pchalasani changed the title Support for Stoch Wt Avg (SWA) Support for Stoch Wt Avg (SWA) closes #321 Nov 27, 2022
@pchalasani pchalasani marked this pull request as ready for review November 27, 2022 02:51
@pchalasani (Contributor, Author)
I realized we need to call opt.swap_swa_sgd() at the end of training, and some further thought is needed on how this affects the computation of validation metrics by EvalCallback etc.

@pchalasani pchalasani marked this pull request as draft November 27, 2022 17:43
@pchalasani (Contributor, Author)
I added opt.swap_swa_sgd() after model.learn().
We also need to do this before and after each evaluate_policy() call in EvalCallback (which lives in the original sb3 repo), so that validation metrics are computed with the SWA-averaged model weights. We could potentially subclass EvalCallback to accomplish this.
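The swap/evaluate/swap pattern could be packaged as a context manager. A hypothetical sketch (not code from this PR) follows, assuming an optimizer with a torchcontrib-style swap_swa_sgd() that exchanges the live SGD weights with the SWA running average; a toy optimizer stands in for the real one to show the semantics.

```python
from contextlib import contextmanager

@contextmanager
def swa_weights(opt):
    """Temporarily evaluate with SWA-averaged weights, then restore
    the live SGD weights, even if evaluation raises."""
    opt.swap_swa_sgd()      # averaged weights into the model
    try:
        yield
    finally:
        opt.swap_swa_sgd()  # live SGD weights back

# Toy optimizer illustrating swap_swa_sgd()'s exchange semantics.
class ToyOpt:
    def __init__(self):
        self.active = "sgd_weights"
    def swap_swa_sgd(self):
        self.active = (
            "swa_weights" if self.active == "sgd_weights" else "sgd_weights"
        )

opt = ToyOpt()
with swa_weights(opt):
    # evaluate_policy() would run here, seeing the averaged weights
    assert opt.active == "swa_weights"
assert opt.active == "sgd_weights"  # live weights restored for training
```

A subclassed EvalCallback could wrap its evaluate_policy() call in such a context manager so training resumes with the un-averaged weights.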

Successfully merging this pull request may close these issues.

[Feature Request] Support Stochastic Weight Averaging (SWA) for improved stability