OneCycle LR, LR finder, custom Tensorboard, etc. #70

Merged · 25 commits · Jul 23, 2021

Conversation

@erwulff (Collaborator) commented Jun 29, 2021

This pull request includes several new features.

OneCycleScheduler

The OneCycleScheduler is a tf.keras.optimizers.schedules.LearningRateSchedule that schedules the learning rate according to the 1cycle policy from Leslie Smith's paper (https://arxiv.org/pdf/1803.09820.pdf). The implementation adopts the additional improvements made in the fastai library (https://docs.fast.ai/callbacks.one_cycle.html), where only two phases are used and the annealing follows a cosine curve.

In my experience, the OneCycle policy improves generalization and speeds up training.
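
For orientation, here is a minimal sketch of such a two-phase cosine-annealed schedule written as a tf.keras.optimizers.schedules.LearningRateSchedule. The class name and the pct_start/div_factor parameters and defaults are assumptions for illustration, not the PR's actual implementation:

```python
import math
import tensorflow as tf

# Sketch only: a two-phase 1cycle schedule with cosine annealing.
# pct_start, div_factor and final_div_factor are assumed names/defaults.
class OneCycleSketch(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, max_lr, total_steps, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
        self.max_lr = max_lr
        self.total_steps = total_steps
        self.warmup_steps = int(total_steps * pct_start)
        self.initial_lr = max_lr / div_factor
        self.final_lr = max_lr / final_div_factor

    def _cosine(self, start, end, frac):
        # Cosine interpolation from start to end as frac goes 0 -> 1.
        return end + 0.5 * (start - end) * (1.0 + tf.cos(math.pi * frac))

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup = tf.cast(self.warmup_steps, tf.float32)
        total = tf.cast(self.total_steps, tf.float32)
        # Phase 1: anneal up to max_lr; phase 2: anneal down to final_lr.
        return tf.cond(
            step < warmup,
            lambda: self._cosine(self.initial_lr, self.max_lr, step / warmup),
            lambda: self._cosine(self.max_lr, self.final_lr,
                                 (step - warmup) / (total - warmup)),
        )
```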

Learning Rate Finder

The learning rate finder implements a technique to easily estimate a range of learning rates that should perform well given the current model setup. When the model architecture or other hyperparameters are changed, the learning rate finder can be run in order to find a new suitable LR range.

The learning rate finder starts training the model at a very low LR, increasing it every batch. The batch loss is plotted vs LR (or, equivalently, training steps) and a figure is created from which a suitable LR range can be determined.

This technique was first introduced by Leslie Smith in https://arxiv.org/abs/1506.01186.
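
As a rough sketch, such a range test can be written as a Keras callback along these lines; the class name, the exponential ramp, and the start/end defaults below are assumptions, not the code in this PR:

```python
import numpy as np
import tensorflow as tf

# Sketch only: ramp the LR up exponentially, one step per batch,
# and record the batch loss for plotting afterwards.
class LRFinderSketch(tf.keras.callbacks.Callback):
    def __init__(self, start_lr=1e-7, end_lr=1.0, num_steps=1000):
        super().__init__()
        self.start_lr, self.end_lr, self.num_steps = start_lr, end_lr, num_steps
        self.lrs, self.losses = [], []

    def on_train_batch_begin(self, batch, logs=None):
        step = len(self.lrs)
        lr = self.start_lr * (self.end_lr / self.start_lr) ** (step / self.num_steps)
        tf.keras.backend.set_value(self.model.optimizer.lr, lr)
        self.lrs.append(lr)

    def on_train_batch_end(self, batch, logs=None):
        self.losses.append(logs["loss"])
        # Stop once the range is swept or the loss diverges.
        if len(self.lrs) >= self.num_steps or not np.isfinite(logs["loss"]):
            self.model.stop_training = True
```

The recorded (lrs, losses) pairs can then be plotted to inspect the loss curve.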

Simply run

`python mlpf/pipeline.py find-lr -c parameters/<config-filename>.yaml`

and a plot of loss vs learning rate like the one below will be created. A suitable LR range lies somewhere in the negative slope of the curve, where the loss is steadily decreasing at a relatively high rate.
[Figure: lr_finder, a plot of batch loss vs. learning rate produced by the LR finder]

pipeline.py

mlpf/pipeline.py is my beginning of a click-based alternative to mlpf/launcher.py. I created it partly so as not to change mlpf/launcher.py too much in a single pull request. If mlpf/pipeline.py is well received, it may replace mlpf/launcher.py at some point in the future. For now, it is still a work in progress.
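
For readers unfamiliar with click, a minimal sketch of such an entry point follows; it mirrors the find-lr invocation shown above, but the body is purely illustrative:

```python
import click

@click.group()
def main():
    """MLPF pipeline commands."""

@main.command("find-lr")
@click.option("-c", "--config", type=click.Path(exists=True), required=True,
              help="Path to the YAML configuration file.")
def find_lr(config):
    # Illustrative placeholder: the real command would build the model
    # from the config and run the LR range test here.
    click.echo(f"Running the LR finder with config {config}")

if __name__ == "__main__":
    main()
```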

Other notes

The learning rate is no longer scaled by the batch size. Instead, the learning rate used is now exactly the one defined in the configuration file. When using the exponential decay schedule or the OneCycle schedule, the LR specified in the config is the maximum LR used by the schedule.
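
A short sketch of the new convention, assuming a hypothetical config layout (the setup/lr key is illustrative, not necessarily the real config structure):

```python
import tensorflow as tf

config = {"setup": {"lr": 3e-4}}  # hypothetical config structure
max_lr = config["setup"]["lr"]    # used verbatim, regardless of batch size

# With exponential decay, the config LR is the schedule's starting (maximum) value.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=max_lr, decay_steps=10000, decay_rate=0.99
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```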

The structure of the training directory has been reorganized. Instead of writing many files directly in the training directory they have been organized in different subfolders:

  • history: contains the history_{}.json, event_{}.pdf and cm_normed.pdf files
  • weights: contains all checkpoints of model weights
  • evaluation: contains the pred.npz file(s)
  • tensorboard_logs: contains the tensorboard logs

erwulff and others added 9 commits June 24, 2021 22:23
This commit also includes
 - Custom tensorboard callback logging learning rate & momentum
 - A utils.py file collecting utilities used in more than one file
 - Clean-up of how output files are organized
 - Configuration files using the OneCycle scheduler
`mlpf/pipeline.py` is the beginning of a `click` based alternative to the
`mlpf/launcher.py`.
Also add option to give a prefix to the name of the training
directory
Also add lr_schedule parameter to configuration files
The previous commit still scaled the LR; this one fixes it.
- create get_train_val_datasets() function to get datasets for training
- move targets_multi_output() from model_setup.py to utils.py for more
  flexible access (solving import loop issue)
@jpata (Owner) commented Jun 29, 2021

I think the new pipeline and the reorganization are great!
Perhaps there is an opportunity to reduce code duplication further in the pipeline functions (e.g. dataset, loss & model setup)?
It also seems like tqdm should be added to the GitHub Actions dependencies (some help in modernizing them would be welcome, too!)

@erwulff (Collaborator, Author) commented Jun 29, 2021

> I think the new pipeline and the reorganization are great!
> Perhaps there is an opportunity to reduce code duplication further in the pipeline functions (e.g. dataset, loss & model setup)?
> It also seems like tqdm should be added to the GitHub Actions dependencies (some help in modernizing them would be welcome, too!)

Thanks!
Yes, there is definitely an opportunity for further reduction of code duplication. It is on my to-do list.

When running `python mlpf/pipeline.py evaluate -t <train_dir>` without explicitly specifying which weights to use, the pipeline will load the weights with the smallest loss it can find in <train_dir>/weights/.
This can be useful when many large checkpoint files take up too much storage space.
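
A sketch of how such a lowest-loss lookup could work, assuming the loss value is encoded in the checkpoint filename (the naming pattern and helper below are hypothetical, not the PR's actual code):

```python
import re
from pathlib import Path

def best_weights(train_dir):
    # Pick the checkpoint in <train_dir>/weights/ with the smallest
    # loss value embedded in its filename (pattern is hypothetical).
    candidates = []
    for path in Path(train_dir, "weights").glob("*.hdf5"):
        match = re.search(r"loss-(\d+\.\d+)", path.name)
        if match:
            candidates.append((float(match.group(1)), path))
    return min(candidates, key=lambda c: c[0])[1] if candidates else None
```
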
The default parameters for expdecay added to the config files
in this commit are the same as those used on the
jpata/particleflow master branch at the time of writing.
@erwulff marked this pull request as ready for review July 8, 2021 10:35
erwulff and others added 2 commits July 9, 2021 11:33
Also:
- Add missing parameters to config files.
- Move make_weights_function to utils.py
@jpata (Owner) left a comment


Looks good, a small comment inline.

I think we could go ahead with this, and later follow up with a PR that completely moves all functionality to the new pipeline (I didn't try the new one yet, just made sure the old pipeline works as before).

Thanks a lot for the effort!

mlpf/tfmodel/model_setup.py (inline review comment, resolved)
@jpata merged commit 1e4c581 into jpata:master Jul 23, 2021
jpata added a commit that referenced this pull request Sep 15, 2023
OneCycle LR, LR finder, custom Tensorboard, etc.

Former-commit-id: 1e4c581