Linear warmup learning rate schedule #2086

Closed
srihari-humbarwadi opened this issue Aug 9, 2020 · 4 comments

@srihari-humbarwadi commented Aug 9, 2020

Describe the feature and the current behavior/state.
A learning rate schedule implementing linear warmup followed by step decay, which is currently unavailable in TensorFlow.

Relevant information

  • Are you willing to contribute it (yes/no): Yes
  • Are you willing to maintain it going forward? (yes/no): Yes
  • Is there a relevant academic paper? (if so, where): https://arxiv.org/abs/1706.02677
  • Is there already an implementation in another framework? (if so, where): Yes, in the TensorFlow Model Garden and Google AutoML
  • Was it part of tf.contrib? (if so, where):

Which API type would this fall under (layer, metric, optimizer, etc.)
Learning rate schedule

Who will benefit from this feature?

  • A considerable number of papers use warmup strategies, e.g. RetinaNet and EfficientDet.
  • Users training on Cloud TPUs, which need a high learning rate due to the large batch size; starting with a linear warmup often helps reach convergence sooner.

Any other info.
Here is a preliminary implementation; please let me know if this is something that could find a place in tf-addons.

import tensorflow as tf


class PiecewiseConstantDecayWithLinearWarmup(
        tf.keras.optimizers.schedules.PiecewiseConstantDecay):
    """Piecewise constant decay preceded by a linear warmup phase."""

    def __init__(self, warmup_learning_rate, warmup_steps, boundaries, values,
                 **kwargs):
        super().__init__(boundaries=boundaries, values=values, **kwargs)

        self.warmup_learning_rate = warmup_learning_rate
        self.warmup_steps = warmup_steps
        # Total increase applied over the warmup phase, so the linear ramp
        # ends exactly at the first piecewise-constant value.
        self._step_size = self.values[0] - self.warmup_learning_rate

    def __call__(self, step):
        with tf.name_scope(self.name or
                           'PiecewiseConstantDecayWithLinearWarmup'):
            learning_rate = tf.cond(
                pred=tf.less(step, self.warmup_steps),
                # Linear ramp from warmup_learning_rate to values[0].
                true_fn=lambda:
                (self.warmup_learning_rate + tf.cast(step, dtype=tf.float32) /
                 self.warmup_steps * self._step_size),
                # After warmup, fall back to the parent piecewise schedule.
                # Zero-argument super() is unavailable inside a lambda, hence
                # the explicit two-argument form.
                false_fn=lambda: super(PiecewiseConstantDecayWithLinearWarmup,
                                       self).__call__(step))
        return learning_rate

    def get_config(self):
        config = {
            "warmup_learning_rate": self.warmup_learning_rate,
            "warmup_steps": self.warmup_steps,
        }
        base_config = super().get_config()
        return {**base_config, **config}
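
For reference, a minimal usage sketch (the numbers here are purely illustrative):

lr_schedule = PiecewiseConstantDecayWithLinearWarmup(
    warmup_learning_rate=1e-3,
    warmup_steps=500,
    boundaries=[30000, 40000],
    values=[0.16, 0.016, 0.0016])
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)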

Possible extension
We could turn this into a generalized wrapper around all the existing LR schedules, adding warmup functionality to any of them; a rough sketch follows.
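
Something like the following, where the class name LinearWarmup and its signature are placeholders rather than a settled API. The ramp targets the wrapped schedule's value at warmup_steps, so the two phases join without a jump:

class LinearWarmup(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Wraps any LearningRateSchedule with a linear warmup phase."""

    def __init__(self, schedule, warmup_learning_rate, warmup_steps, name=None):
        super().__init__()
        self.schedule = schedule
        self.warmup_learning_rate = warmup_learning_rate
        self.warmup_steps = warmup_steps
        self.name = name

    def __call__(self, step):
        with tf.name_scope(self.name or 'LinearWarmup'):
            step = tf.cast(step, tf.float32)
            warmup_steps = tf.cast(self.warmup_steps, tf.float32)
            warmup_lr = tf.cast(self.warmup_learning_rate, tf.float32)
            # Value of the wrapped schedule at the end of warmup; the ramp
            # ends here so the learning rate is continuous at the boundary.
            target_lr = tf.cast(self.schedule(self.warmup_steps), tf.float32)
            warmup_rate = warmup_lr + step / warmup_steps * (target_lr - warmup_lr)
            return tf.cond(step < warmup_steps,
                           lambda: warmup_rate,
                           lambda: tf.cast(self.schedule(step), tf.float32))

    def get_config(self):
        return {
            'schedule': tf.keras.optimizers.schedules.serialize(self.schedule),
            'warmup_learning_rate': self.warmup_learning_rate,
            'warmup_steps': self.warmup_steps,
            'name': self.name,
        }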

@AakashKumarNain
Member

Hi @srihari-humbarwadi, thanks for bringing this up. Yes, warmup schedules are pretty common nowadays. Please feel free to open a PR.

@srihari-humbarwadi
Author

Should I just do it for PiecewiseConstantDecay, or implement a generic wrapper?

@AakashKumarNain
Member

A generic wrapper makes more sense, but if you want to keep the first PR simple, you can just implement PiecewiseConstantDecay first.

@seanpmorgan
Member

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision:
TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA:
Keras
Keras-CV
Keras-NLP
