Linear warmup learning rate schedule #2086

Closed
srihari-humbarwadi opened this issue Aug 9, 2020 · 4 comments

@srihari-humbarwadi commented Aug 9, 2020

Describe the feature and the current behavior/state.
A learning rate schedule implementing linear warmup followed by step decay, which is currently unavailable in TensorFlow.

Relevant information

  • Are you willing to contribute it (yes/no): Yes
  • Are you willing to maintain it going forward? (yes/no): Yes
  • Is there a relevant academic paper? (if so, where): https://arxiv.org/abs/1706.02677
  • Is there already an implementation in another framework? (if so, where): Yes, in the TensorFlow Model Garden and Google AutoML
  • Was it part of tf.contrib? (if so, where):

Which API type would this fall under (layer, metric, optimizer, etc.)
Learning rate schedule

Who will benefit from this feature?

  • A considerable number of papers use warmup strategies, e.g. RetinaNet and EfficientDet.
  • Users training on Cloud TPUs, which need a high learning rate due to the large batch size; starting with a linear warmup often helps reach convergence sooner.

Any other info.
Here is a preliminary implementation; please let me know if this is something that could find a place in tf-addons.

import tensorflow as tf


class PiecewiseConstantDecayWithLinearWarmup(
        tf.keras.optimizers.schedules.PiecewiseConstantDecay):
    """Piecewise constant decay preceded by a linear warmup phase."""

    def __init__(self, warmup_learning_rate, warmup_steps, boundaries, values,
                 **kwargs):
        super().__init__(boundaries=boundaries, values=values, **kwargs)

        self.warmup_learning_rate = warmup_learning_rate
        self.warmup_steps = warmup_steps
        # Total increase applied over the warmup phase, so the linear ramp
        # ends exactly at the first piecewise-constant value.
        self._step_size = self.values[0] - self.warmup_learning_rate

    def __call__(self, step):
        with tf.name_scope(self.name or
                           'PiecewiseConstantDecayWithLinearWarmup'):
            learning_rate = tf.cond(
                pred=tf.less(step, self.warmup_steps),
                # Linear ramp from warmup_learning_rate to values[0].
                true_fn=lambda:
                (self.warmup_learning_rate + tf.cast(step, dtype=tf.float32) /
                 self.warmup_steps * self._step_size),
                # After warmup, fall back to the parent piecewise schedule.
                # Zero-argument super() is unavailable inside a lambda, hence
                # the explicit two-argument form.
                false_fn=lambda: super(PiecewiseConstantDecayWithLinearWarmup,
                                       self).__call__(step))
        return learning_rate

    def get_config(self):
        config = {
            "warmup_learning_rate": self.warmup_learning_rate,
            "warmup_steps": self.warmup_steps,
        }
        base_config = super().get_config()
        return {**base_config, **config}
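
For reference, a minimal usage sketch (the numbers here are purely illustrative):

lr_schedule = PiecewiseConstantDecayWithLinearWarmup(
    warmup_learning_rate=1e-3,
    warmup_steps=500,
    boundaries=[30000, 40000],
    values=[0.16, 0.016, 0.0016])
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)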

Possible extension
We could turn this into a generalized wrapper around all the existing LR schedules, adding warmup functionality to any of them; a rough sketch follows.
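
Something like the following, where the class name LinearWarmup and its signature are placeholders rather than a settled API. The ramp targets the wrapped schedule's value at warmup_steps, so the two phases join without a jump:

class LinearWarmup(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Wraps any LearningRateSchedule with a linear warmup phase."""

    def __init__(self, schedule, warmup_learning_rate, warmup_steps, name=None):
        super().__init__()
        self.schedule = schedule
        self.warmup_learning_rate = warmup_learning_rate
        self.warmup_steps = warmup_steps
        self.name = name

    def __call__(self, step):
        with tf.name_scope(self.name or 'LinearWarmup'):
            step = tf.cast(step, tf.float32)
            warmup_steps = tf.cast(self.warmup_steps, tf.float32)
            warmup_lr = tf.cast(self.warmup_learning_rate, tf.float32)
            # Value of the wrapped schedule at the end of warmup; the ramp
            # ends here so the learning rate is continuous at the boundary.
            target_lr = tf.cast(self.schedule(self.warmup_steps), tf.float32)
            warmup_rate = warmup_lr + step / warmup_steps * (target_lr - warmup_lr)
            return tf.cond(step < warmup_steps,
                           lambda: warmup_rate,
                           lambda: tf.cast(self.schedule(step), tf.float32))

    def get_config(self):
        return {
            'schedule': tf.keras.optimizers.schedules.serialize(self.schedule),
            'warmup_learning_rate': self.warmup_learning_rate,
            'warmup_steps': self.warmup_steps,
            'name': self.name,
        }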

@AakashKumarNain
Member

Hi @srihari-humbarwadi, thanks for bringing this up. Yes, warmup schedules are pretty common nowadays. Please feel free to open a PR.

@srihari-humbarwadi
Author

Should I just do it for PiecewiseConstantDecay, or implement a generic wrapper?

@AakashKumarNain
Member

A generic wrapper makes more sense, but if you want to keep the first PR simple, you can just implement PiecewiseConstantDecay first.

@seanpmorgan
Member

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision:
TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA:
Keras
Keras-CV
Keras-NLP
