
Add RAdam and LookAhead optimizers #506

Merged
merged 13 commits into from
Sep 18, 2019
Conversation

CyberZHG
Contributor

@CyberZHG CyberZHG commented Sep 13, 2019

Related to #422:

  • The optimizer is named RectifiedAdam to avoid conflicts with other abbreviations. (The name appears in the first line of the second page of the paper.)
  • Add Lookahead optimizer wrapper.
  • Unit tests for RAdam are compared with the results of the official implementation.
  • Unit tests for Ranger (RAdam + Lookahead) are compared with the results of the implementation from the proposer.
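For context, the variance rectification term at the heart of RAdam can be sketched in plain Python. This is a simplified illustration of the formula from the paper, not the tfa implementation:

```python
import math

def rectification_term(t, beta2=0.999):
    """Compute RAdam's variance rectification r_t at step t (1-indexed).

    Returns None while rho_t <= 4, where the paper falls back to
    un-rectified, momentum-style updates.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t <= 4.0:
        return None
    return math.sqrt(
        ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )

# Early steps carry too little variance information to rectify:
print(rectification_term(1))      # None
# Later, r_t approaches 1 from below:
print(rectification_term(1000))
```

The unit tests mentioned above compare the full optimizer against the official implementation; this sketch only shows the rectification factor in isolation.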

@Squadrick
Member

Squadrick commented Sep 13, 2019

I think a more useful approach for LookAhead would be a generic wrapper that can take any tf.keras.optimizers.Optimizer and apply lookahead (similar to tfa.optimizers.MovingAverage).

So the final API for using Ranger would look like this:

radam = tfa.optimizers.RectifiedAdam(lr=1e-3)
ranger = tfa.optimizers.LookAhead(radam, step=4, ratio=0.5)

This will allow for other optimizers to be used with LookAhead.
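The wrapper idea can be sketched framework-free. The snippet below applies the Lookahead update rule around an arbitrary inner step function; names like `inner_step` are illustrative, not the tfa API:

```python
def lookahead(params, inner_step, sync_period=5, slow_step_size=0.5, steps=20):
    """Run `steps` inner updates; every `sync_period` steps, pull the
    slow weights toward the fast weights and resynchronize them."""
    slow = list(params)           # slow (lookahead) weights
    fast = list(params)           # fast weights updated by the inner optimizer
    for t in range(1, steps + 1):
        fast = inner_step(fast)   # one step of the wrapped optimizer
        if t % sync_period == 0:
            slow = [s + slow_step_size * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)     # fast weights restart from the slow weights
    return fast

# Example: the inner optimizer is plain gradient descent on f(x) = x^2.
step = lambda ps: [p - 0.1 * (2 * p) for p in ps]
print(lookahead([1.0], step))
```

Because the wrapper only needs "one step of the inner optimizer" as a black box, any optimizer can slot in, which is exactly what makes the generic-wrapper API attractive.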

Ping @WindQAQ @facaiy for their thoughts on the matter.

@CyberZHG
Contributor Author

I've added a standalone Lookahead wrapper.

@Squadrick Squadrick changed the title Add RAdam optimizer Add RAdam and LookAhead optimizers Sep 15, 2019
Member

@Squadrick Squadrick left a comment


@CyberZHG Thank you so much for the contributions. I did a preliminary review, and so far the code looks great. I've left a few minor comments; I'll do a more thorough review later, where I can verify the logic against the paper.

Review comments on:
tensorflow_addons/optimizers/BUILD
tensorflow_addons/optimizers/README.md
tensorflow_addons/optimizers/__init__.py
tensorflow_addons/optimizers/lookahead.py
tensorflow_addons/optimizers/rectified_adam.py
@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it. If the bot doesn't comment, it means it doesn't think anything has changed.


@googlebot

CLAs look good, thanks!


@CyberZHG
Contributor Author

It seems that the mod operation is not working properly on GPU. I've changed the implementation to use floordiv in Lookahead. I don't know if there will be any numerical stability issue with the modification.
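On the mod-vs-floordiv point: for non-negative integer steps the two sync checks are mathematically equivalent, so no numerical issue would be expected. A plain-Python sanity check of the equivalence (not the GPU kernels themselves):

```python
def sync_with_mod(local_step, sync_period):
    return local_step % sync_period == 0

def sync_with_floordiv(local_step, sync_period):
    # local_step // sync_period * sync_period recovers the last multiple
    # of sync_period at or below local_step; it equals local_step exactly
    # when local_step is a multiple of sync_period.
    return (local_step // sync_period) * sync_period == local_step

assert all(
    sync_with_mod(t, 5) == sync_with_floordiv(t, 5)
    for t in range(1000)
)
```

Since the step counter is a non-negative integer, integer floordiv and multiply are exact, so the rewrite changes only which kernels are dispatched, not the result.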

Member

@WindQAQ WindQAQ left a comment


Generally LGTM. Thanks for the contribution 😃

Review comment on tensorflow_addons/optimizers/lookahead.py
@seanpmorgan
Member

It seems that the mod operation is not working properly on GPU. I've changed the implementation to use floordiv in Lookahead. I don't know if there will be any numerical stability issue with the modification.

I'm not sure why TF2 isn't using a soft device placement for this. There is no GPU implementation for the overloaded __mod__:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/cwise_op_gpu_mod.cu.cc#L22

But I would have assumed it'd fall back to CPU with dynamic kernels implemented. cc @karmel @martinwicke

seanpmorgan previously approved these changes Sep 17, 2019
Member

@seanpmorgan seanpmorgan left a comment


LGTM. Thank you very much for this great contribution. Also, thank you to everyone for the reviews and suggestions.

If there are no other objections I'd vote we merge and can possibly handle the __mod__ issue if some information becomes available.

Member

@Squadrick Squadrick left a comment


Just a small nit.

Review comment on tensorflow_addons/optimizers/lookahead.py:
slow_step_size = self._get_hyper('slow_step_size', var_dtype)
step_back = slow_var + slow_step_size * (var - slow_var)
sync_cond = tf.equal(local_step % sync_period, 0)
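The `step_back` line in the snippet above is a linear interpolation between the slow and fast weights, controlled by `slow_step_size`. A quick plain-Python check of the endpoints and midpoint (illustrative only, no TF):

```python
slow_var, var = 2.0, 6.0  # hypothetical slow and fast weight values
for slow_step_size, expected in [(0.0, 2.0), (0.5, 4.0), (1.0, 6.0)]:
    step_back = slow_var + slow_step_size * (var - slow_var)
    # slow_step_size=0 keeps the slow weights; 1 adopts the fast weights.
    assert step_back == expected
```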
Member


I would have assumed it'd fall back to CPU with dynamic kernels implemented.

I believe so. Have you tried tf.math.floormod?

Contributor Author


I've tried this one and it's not working.

Member


Do you use tf.device to force all ops to be placed on the GPU? Could you give a minimal reproducible example? Thanks!

Contributor Author


The outputs are the same as the results of this CI. It fails the fit_simple_linear_model test.

Member


Thanks for the information, really useful! Could you file an issue for it and we might resolve it later?

@Squadrick
Member

LGTM. @CyberZHG Thank you so much for this contribution, I'm sure the community will find the optimizers very useful.

@Squadrick Squadrick merged commit 220fad1 into tensorflow:master Sep 18, 2019