Add RAdam and LookAhead optimizers #506
Conversation
I think a more useful approach would be a standalone wrapper, so the final API for using both would be:

```python
radam = tfa.optimizers.RectifiedAdam(lr=1e-3)
ranger = tfa.optimizers.LookAhead(radam, step=4, ratio=0.5)
```

This will allow other optimizers to be used with the wrapper as well.
I've added a standalone Lookahead wrapper.
@CyberZHG Thank you so much for the contributions. Did a prelim review, and so far the code looks great. I've left a few minor comments; I'll do a more thorough review later where I can verify the logic against the paper.
It seems that the overloaded `__mod__` operator isn't supported on the GPU.
Generally LGTM. Thanks for the contribution 😃
I'm not sure why TF2 isn't using soft device placement for this. There is no GPU implementation for the overloaded `__mod__`, but I would have assumed it'd fall back to CPU with dynamic kernels implemented. cc @karmel @martinwicke
LGTM. Thank you very much for this great contribution, and thank you to everyone for the reviews and suggestions.

If there are no other objections I'd vote we merge, and we can handle the `__mod__` issue later if more information becomes available.
Just a small nit.
```python
slow_step_size = self._get_hyper('slow_step_size', var_dtype)
step_back = slow_var + slow_step_size * (var - slow_var)
sync_cond = tf.equal(local_step % sync_period, 0)
```
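For reference, the slow-weights synchronization these lines implement can be sketched in plain NumPy (a simplified illustration, not the actual TF implementation; `alpha` and `k` stand in for `slow_step_size` and `sync_period`):

```python
import numpy as np

def lookahead_sync(var, slow_var, local_step, alpha=0.5, k=5):
    """Sketch of the Lookahead slow-weights update.

    Every k steps the slow weights are stepped toward the fast
    weights, and the fast weights are reset to the new slow weights.
    """
    if local_step % k == 0:  # sync_cond: tf.equal(local_step % sync_period, 0)
        slow_var = slow_var + alpha * (var - slow_var)  # step_back
        var = slow_var.copy()
    return var, slow_var

# After 5 fast steps, both variables sync to the interpolated point.
fast = np.array([1.0, 2.0])
slow = np.array([0.0, 0.0])
fast, slow = lookahead_sync(fast, slow, local_step=5, alpha=0.5, k=5)
# fast == slow == [0.5, 1.0]
```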
> I would have assumed it'd fall back to CPU with dynamic kernels implemented.

I believe so. Have you ever tried `tf.math.floormod`?
I've tried that one and it doesn't work either.
Do you use `tf.device` to force all ops to be placed on the GPU? Can you give a minimal reproducing example? Thanks!
The outputs are the same as the results of this CI run: it fails the `fit_simple_linear_model` test.
Thanks for the information, really useful! Could you file an issue for it and we might resolve it later?
LGTM. @CyberZHG Thank you so much for this contribution, I'm sure the community will find the optimizers very useful.
Related to #422:

- Added the `RectifiedAdam` optimizer. It is named `RectifiedAdam` rather than `RAdam` in case of conflicts with other abbreviations. (The name appears in the first line of the second page of the paper.)
- Added the `Lookahead` optimizer wrapper.
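For readers cross-referencing the paper: the "rectified" part of RAdam is a variance-rectification term derived from the approximated simple-moving-average (SMA) length. The sketch below is my own NumPy illustration of the paper's equations, not code from this PR:

```python
import numpy as np

def radam_rectification(t, beta2=0.999):
    """Compute RAdam's variance rectification term r_t at step t.

    Returns None while the approximated SMA length rho_t <= 4; in
    that regime RAdam skips the adaptive update and falls back to an
    SGD-with-momentum-style step.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0  # maximum SMA length
    rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
    if rho_t <= 4.0:
        return None
    # Rectification factor; approaches 1 as t (and rho_t) grows.
    return np.sqrt(((rho_t - 4.0) * (rho_t - 2.0) * rho_inf) /
                   ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))

# Early steps yield rho_t <= 4, so no adaptive update is applied yet.
print(radam_rectification(1))      # None
print(radam_rectification(10000))  # close to (but below) 1
```

The warmup-like behavior comes from this term: early in training `r_t` is `None` or small, damping the adaptive learning rate until the second-moment estimate is reliable.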