
fix LazyAdam resource variable ops performance issue #2274

Merged

Conversation

@edend10 (Contributor) commented on Dec 7, 2020

Description

Brief Description of the PR:
Replace calls to OptimizerV2._resource_scatter_update and OptimizerV2._resource_scatter_add with direct calls to resource_variable_ops.resource_scatter_update and resource_variable_ops.resource_scatter_sub.

The OptimizerV2 methods are thin wrappers around the resource_variable_ops functions, except that they read back and return the variable's value as a Tensor instead of returning the operation itself. That extra read causes a significant performance hit, as outlined in issue #2273.
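
For reference, the OptimizerV2 wrapper looks roughly like this (a paraphrased sketch of Keras's optimizer_v2.py, not a verbatim copy):

```python
import tensorflow as tf
from tensorflow.python.ops import resource_variable_ops

# Paraphrased sketch of the OptimizerV2 wrapper (illustrative only).
# It runs the scatter op, then reads the variable back with x.value(),
# returning a dense Tensor that the caller may never use.
def _resource_scatter_update(self, x, i, v):
    with tf.control_dependencies(
        [resource_variable_ops.resource_scatter_update(x.handle, i, v)]
    ):
        return x.value()
```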

The returned tensor is not needed for this "lazy" version of Adam's sparse updates, since the result isn't used in any subsequent computation, so we can avoid the extra .value() call made by OptimizerV2._resource_scatter_update/sub/add.
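
Condensed, the updated LazyAdam._resource_apply_sparse looks roughly like this (a sketch; the hyperparameter plumbing that produces lr, beta_1_t, beta_2_t, and epsilon_t is elided and unchanged):

```python
import tensorflow as tf
from tensorflow.python.ops import resource_variable_ops

def _resource_apply_sparse(self, grad, var, indices):
    # lr, beta_1_t, beta_2_t, epsilon_t: scalar tensors computed as before.
    m = self.get_slot(var, "m")
    m_t_slice = beta_1_t * tf.gather(m, indices) + (1 - beta_1_t) * grad
    # Raw resource op: returns the update operation itself, with no
    # trailing .value() read of the full slot variable.
    m_update_op = resource_variable_ops.resource_scatter_update(
        m.handle, indices, m_t_slice
    )

    v = self.get_slot(var, "v")
    v_t_slice = beta_2_t * tf.gather(v, indices) + (1 - beta_2_t) * tf.square(grad)
    v_update_op = resource_variable_ops.resource_scatter_update(
        v.handle, indices, v_t_slice
    )

    var_slice = lr * m_t_slice / (tf.sqrt(v_t_slice) + epsilon_t)
    var_update_op = resource_variable_ops.resource_scatter_sub(
        var.handle, indices, var_slice
    )
    return tf.group(m_update_op, v_update_op, var_update_op)
```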

Fixes #2273

Type of change

  • Bug fix

Checklist:

  • [x] I've properly formatted my code according to the guidelines
    • By running Black + Flake8
    • By running pre-commit hooks
  • This PR addresses an already submitted issue for TensorFlow Addons
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • This PR contains modifications to C++ custom-ops

How Has This Been Tested?

If you're adding a bugfix or new feature please describe the tests that you ran to verify your changes:

Tested with both the Keras and Estimator APIs. Validated model equality while observing a major performance boost.
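
For anyone wanting to reproduce the comparison, a minimal benchmark along these lines should surface the difference (a hypothetical sketch, not the exact script used; the model and sizes are illustrative):

```python
import time

import tensorflow as tf
import tensorflow_addons as tfa

# Toy embedding model: sparse gradients are exactly where LazyAdam's
# scatter path (and hence this fix) matters.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=100_000, output_dim=64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tfa.optimizers.LazyAdam(1e-3), loss="mse")

x = tf.random.uniform((4096, 20), maxval=100_000, dtype=tf.int32)
y = tf.random.uniform((4096, 1))

start = time.time()
model.fit(x, y, batch_size=64, epochs=1, verbose=0)
print(f"epoch time: {time.time() - start:.2f}s")  # compare before vs. after the fix
```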

google-cla bot commented Dec 7, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



google-cla bot added the cla: no label on Dec 7, 2020
@bot-of-gabrieldemarmiesse

@SSaishruthi

You are the owner of some of the files modified in this pull request.
Would you kindly review the changes whenever you have time?
Thank you very much.

@edend10 (Contributor, Author) commented on Dec 7, 2020

@googlebot I signed it!

google-cla bot added the cla: yes label and removed the cla: no label on Dec 7, 2020
@edend10 force-pushed the fix_lazyadam_performance_bug_2273 branch from 0ab4d4c to 389e955 on December 8, 2020 00:03
@edend10 closed this on Dec 8, 2020
@edend10 force-pushed the fix_lazyadam_performance_bug_2273 branch from 389e955 to e18fcfc on December 8, 2020 02:04
@WindQAQ (Member) left a comment


LGTM! Thanks for the fix and sorry that I didn't notice the performance regression previously.

@WindQAQ merged commit 664762a into tensorflow:master on Dec 8, 2020
@edend10 (Contributor, Author) commented on Dec 8, 2020

@WindQAQ no worries! Thanks for merging.

I see one of the GPU build checks failed after the merge with a connection error. Is that related/normal?

@WindQAQ (Member) commented on Dec 8, 2020

It's unrelated to this PR, I believe 😄 Will let you know if there is anything wrong!

jrruijli pushed a commit to jrruijli/addons that referenced this pull request Dec 23, 2020

Successfully merging this pull request may close these issues.

Significant LazyAdam optimizer performance degradation since PR#1988