fix LazyAdam resource variable ops performance issue #2274
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with "@googlebot I signed it!" and we'll verify it.

@googlebot I signed it!
LGTM! Thanks for the fix and sorry that I didn't notice the performance regression previously.
@WindQAQ no worries! Thanks for merging. I see some GPU build checks failed after the merge with a connection error. Is that related/normal?

It's unrelated to this PR, I believe 😄 Will let you know if there is anything wrong!
Description
Brief Description of the PR:
Swap calls to `OptimizerV2._resource_scatter_update` and `OptimizerV2._resource_scatter_add` with direct calls to `resource_variable_ops.resource_scatter_update` and `resource_variable_ops.resource_scatter_sub`. The `OptimizerV2` methods are wrappers around the `resource_variable_ops` functions, except they extract and return the underlying `Tensor` from the op instead of returning the operation. This causes a big performance hit, as outlined in issue #2273. The returned tensor is not necessary for this "lazy" version of Adam's sparse updates, since it isn't used in subsequent computation. Thus we can avoid the extra `.value()` call in `OptimizerV2._resource_scatter_update/sub/add`.
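For concreteness, here is a minimal sketch of the swap (assuming TF 2.x; the variable, indices, and update slice below are illustrative stand-ins, not the actual LazyAdam slot-update code):

```python
import tensorflow as tf
from tensorflow.python.ops import resource_variable_ops

# Illustrative stand-ins for an Adam slot variable and a sparse update.
m = tf.Variable(tf.zeros([10, 4]))   # e.g. the "m" slot
indices = tf.constant([0, 3])
m_t_slice = tf.ones([2, 4])

# Before (inside a _resource_apply_sparse method): the OptimizerV2 wrapper
# scatters and then materializes the updated values via .value(), which the
# lazy sparse path never uses.
#   m_update = self._resource_scatter_update(m, indices, m_t_slice)

# After: call the raw resource op directly and keep only the update op,
# skipping the extra .value() read.
m_update_op = resource_variable_ops.resource_scatter_update(
    m.handle, indices, m_t_slice)
```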
Fixes #2273
Type of change
Checklist:
How Has This Been Tested?
If you're adding a bugfix or new feature, please describe the tests that you ran to verify your changes:
https://colab.research.google.com/drive/1T1X9log6pyDShHkKRxTPqkZwj0iZsowy?usp=sharing
Tested with both the Keras and Estimator APIs. Validated model equality while observing a major performance boost.
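A rough local repro of the timing comparison could look like the following (a hypothetical micro-benchmark, not the linked notebook; the model shape and sizes are illustrative). Running it against tensorflow-addons builds from before and after this change should show the step-time difference on the sparse embedding updates:

```python
import time
import tensorflow as tf
import tensorflow_addons as tfa

# Embedding-heavy model, so LazyAdam's sparse update path dominates step time.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(100_000, 64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tfa.optimizers.LazyAdam(), loss="mse")

# Random integer indices into the embedding table, random regression targets.
x = tf.random.uniform([1024, 10], maxval=100_000, dtype=tf.int32)
y = tf.random.uniform([1024, 1])

start = time.time()
model.fit(x, y, epochs=3, verbose=0)
print(f"elapsed: {time.time() - start:.2f}s")
```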