[feature] Skip creating the CPU grad tensor when training #821

anj-s · 2021-10-21T12:21:43Z

What does this PR do?

This came up while measuring memory consumption for CPU offload. We create a CPU grad tensor at the beginning of the forward (FW) pass for every parameter, even during eval. Skipping this allocation saves memory roughly equal to the size of the parameters.
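As a rough illustration of the saving, here is a minimal sketch in plain Python. It is not the code from this PR: `cpu_grad_buffer_bytes`, the shapes, and the 4-byte element size are hypothetical, and it assumes one CPU grad buffer per parameter with the same shape and dtype, needed only in training mode.

```python
from math import prod


def cpu_grad_buffer_bytes(param_shapes, training, bytes_per_element=4):
    """Estimate the bytes of CPU grad buffers pre-allocated for the given shapes.

    Hypothetical helper: models one grad buffer per parameter, with the same
    shape and element size as the parameter, allocated only when training.
    """
    if not training:
        # Eval mode runs no backward pass, so no grad buffers are allocated.
        return 0
    return sum(prod(shape) * bytes_per_element for shape in param_shapes)


# Made-up shapes for illustration: one weight matrix and one bias vector.
shapes = [(1024, 1024), (1024,)]
param_bytes = sum(prod(shape) * 4 for shape in shapes)

train_bytes = cpu_grad_buffer_bytes(shapes, training=True)
eval_bytes = cpu_grad_buffer_bytes(shapes, training=False)
print(train_bytes, eval_bytes, param_bytes)
```

Under these assumptions the buffers cost as much as the parameters themselves in training mode and nothing in eval mode, which is the saving described above.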

I don't think this change should have any negative effects, given that we reset params at the beginning of FW, but let me know if I haven't thought of a corner case. Thanks!

Before submitting

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

min-xu-ai

nice. maybe add a short comment in the code if not already covered by existing comment?

* skip creating cpu grads and pinning memory
* added additional comment
* pin docutils to fix circleCI

skip creating cpu grads and pinning memory

5d0aae9

facebook-github-bot added the CLA Signed label Oct 21, 2021

anj-s requested a review from min-xu-ai October 21, 2021 12:21

min-xu-ai approved these changes Oct 21, 2021


anj-s added the FSDP + SSD offload label Oct 26, 2021

Anjali Sridhar added 3 commits October 26, 2021 15:32

Merge branch 'main' into skip-cpu-grad-pin

772fc7e

added additional comment

ad4fd34

pin docutils to fix circleCI

4bf7a42

anj-s merged commit 5f895f0 into main Oct 27, 2021

anj-s deleted the skip-cpu-grad-pin branch October 27, 2021 19:08

vtantia pushed a commit that referenced this pull request Oct 29, 2021

[feature] Skip creating the CPU grad tensor when training (#821)

2bd31a2

* skip creating cpu grads and pinning memory
* added additional comment
* pin docutils to fix circleCI
