
fix(optim/meta): torch tensor memory not release due to gradient link #219

Open · ycsos wants to merge 1 commit into main

Conversation


@ycsos ycsos commented May 21, 2024

Description

When using torchopt.MetaAdam and stepping several times, GPU memory usage increases continuously. It should not: once the next step is executed, the tensors created in the former step are no longer needed and should be released. I found the reason: MetaOptimizer does not detach the gradient link kept in the optimizer, so the former tensors are not released by PyTorch because of that dependency.

You can run the test code below: in the first version, memory usage grows as the number of steps increases; in the second version (where I changed the code to detach the gradient link), memory usage stays stable as the steps increase:
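The original test script is not included in this excerpt. Below is a minimal sketch of the kind of reproduction described, assuming a small linear module on CUDA; the commented-out torchopt.stop_gradient calls correspond to the second, "detach the gradient link" variant:

```python
import torch
import torchopt

# Small inner model whose parameters are updated differentiably.
net = torch.nn.Linear(16, 1).cuda()
optimizer = torchopt.MetaAdam(net, lr=1e-3)

x = torch.randn(32, 16, device='cuda')

for step in range(100):
    loss = net(x).pow(2).mean()
    optimizer.step(loss)  # differentiable update; keeps the graph from earlier steps alive

    # Variant 2 (sketch): cut the gradient link after each step so tensors
    # from the previous step can be freed.
    # torchopt.stop_gradient(net)
    # torchopt.stop_gradient(optimizer)

    print(step, torch.cuda.memory_allocated())
```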

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
Closes #218.

  • I have raised an issue to propose this change (required)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format. (required)
  • I have checked the code using make lint. (required)
  • I have ensured make test pass. (required)

@Benjamin-eecs Benjamin-eecs changed the title fix torch tensor memory not release due to gradient link fix: torch tensor memory not release due to gradient link May 21, 2024
@Benjamin-eecs Benjamin-eecs changed the title fix: torch tensor memory not release due to gradient link fix(optim): torch tensor memory not release due to gradient link May 21, 2024
@Benjamin-eecs Benjamin-eecs changed the title fix(optim): torch tensor memory not release due to gradient link fix(optim.meta): torch tensor memory not release due to gradient link May 21, 2024
@Benjamin-eecs Benjamin-eecs changed the title fix(optim.meta): torch tensor memory not release due to gradient link fix(optim/meta): torch tensor memory not release due to gradient link May 21, 2024

codecov bot commented May 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.69%. Comparing base (b3f570c) to head (d0132cb).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #219   +/-   ##
=======================================
  Coverage   93.69%   93.69%           
=======================================
  Files          83       83           
  Lines        2963     2964    +1     
=======================================
+ Hits         2776     2777    +1     
  Misses        187      187           
| Flag | Coverage Δ |
| --- | --- |
| unittests | 93.69% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.


@XuehaiPan
Member

Closing now. See my comment at #218 (comment).

@XuehaiPan XuehaiPan closed this May 21, 2024
@XuehaiPan XuehaiPan reopened this May 21, 2024
Comment on lines +84 to +90
updates, new_state = self.impl.update(
grads,
state,
params=flat_params,
inplace=False,
)
self.state_groups[i] = new_state
Member


updates can be detached from the graph while new_state should remain in the graph for explicit gradient computation. We need to add a new test for this. cc @JieRen98
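As an illustration of that distinction, here is a hypothetical sketch (not the actual patch in this PR), assuming optree's tree_map for walking the updates pytree:

```python
import optree
import torch

def detach_updates(updates):
    """Detach only the parameter updates; the new optimizer state that
    produced them is left untouched so explicit gradients can still flow
    through it."""
    return optree.tree_map(
        lambda t: t.detach() if isinstance(t, torch.Tensor) else t,
        updates,
    )

# Sketch of how it could be used around the quoted update call:
#   updates, new_state = self.impl.update(grads, state, params=flat_params, inplace=False)
#   updates = detach_updates(updates)   # safe to cut from the graph
#   self.state_groups[i] = new_state    # must remain in the graph
```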
