
Improving memory snapshot #3315

Merged — 5 commits merged into mosaicml:dev from cli99/mem-snapshot on May 28, 2024
Conversation

@cli99 (Contributor) commented May 23, 2024

What does this PR do?

This PR uses _record_memory_history_impl instead of _record_memory_history_legacy (the previous approach) to capture the memory snapshot. See https://github.com/pytorch/pytorch/blob/main/torch/cuda/memory.py#L698-L738. With enabled="all", this captures all (C++ and Python) alloc/free events and gives better memory-timeline and stack-trace information.

_record_memory_history_impl exists in PyTorch 2.0, 2.1, and 2.2, so no version gating is needed.
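For context, the public wrapper around these internals can be exercised with a short script. This is a minimal sketch, not the PR's actual code: the file name and tensor workload are made up for illustration, and capturing anything real requires a CUDA-capable PyTorch build at runtime.

```python
# Minimal sketch (not the PR's code): record a CUDA memory snapshot with
# full C++/Python alloc/free history via the public _record_memory_history
# wrapper, which dispatches to _record_memory_history_impl on recent torch.
import pickle

def capture_snapshot(path="snapshot.pickle"):
    try:
        import torch
    except ImportError:
        print("torch not installed; skipping snapshot capture")
        return None
    if not torch.cuda.is_available():
        print("CUDA not available; skipping snapshot capture")
        return None
    # enabled="all" captures both alloc and free events with stack traces.
    torch.cuda.memory._record_memory_history(enabled="all", max_entries=100_000)
    x = torch.randn(1024, 1024, device="cuda")  # hypothetical workload
    del x
    snapshot = torch.cuda.memory._snapshot()  # segments plus device traces
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)
    torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
    return snapshot
```

The resulting pickle can then be inspected with PyTorch's memory visualizer to browse the timeline and stack traces.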

What issue(s) does this change relate to?

[image attached]

@cli99 marked this pull request as ready for review May 23, 2024 00:45
@mvpatel2000 (Contributor) left a comment


Can you check this function exists on torch 2.1/2.2/2.3? If it doesn't, we would need to gate on the torch function.

@cli99 (Contributor, Author) commented May 28, 2024

> Can you check this function exists on torch 2.1/2.2/2.3? If it doesn't, we would need to gate on the torch function.

Checked. The function exists on torch 2.1/2.2/2.3.
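Since the function exists on all supported versions, no gate was needed. For reference, had it been missing on some versions, the gate the reviewer describes could be a simple attribute check. This is a hypothetical sketch, not code from the PR:

```python
# Hypothetical feature gate (not from the PR): detect whether this torch
# build exposes the private recorder implementation before relying on it.
def supports_impl_recorder():
    """Return True if torch.cuda.memory exposes _record_memory_history_impl."""
    try:
        from torch.cuda import memory as cuda_memory
    except ImportError:
        return False  # torch not installed at all
    return hasattr(cuda_memory, "_record_memory_history_impl")
```

Gating on the attribute itself (rather than parsing torch.__version__) is the more robust choice for a private API, since private symbols can move between patch releases.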

@cli99 cli99 merged commit 1c1f36e into mosaicml:dev May 28, 2024
16 checks passed
@cli99 cli99 deleted the cli99/mem-snapshot branch May 28, 2024 14:43