empty_cache calls in training occupy memory on gpu #0 #614

Closed
antvconst opened this issue Dec 9, 2019 · 4 comments
Labels
bug Something isn't working

Comments

@antvconst
Contributor

Training on a GPU other than GPU #0 allocates a ~500 MB chunk of memory on GPU #0. That memory is never used and should not be allocated at all. Debugging shows that the initial allocation happens at this line: https://github.com/williamFalcon/pytorch-lightning/blob/2f01c03b38fc16618aa9839d39e0ae5a142c0559/pytorch_lightning/trainer/trainer.py#L517

A bit of research led me to this issue in the PyTorch repo: pytorch/pytorch#25752. This is not the only place where empty_cache is called, by the way. I did not check, but the other calls probably behave the same way.

For now I duct-tape-fixed it for myself by running my script with CUDA_VISIBLE_DEVICES=2 and setting gpus=[0]. I'm not sure how to fix it properly, though. I would be glad if someone took a look.
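The workaround above can also be applied from inside the script instead of on the command line. This is only a minimal sketch: `restrict_visible_gpus` is a hypothetical helper name, not part of PyTorch or Lightning, and it assumes it runs before anything initializes CUDA (safest is before importing torch), since the variable is read when CUDA starts up.

```python
import os


def restrict_visible_gpus(physical_ids):
    """Expose only the given physical GPUs to this process.

    Hypothetical helper for illustration. Returns the logical indices
    the process will see, which always start at 0.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    return list(range(len(physical_ids)))


# Physical GPU 2 becomes logical GPU 0 inside this process,
# so the value to pass as gpus=... is [0], as in the report above.
gpus = restrict_visible_gpus([2])
print(gpus)  # -> [0]
```

With physical GPU #0 hidden from the process, a stray allocation cannot land on it; the returned list is what would go into `gpus=`.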

@antvconst antvconst added the bug Something isn't working label Dec 9, 2019
@jeffling
Contributor

jeffling commented Dec 9, 2019

This is primarily a PyTorch problem, but a fix on our side could be to always call torch.cuda.set_device somewhere before our first empty_cache call. I'm not sure what other side effects that might have. Feel free to submit a PR if you'd like.

@annukkaa

annukkaa commented Mar 5, 2020

CUDA_VISIBLE_DEVICES=2 did not solve it for me, but adding torch.cuda.set_device(2) did.
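For reference, the set_device approach discussed in the comments above might look roughly like the sketch below. The function name `select_device_then_empty_cache` is made up for illustration and this is not the fix that landed in Lightning; it only uses `torch.cuda.is_available`, `torch.cuda.set_device`, and `torch.cuda.empty_cache`. The guarded import is there only so the snippet still loads on a machine without PyTorch.

```python
try:
    import torch
except ImportError:  # lets the sketch load where PyTorch is not installed
    torch = None


def select_device_then_empty_cache(device_index):
    """Make device_index the current CUDA device, then release cached memory.

    Hypothetical helper. Returns True if the cache was emptied and
    False if CUDA (or PyTorch) is not available.
    """
    if torch is None or not torch.cuda.is_available():
        return False
    # Select the training GPU first, so empty_cache does not act on the
    # default current device (GPU #0), which is the behaviour reported here.
    torch.cuda.set_device(device_index)
    torch.cuda.empty_cache()
    return True


# Example: on a multi-GPU machine one would pass the training GPU's index.
ran = select_device_then_empty_cache(0)
```

Note that set_device changes the current device for the rest of the process, which may be the kind of side effect mentioned above.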

@Borda
Member

Borda commented Mar 26, 2020

Mind checking #1094? @jeffling @annukkaa @falceeffect

@Borda
Member

Borda commented Apr 9, 2020

feel free to re-open if needed 🤖


4 participants