Training on a GPU other than GPU #0 allocates a ~500 MB chunk of memory on GPU #0. That memory is completely unused and should not be allocated at all. Debugging shows that the initial allocation happens at this line: https://github.com/williamFalcon/pytorch-lightning/blob/2f01c03b38fc16618aa9839d39e0ae5a142c0559/pytorch_lightning/trainer/trainer.py#L517

A bit of research led me to this issue in the PyTorch repo: pytorch/pytorch#25752. That is not the only place where `empty_cache` is called, by the way; I did not check, but the other call sites probably behave the same way.

For now I have duct-tape-fixed it for myself by running my script with `CUDA_VISIBLE_DEVICES=2` and setting `gpus=[0]`. I am not sure how to fix it properly, though, and would be glad if someone could take a look.

This is primarily a PyTorch problem, but a fix on our side could be to always call `torch.cuda.set_device` somewhere before our first `empty_cache` call. I'm not sure what other side effects that might have. Feel free to submit a PR if you'd like.
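A minimal sketch of that suggestion, assuming a hypothetical helper named `safe_empty_cache` (this is not actual pytorch-lightning code, just an illustration of the idea):

```python
import torch


def safe_empty_cache(device_index: int) -> None:
    """Free cached CUDA memory on the given device without touching GPU 0.

    Calling torch.cuda.empty_cache() before any device has been selected
    initializes the CUDA context on GPU 0, allocating a few hundred MB
    there (see pytorch/pytorch#25752). Selecting the target device first
    should avoid that.
    """
    if not torch.cuda.is_available():
        return  # nothing to free on CPU-only machines
    torch.cuda.set_device(device_index)
    torch.cuda.empty_cache()
```

With the `CUDA_VISIBLE_DEVICES=2` workaround described above, the process only sees physical GPU 2, which is remapped to `cuda:0` inside the process, so `gpus=[0]` refers to it and the real GPU 0 is never touched.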