
Memory leaking when using large numpy array in Dataset #1761

Closed
mpaepper opened this issue May 8, 2020 · 8 comments
Labels
bug (Something isn't working), help wanted (Open to be worked on)

Comments

@mpaepper

mpaepper commented May 8, 2020

🐛 Bug

Thank you for the great library! When migrating a larger project, I am running into memory issues, though, so maybe someone can help me out.

I have a fairly complicated Dataset which loads a lot of data and buffers it in CPU RAM as a numpy array.

I train using DDP with num_workers = 6 in the DataLoader. Training crashes my machine because of CPU memory overflow. It works with num_workers = 0, but the higher num_workers is, the higher the memory consumption.

I figured out that this is much worse when using a large numpy array in the Dataset rather than a PyTorch tensor.
Unfortunately, I need numpy arrays, so is there anything I can do?

To Reproduce

I created a repository to reproduce this. It allows you to train a model on toy data using either a PyTorch tensor or a numpy array in the Dataset.

When running it with the PyTorch tensor, the same amount of data uses 5 GB of RAM, while with numpy it uses more than 30 GB of RAM.
The higher num_workers, the higher the RAM usage - it seems to leak when using numpy?

  1. Clone https://github.com/mpaepper/reproduce_pytorch_lightning_memory_issues
  2. Try the PyTorch tensor with: python minimal.py --num_workers 10
  3. Try the numpy array with: python minimal.py --numpy --num_workers 10
  4. Compare the huge difference in memory consumption

Code sample

https://github.com/mpaepper/reproduce_pytorch_lightning_memory_issues
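
For readers who don't want to clone the repo, here is roughly the kind of Dataset the comparison is about (an illustrative sketch, not the actual repo code; sizes and names are made up):

import numpy as np
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __init__(self, use_numpy: bool, num_samples: int = 1_000_000, dim: int = 128):
        data = np.random.rand(num_samples, dim).astype(np.float32)
        # Buffer everything in CPU RAM, either as a numpy array or as a torch tensor.
        self.data = data if use_numpy else torch.from_numpy(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

The repo is essentially this toggle plus a small model trained with DDP and num_workers > 0.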

Expected behavior

I would expect numpy arrays and PyTorch tensors to behave the same way when using num_workers > 0, i.e. similar memory consumption.

Environment

  • CUDA:
    - GPU:
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
    - available: True
    - version: 10.1
  • Packages:
    - numpy: 1.16.4
    - pyTorch_debug: False
    - pyTorch_version: 1.4.0
    - pytorch-lightning: 0.7.5
    - tensorboard: 1.14.0
    - tqdm: 4.46.0
  • System:
    - OS: Linux
    - architecture: 64bit
    - processor: x86_64
    - python: 3.7.3
    - version: #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020
mpaepper added the bug (Something isn't working) and help wanted (Open to be worked on) labels on May 8, 2020
@github-actions
Contributor

github-actions bot commented May 8, 2020

Hi! Thanks for your contribution! Great first issue!

@bjmnbraun

I had a similar issue, and if I recall correctly, defining the environment variable COLAB_GPU forces PyTorch Lightning to use fork, which might prevent this Nx memory blowup.

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L779
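
If someone wants to try this, the variable only needs to be set before the Trainer spins up its processes; whether it really switches the start method depends on the Lightning version (this is my reading of the line linked above, not something I've verified on GPU):

import os

# Assumption based on the linked trainer code: setting this makes Lightning
# take the fork-based path it normally uses on Colab. Set it before the
# Trainer is created / .fit() is called.
os.environ["COLAB_GPU"] = "1"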

@mpaepper
Author

mpaepper commented May 8, 2020

Thank you for the answer, but it seems that option only works for TPU training?
I am training on GPUs.

I tried it out anyway, but it didn't improve my situation. Any other pointers / ideas?

@mpaepper
Author

mpaepper commented May 11, 2020

I tried to manually rewrite the PyTorch Lightning code to use fork instead of spawn, but then I get the error "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method":

Process Process-1:
Traceback (most recent call last):
  File "/home/xxx/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/xxx/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_data_parallel.py", line 345, in ddp_train
    torch.cuda.set_device(self.root_gpu)
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 292, in set_device
    torch._C._cuda_setDevice(device)
  File "/home/xxx/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 195, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
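
For context, that error is the usual CUDA/multiprocessing constraint rather than something Lightning-specific: once the parent process has initialized CUDA, forked children can't use it, so multi-GPU code has to go through the spawn start method. A minimal sketch of the spawn pattern (not Lightning's actual code):

import torch
import torch.multiprocessing as mp

def worker(rank: int):
    # Each spawned child initializes CUDA itself, which a forked child could
    # not do if the parent had already touched the GPU.
    torch.cuda.set_device(rank)
    print(f"worker {rank} running on {torch.cuda.get_device_name(rank)}")

if __name__ == "__main__":
    # mp.spawn uses the 'spawn' start method under the hood.
    mp.spawn(worker, nprocs=torch.cuda.device_count())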

@Borda
Member

Borda commented May 11, 2020

might be similar to #1769

@mpaepper
Author

mpaepper commented May 19, 2020

So for others running into this:

As a workaround, during __init__ I convert everything from numpy to PyTorch tensors, so the data lives in RAM as tensors -> then the shared memory works. When I use the data, I convert it back from PyTorch to numpy (.detach().numpy()).
However, this can fail when you store large amounts of data this way, because your operating system's limit on open files may be too low.
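
A sketch of that workaround (illustrative; the key point is that the buffer lives in a torch tensor between accesses):

import numpy as np
import torch
from torch.utils.data import Dataset

class BufferedDataset(Dataset):
    def __init__(self, num_samples: int = 1_000_000, dim: int = 128):
        data = np.random.rand(num_samples, dim).astype(np.float32)
        # Keep the buffer as a torch tensor so the DDP/worker processes can
        # share it instead of each ending up with its own copy of a numpy array.
        self.data = torch.from_numpy(data)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        # Convert back to numpy only at access time.
        return self.data[idx].detach().numpy()

torch.from_numpy is zero-copy, so the conversion in __init__ doesn't duplicate the buffer itself.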

Check ulimit -n (it was 1024 for me).

Setting a higher limit with ulimit -n 9999 fixed the error and training works.
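
If you'd rather do this from inside the training script than in the shell, the soft limit can also be raised with Python's resource module (Linux/macOS only, and only up to the hard limit; 9999 is just the value that worked for me):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Raise the soft open-file limit; cap it at the hard limit set by the system.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(9999, hard), hard))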

However, it still seems too slow: it's only half as fast as it was with Torchbearer before.

The more num_workers I use in the DataLoader, the slower the start of an epoch, similar to what is described in #1884.

@williamFalcon
Contributor

@mpaepper check again?
This should be fixed on master now

@mpaepper
Author

mpaepper commented Jun 3, 2020

Yes, thank you. It's resolved by the recent changes on master 👍
