OutOfMemoryError #355

Open
nwoyecid opened this issue Jul 6, 2023 · 3 comments

Comments

nwoyecid commented Jul 6, 2023

I get an OOM error during inference but not during training.

This happens even with a batch size of 1, and even after increasing GPU memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 11.25 GiB (GPU 0; 44.42 GiB total capacity; 36.96 GiB already allocated; 3.95 GiB free; 38.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I think it's not really an OOM issue; something in the Trainer keeps reserving more memory.
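The error message itself suggests tuning `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF` to reduce fragmentation. A minimal sketch of applying that hint, assuming the value 128 MiB is a starting point to tune rather than a recommended setting:

```python
import os

# The CUDA caching allocator reads PYTORCH_CUDA_ALLOC_CONF at startup,
# so set it before importing torch / before the first CUDA allocation.
# 128 MiB is an illustrative split size, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

The same variable can also be exported in the shell before launching the script, which avoids any ordering concerns inside the code.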

@nwoyecid (Author)

This error occurs only during inference/evaluation and not during training.
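One common reason evaluation uses far more memory than training steps of the same batch size is running the forward pass with autograd enabled, so activations are retained for a backward pass that never happens. A hedged sketch of an evaluation loop that avoids this; `model` and `dataloader` are placeholder names from your own setup:

```python
import torch

def evaluate(model, dataloader, device="cpu"):
    # Placeholder names: model and dataloader come from your own code.
    model.eval()  # put dropout/batch-norm layers into eval behavior
    outputs = []
    # inference_mode() disables autograd, so intermediate activations
    # are not kept around; this often cuts evaluation memory sharply.
    with torch.inference_mode():
        for batch in dataloader:
            batch = batch.to(device)
            # Move results off the GPU so they do not accumulate there.
            outputs.append(model(batch).cpu())
    return outputs
```

If the evaluation code already uses `torch.no_grad()` or `torch.inference_mode()`, the next suspect is accumulating GPU tensors (e.g. collecting predictions without `.cpu()` or `.detach()`).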

@AhmedGamal411

Inference works fine for me. Can you share your code?

@Siddharth-Latthe-07

There could be several factors behind the OOM error, for example techniques like beam search in NLP models, or large batch sizes during evaluation. Memory fragmentation can also lead to inefficient use of GPU memory.
Try this:

import torch
torch.cuda.set_per_process_memory_fraction(0.9)  # cap this process at 90% of GPU memory; adjust the fraction as needed
torch.backends.cuda.matmul.allow_tf32 = True  # TF32 speeds up matmuls; note it does not reduce memory use

or you can clear the allocator cache before starting inference:

import torch
torch.cuda.empty_cache()
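Worth noting: `empty_cache()` can only return cached blocks whose tensors are no longer referenced anywhere, so drop Python references first. A minimal sketch, where `outputs` is a placeholder for whatever large tensors you are holding:

```python
import gc
import torch

# Placeholder for large tensors held from a previous step.
outputs = None          # or: del outputs  (drop the reference first)
gc.collect()            # collect any lingering reference cycles
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return unreferenced cached blocks to the driver
```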

Please provide your code for more detailed help.
Thanks
