RuntimeError: CUDA out of memory. #8

Open
sm3304love opened this issue Oct 5, 2021 · 4 comments

Comments

sm3304love commented Oct 5, 2021

python rotate_train.py --weights rotate-yolov5s-ucas.pt --cfg rotate_yolov5s_ucas.yaml \
>      --data rotate_ucas.yaml --hyp hyp.ucas.yaml --img-size 1024 \
>      --epochs 3 --batch-size 12 --noautoanchor --rotate --cache
YOLOv5 🚀 v1.0-0-g298a36e torch 1.9.1+cu102 CUDA:0 (NVIDIA GeForce RTX 2060, 5934.5625MB)

Namespace(adam=False, artifact_alias='latest', batch_size=12, bbox_interval=-1, bucket='', cache_images=True, cfg='./models/rotate_yolov5s_ucas.yaml', data='./data/rotate_ucas.yaml', device='', entity=None, epochs=3, evolve=False, exist_ok=False, global_rank=-1, hyp='./data/hyp.ucas.yaml', image_weights=False, img_size=[1024, 1024], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=True, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, rotate=True, save_dir='runs/train/exp2', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=12, upload_dataset=False, weights='rotate-yolov5s-ucas.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.1, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=30.0, translate=0.1, scale=0.0, shear=5.0, perspective=0.0005, flipud=0.5, fliplr=0.5, mosaic=0.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1     40455  models.yolo.Rotate_Detect               [2, [[27, 26, 20, 40, 44, 19, 34, 34, 25, 47], [55, 24, 44, 38, 31, 61, 50, 50, 63, 45], [65, 62, 88, 60, 84, 79, 113, 85, 148, 122]], [128, 256, 512]]
Model Summary: 283 layers, 7087815 parameters, 7087815 gradients, 16.5 GFLOPs

Transferred 360/362 items from rotate-yolov5s-ucas.pt
Scaled weight_decay = 0.00046875
Optimizer groups: 62 .bias, 62 conv.weight, 59 other
train: Scanning '../UCAS50/train.cache' images and labels... 38 found, 0 missing
train: Caching images (0.1GB): 100%|████████████| 38/38 [00:00<00:00, 81.82it/s]
val: Scanning '../UCAS50/val.cache' images and labels... 10 found, 0 missing, 0 
val: Caching images (0.0GB): 100%|██████████████| 10/10 [00:00<00:00, 19.17it/s]
Plotting labels... 
Image sizes 1024 train, 1024 test
Using 8 dataloader workers
Logging results to runs/train/exp2
Starting training for 3 epochs...

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0%|                                                     | 0/4 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "rotate_train.py", line 553, in <module>
    train(hyp, opt, device, tb_writer, rotate=opt.rotate)
  File "rotate_train.py", line 313, in train
    pred = model(imgs)  # forward
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/yolo.py", line 122, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/yolo.py", line 153, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/common.py", line 139, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/common.py", line 105, in forward
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/common.py", line 43, in forward
    return self.act(self.bn(self.conv(x)))
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/activation.py", line 395, in forward
    return F.silu(input, inplace=self.inplace)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/functional.py", line 1898, in silu
    return torch._C._nn.silu(input)
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 5.80 GiB total capacity; 2.56 GiB already allocated; 13.69 MiB free; 2.61 GiB reserved in total by PyTorch)

I followed the example in README.md, and this error occurred at the last step, the training part. How can I solve this problem?
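The last line of the traceback can be read directly. Below is a minimal sketch, using only Python's standard library, that pulls the sizes out of that message and compares them. The function name `parse_oom` and the dictionary keys are illustrative, not part of PyTorch or this repository, and the message format is assumed to match the line quoted above.

```python
import re

# Final line of the traceback above, copied verbatim.
OOM_LINE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB "
    "(GPU 0; 5.80 GiB total capacity; 2.56 GiB already allocated; "
    "13.69 MiB free; 2.61 GiB reserved in total by PyTorch)"
)

# Conversion factors to MiB for the units that appear in the message.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(line):
    """Return the sizes named in the message, all converted to MiB.

    Hypothetical helper: it assumes the exact wording of the message above.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\w+)",
        "total": r"([\d.]+) (\w+) total capacity",
        "allocated": r"([\d.]+) (\w+) already allocated",
        "free": r"([\d.]+) (\w+) free",
        "reserved": r"([\d.]+) (\w+) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, line)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom(OOM_LINE)

# Memory on the card that is neither free nor in PyTorch's reserved pool.
outside_pytorch = sizes["total"] - sizes["reserved"] - sizes["free"]

print(f"requested:       {sizes['requested']:.2f} MiB")
print(f"free:            {sizes['free']:.2f} MiB")
print(f"outside PyTorch: {outside_pytorch:.2f} MiB")
```

For this message the request (24 MiB) exceeds what is free (13.69 MiB), and roughly 3.2 GiB of the card is not accounted for by PyTorch's reserved pool. That memory may be held by other processes or by the CUDA context, which is worth checking with `nvidia-smi`.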

Owner

XinzeLee commented Oct 6, 2021

This is because your GPU runs out of memory. To overcome this, you have three options:

  1. Do not specify "--cache".
  2. Buy new RAM (more than 16 GB)
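  3. Create or enlarge a swap file (the commands below set up a 32 GB swap file):
sudo fallocate -l 32G /swapfile  # allocate the file; assumes fallocate is available and /swapfile is not already in use
sudo chmod 600 /swapfile  # restrict access to root
sudo mkswap /swapfile  # format the file as swap space
sudo swapon /swapfile  # enable it
free -h  # check memory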

sm3304love reopened this Oct 7, 2021
Author

sm3304love commented Oct 7, 2021

Screenshot from 2021-10-07 20-34-10
Screenshot from 2021-10-07 20-38-18

I checked: my computer has 16 GB of RAM and a 32 GB swap file, but the same problem still occurs.

Owner

XinzeLee commented Oct 7, 2021

Yes, but training is limited by GPU memory, not by system RAM or swap. Your GPU has only about 6 GB of memory (the log reports 5.80 GiB total capacity). You can try: 1. reduce --batch-size to 3; 2. reduce the image size (--img-size).
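Why both suggestions help can be sketched with simple arithmetic. As a rough assumption, not a measurement, activation memory in a convolutional network grows about linearly with the batch size and with the number of pixels per image (the image size squared). The helper below, whose name is hypothetical, compares candidate settings with the failing run (--batch-size 12, --img-size 1024). The value 640 is only an illustrative image size, and weights, optimizer state, and the CUDA context are ignored.

```python
def relative_activation_memory(batch_size, img_size, ref_batch=12, ref_img=1024):
    """Rough activation-memory factor relative to the failing run.

    Assumption: memory ~ batch_size * img_size**2. This ignores model
    weights, optimizer state, and the CUDA context, so treat the result
    as a guide for what to try, not as a guarantee that a run will fit.
    """
    return (batch_size / ref_batch) * (img_size / ref_img) ** 2


# Candidate (batch_size, img_size) pairs; the first is the failing run.
candidates = [(12, 1024), (3, 1024), (12, 640), (3, 640)]

for batch_size, img_size in candidates:
    factor = relative_activation_memory(batch_size, img_size)
    print(f"batch={batch_size:>2} img={img_size:>4} -> {factor:.3f}x of the failing run")
```

Under this assumption, dropping the batch size from 12 to 3 cuts activation memory to about a quarter, and shrinking the image side reduces it quadratically, so the two changes can be combined if one alone is not enough.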


Zivid99 commented Apr 15, 2022

Hello, I ran into a problem when installing the CUDA extension with 'python setup.py install'. It seems that you are running on CUDA, so I wonder whether you compiled it successfully?
Here is my error:
`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include\cub\iterator../util_device.cuh(330): here
instantiation of "cub::PerDeviceAttributeCache::DevicePayload cub::PerDeviceAttributeCache::operator()(Invocable &&, int) [with Invocable=lambda [](int &)->cudaError_t]"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include\cub\iterator../util_device.cuh(431): here

56 errors detected in the compilation of "inter_union_cuda.cu".
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvcc.exe' failed with exit code 1
`
