The command I used to start the Docker container:
docker run --gpus=all --name yolov9 -it -v ./data/training/generated/generated_training_images_root_yoloV9/:/workspace/generated_training_images_root_yoloV9/ -v ./data/jupyter/yoloV9/:/workspace/ --shm-size=64g nvcr.io/nvidia/pytorch:21.11-py3
After that I ran the recommended steps:
apt update
apt install -y zip htop screen libgl1-mesa-glx
pip install seaborn thop
cd /yolov9
So for me the standard installation is broken. I am using an RTX 3090 on Ubuntu 22.04 inside WSL 2 on Windows 11 with the latest NVIDIA drivers.
I get the following error after using the recommended docker image with the recommended installation steps from the readme:
python train.py --batch 16 --epochs 25 --img 640 --device 0 --min-items 0 --close-mosaic 15 --data ../generated_training_images_root_yoloV9/data.yaml --weights /workspace/weights/gelan-c.pt --cfg models/detect/gelan-c.yaml --hyp hyp.scratch-high.yaml
train: weights=/workspace/weights/gelan-c.pt, cfg=models/detect/gelan-c.yaml, data=../generated_training_images_root_yoloV9/data.yaml, hyp=hyp.scratch-high.yaml, epochs=25, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, flat_cos_lr=False, fixed_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, min_items=0, close_mosaic=15, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
YOLOv5 🚀 1e33dbb Python-3.8.12 torch-1.11.0a0+b6df043 CUDA:0 (NVIDIA GeForce RTX 3090, 24575MiB)
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, cls_pw=1.0, dfl=1.5, obj_pw=1.0, iou_t=0.2, anchor_t=5.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.15, copy_paste=0.3
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLO 🚀 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLO 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=5
0 -1 1 1856 models.common.Conv [3, 64, 3, 2]
1 -1 1 73984 models.common.Conv [64, 128, 3, 2]
2 -1 1 212864 models.common.RepNCSPELAN4 [128, 256, 128, 64, 1]
3 -1 1 164352 models.common.ADown [256, 256]
4 -1 1 847616 models.common.RepNCSPELAN4 [256, 512, 256, 128, 1]
5 -1 1 656384 models.common.ADown [512, 512]
6 -1 1 2857472 models.common.RepNCSPELAN4 [512, 512, 512, 256, 1]
7 -1 1 656384 models.common.ADown [512, 512]
8 -1 1 2857472 models.common.RepNCSPELAN4 [512, 512, 512, 256, 1]
9 -1 1 656896 models.common.SPPELAN [512, 512, 256]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 models.common.Concat [1]
12 -1 1 3119616 models.common.RepNCSPELAN4 [1024, 512, 512, 256, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 models.common.Concat [1]
15 -1 1 912640 models.common.RepNCSPELAN4 [1024, 256, 256, 128, 1]
16 -1 1 164352 models.common.ADown [256, 256]
17 [-1, 12] 1 0 models.common.Concat [1]
18 -1 1 2988544 models.common.RepNCSPELAN4 [768, 512, 512, 256, 1]
19 -1 1 656384 models.common.ADown [512, 512]
20 [-1, 9] 1 0 models.common.Concat [1]
21 -1 1 3119616 models.common.RepNCSPELAN4 [1024, 512, 512, 256, 1]
22 [15, 18, 21] 1 5494495 models.yolo.DDetect [5, [256, 512, 512]]
gelan-c summary: 621 layers, 25440927 parameters, 25440911 gradients, 103.2 GFLOPs
Transferred 931/937 items from /workspace/weights/gelan-c.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 154 weight(decay=0.0), 161 weight(decay=0.0005), 160 bias
train: Scanning /workspace/generated_training_images_root_yoloV9/train/labels.cache... 221 images, 0 backgrounds, 0 corrupt: 100%|██████████| 221/221 00:00
val: Scanning /workspace/generated_training_images_root_yoloV9/valid/labels.cache... 221 images, 0 backgrounds, 0 corrupt: 100%|██████████| 221/221 00:00
Plotting labels to runs/train/exp17/labels.jpg...
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/train/exp17
Starting training for 25 epochs...
0%| | 0/14 00:00
Traceback (most recent call last):
File "train.py", line 634, in <module>
main(opt)
File "train.py", line 528, in main
train(opt.hyp, opt, device, callbacks)
File "train.py", line 277, in train
for i, (imgs, targets, paths, _) in pbar: # batch -------------------------------------------------------------
File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
for obj in iterable:
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 438, in reraise
raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 34, in _pin_memory_loop
data = pin_memory(data)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 58, in pin_memory
return [pin_memory(sample) for sample in data]
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 58, in <listcomp>
return [pin_memory(sample) for sample in data]
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 50, in pin_memory
return data.pin_memory()
RuntimeError: CUDA error: out of memory
It does NOT work with a smaller batch size either.
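For scale, here is a rough estimate of how much host memory a single batch would need when it is pinned. This is only a sketch: the helper name `batch_bytes` is made up for illustration, and it assumes the dataloader hands over images as 3x640x640 uint8 tensors (1 byte per element), which the log above does not confirm. Labels and paths are ignored.

```python
# Rough size of the image tensor in one batch.
# Assumption: uint8 images (1 byte per element), 3 channels, 640x640 pixels.
def batch_bytes(batch_size, channels=3, height=640, width=640, bytes_per_element=1):
    """Return the number of bytes occupied by one batch of images."""
    return batch_size * channels * height * width * bytes_per_element


MIB = 1024 ** 2

# Compare the batch size from the failing command (16) with smaller ones.
for bs in (16, 8, 1):
    size = batch_bytes(bs)
    print(f"batch={bs:2d}: {size} bytes ({size / MIB:.2f} MiB)")
```

Under these assumptions a batch of 16 comes to 19660800 bytes (18.75 MiB), which is tiny next to the 24575MiB reported for the GPU. That would fit with the observation that lowering the batch size makes no difference, although it does not explain where the out-of-memory error actually comes from.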