
Multi GPU Training #7608

Closed
Venky0892 opened this issue Apr 27, 2022 · 4 comments
Labels: question (Further information is requested), Stale

Comments

@Venky0892

Question

Hi, I'm trying to train on around 200K images with Tesla V100 GPUs; I have 4 of them in my compute instance. I'm training with the command "python -m torch.distributed.launch --nproc_per_node 4" as mentioned in the documentation, but I can see only one GPU active, and I'm not sure why. I checked the GPU usage metric in the wandb tool. Can someone help me with this? If I run normal training without the distributed launch, I can see all the GPUs being used.


Venky0892 added the question label on Apr 27, 2022
glenn-jocher (Member) commented Apr 27, 2022

@Venky0892 see Multi-GPU Training tutorial for correct commands:

YOLOv5 Tutorials
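For a 4-GPU DDP run, the command in the tutorial has this shape (a sketch of the docs' example; substitute your own --data, --weights, and --batch values):

$ python -m torch.distributed.launch --nproc_per_node 4 train.py --batch 128 --data coco.yaml --weights yolov5s.pt --device 0,1,2,3   # DDP across GPUs 0-3

Note that --batch is the total batch size and is split evenly across the GPUs (128 → 32 per GPU here), so it should be a multiple of the GPU count.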

Good luck 🍀 and let us know if you have any other questions!

Venky0892 (Author) commented Apr 28, 2022

Hi @glenn-jocher
I referred to the documentation for Multi-GPU Training. Here is the command:
$ python -m torch.distributed.launch --nproc_per_node 4 train.py --batch 128 --data furniture.yaml --weights yolov5s.pt --device 0,1,2,3

I want to use four GPUs, so I set --nproc_per_node to 4.

[Screenshot 2022-04-28 at 6:09:25 PM: wandb GPU utilization showing only GPU 0 active]

I can only see GPU 0 active and running. I'm not sure why; it would be great if you could assist me with this.
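For reference, a quick way to confirm how many devices PyTorch can see from the training environment (a minimal sketch, assuming PyTorch is installed in the active Python environment):

$ python -c "import torch; print(torch.cuda.device_count())"   # expect 4
$ echo $CUDA_VISIBLE_DEVICES                                   # empty, or 0,1,2,3 means all four are visible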

github-actions bot (Contributor) commented May 29, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

glenn-jocher (Member) commented

@Venky0892 The command you provided looks correct, so it's strange that only GPU 0 is active. Would you mind checking if the nvidia-smi command correctly shows all 4 GPUs under normal circumstances? Additionally, could you please verify that your system meets the requirements outlined in the Multi-GPU Training tutorial in our documentation? Thank you!
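For example, per-GPU utilization can be watched while a run is in progress with standard NVIDIA tooling (not YOLOv5-specific):

$ nvidia-smi        # one-off snapshot: all 4 V100s should be listed
$ nvidia-smi -l 1   # refresh every second; during DDP training, utilization should be nonzero on GPUs 0-3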
