
Multi GPU Training #7608

Closed
Venky0892 opened this issue Apr 27, 2022 · 4 comments
Labels: question (Further information is requested), Stale

Comments

@Venky0892

Question

Hi, I'm trying to train on around 200K images with Tesla V100 GPUs; I have 4 of them in my compute instance. I'm training with the command "python -m torch.distributed.launch --nproc_per_node 4" as mentioned in the documentation, but I can see only one GPU active, and I'm not sure why. I checked the GPU usage metric in the wandb tool. Can someone help me with this? If I run normal training without the distributed launch, I can see all the GPUs being used.


Venky0892 added the question label on Apr 27, 2022
glenn-jocher (Member) commented Apr 27, 2022

@Venky0892 see Multi-GPU Training tutorial for correct commands:

YOLOv5 Tutorials
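For a 4-GPU DDP run, the command in the tutorial has this shape (a sketch of the docs' example; substitute your own --data, --weights, and --batch values):

$ python -m torch.distributed.launch --nproc_per_node 4 train.py --batch 128 --data coco.yaml --weights yolov5s.pt --device 0,1,2,3   # DDP across GPUs 0-3

Note that --batch is the total batch size and is split evenly across the GPUs (128 → 32 per GPU here), so it should be a multiple of the GPU count.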

Good luck 🍀 and let us know if you have any other questions!

Venky0892 (Author) commented Apr 28, 2022

Hi @glenn-jocher
I referred to the documentation for Multi-GPU Training. Here is the command:
$ python -m torch.distributed.launch --nproc_per_node 4 train.py --batch 128 --data furniture.yaml --weights yolov5s.pt --device 0,1,2,3

I want to use four GPUs, so I set --nproc_per_node to 4.

[Screenshot 2022-04-28 at 6:09:25 PM: wandb GPU utilization showing only GPU 0 active]

I can only see GPU 0 active and running. I'm not sure why; it would be great if you could assist me with this.
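For reference, a quick way to confirm how many devices PyTorch can see from the training environment (a minimal sketch, assuming PyTorch is installed in the active Python environment):

$ python -c "import torch; print(torch.cuda.device_count())"   # expect 4
$ echo $CUDA_VISIBLE_DEVICES                                   # empty, or 0,1,2,3 means all four are visible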

github-actions bot (Contributor) commented May 29, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

glenn-jocher (Member) commented

@Venky0892 The command you provided looks correct, so it's strange that only GPU 0 is active. Would you mind checking if the nvidia-smi command correctly shows all 4 GPUs under normal circumstances? Additionally, could you please verify that your system meets the requirements outlined in the Multi-GPU Training tutorial in our documentation? Thank you!
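For example, per-GPU utilization can be watched while a run is in progress with standard NVIDIA tooling (not YOLOv5-specific):

$ nvidia-smi        # one-off snapshot: all 4 V100s should be listed
$ nvidia-smi -l 1   # refresh every second; during DDP training, utilization should be nonzero on GPUs 0-3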
