
Multiple GPU support #48

Closed
HaxThePlanet opened this issue Jun 13, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@HaxThePlanet

🚀 Feature

Multiple GPU support

Motivation

Increased performance!

Pitch

I just bought a 3-way P100 box, come on please :)

Alternatives

Google Compute TPU support?

Additional context

@HaxThePlanet HaxThePlanet added the enhancement New feature or request label Jun 13, 2020
@github-actions
Contributor

github-actions bot commented Jun 13, 2020

Hello @HaxThePlanet, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook (Open in Colab), Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@glenn-jocher
Member

@HaxThePlanet good news: YOLOv5 supports multi-GPU out of the box. Some examples:

python train.py  # uses ALL available CUDA devices found on the system
python train.py --device 0,1  # specify multiple devices
python train.py --device 0  # specify one device
python train.py --device cpu  # force CPU usage

test.py works exactly the same way. detect.py accepts a --device argument, but is limited to one GPU.
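For intuition, a `--device` flag like this is typically a thin wrapper over CUDA device visibility. Below is a minimal, hypothetical sketch of such parsing (not YOLOv5's actual implementation): an empty string means "use all GPUs", `cpu` forces CPU, and an index list like `0,1` restricts the visible GPUs via `CUDA_VISIBLE_DEVICES`:

```python
import os

def select_device(device=""):
    """Hypothetical sketch of parsing a --device style string.

    '' -> use all GPUs, 'cpu' -> CPU only,
    '0' or '0,1' -> restrict the GPUs PyTorch can see.
    """
    device = str(device).strip().lower()
    if device == "cpu":
        return "cpu"
    if device:
        # Must be set before CUDA is initialized to take effect.
        os.environ["CUDA_VISIBLE_DEVICES"] = device
    return "cuda"  # torch then sees only the requested GPUs
```

Because `CUDA_VISIBLE_DEVICES` is read once at CUDA initialization, a helper like this has to run before the first tensor is moved to a GPU.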

@HaxThePlanet
Author

Excellent, thanks for the fast response and hard work. This thing is amazing!

@AIFAN-Lab

When I type the command:
python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16
it shows the following:
{'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.58, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.014, 'hsv_s': 0.68, 'hsv_v': 0.36, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='./data/coco.yaml', device='', epochs=300, evolve=False, img_size=[640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA Apex device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device1 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device2 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device3 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device4 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device5 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device6 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device7 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
Optimizer groups: 54 .bias, 60 conv.weight, 51 other

and then the error report below:
/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:303: UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES. NB: There is a known issue in nn.parallel.replicate that prevents a single DDP instance to operate on multiple model replicas.
"Single-Process Multi-GPU is not the recommended mode for "
Traceback (most recent call last):
  File "train.py", line 400, in <module>
    train(hyp)
  File "train.py", line 152, in train
    model = torch.nn.parallel.DistributedDataParallel(model)
  File "/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 287, in __init__
    self._ddp_init_helper()
  File "/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 380, in _ddp_init_helper
    expect_sparse_gradient)
RuntimeError: Model replicas must have an equal number of parameters.
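The UserWarning above already points at the recommended fix: launch one DDP process per GPU instead of a single process spanning all eight devices. In PyTorch 1.5 that is usually done through the `torch.distributed.launch` utility, assuming the training script parses the `--local_rank` argument the launcher passes in. A small hypothetical helper (not part of this repo) that builds such a launch command:

```python
def ddp_launch_cmd(num_gpus, script="train.py", extra_args=()):
    """Build a one-process-per-GPU launch command, as the DDP
    UserWarning recommends (PyTorch 1.x torch.distributed.launch)."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(num_gpus),  # one worker process per GPU
        script,
    ]
    cmd.extend(extra_args)
    return cmd
```

For example, `ddp_launch_cmd(8, extra_args=["--batch-size", "16"])` would spawn eight workers, each owning a single GPU, which avoids the per-forward-pass scatter/gather overhead the warning describes.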

@glenn-jocher
Member

glenn-jocher commented Jun 17, 2020

@AIFAN-Lab thanks for the bug report. I tested on two GPUs today and everything worked well. Can you try to reproduce this in our docker image to see if it's an environment issue?

@AIFAN-Lab

OK, I will test in Docker and report back later.

@HaxThePlanet
Author

Is it still necessary to train the first 1000 or so iterations on a single GPU?

@glenn-jocher
Member

@HaxThePlanet that's never been necessary.

@liangshi036

> @HaxThePlanet good news: yolov5 supports multi-gpu out of the box. [...] detect.py accepts a --device argument, but is limited to 1 gpu.

Would you please support multiple GPUs in detect.py?

@glenn-jocher
Member

@liangshi036 we don't have the resources to implement suggestions, but you can do this yourself and submit a PR!
