
Multiple GPU support #48

Closed
HaxThePlanet opened this issue Jun 13, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@HaxThePlanet

🚀 Feature

Multiple GPU support

Motivation

Increased performance!

Pitch

I just bought a 3-way P100 box, come on please :)

Alternatives

Google Compute TPU support?

Additional context

@HaxThePlanet HaxThePlanet added the enhancement New feature or request label Jun 13, 2020
@github-actions
Contributor

github-actions bot commented Jun 13, 2020

Hello @HaxThePlanet, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook (Open in Colab), Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@glenn-jocher
Member

@HaxThePlanet good news: YOLOv5 supports multi-GPU out of the box. Some examples:

python train.py  # uses ALL available CUDA devices found on the system
python train.py --device 0,1  # specify multiple devices
python train.py --device 0  # specify one device
python train.py --device cpu  # force CPU usage

test.py works exactly the same way. detect.py accepts a --device argument, but is limited to one GPU.
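For intuition, a `--device` flag like this is typically a thin wrapper over CUDA device visibility. Below is a minimal, hypothetical sketch of such parsing (not YOLOv5's actual implementation): an empty string means "use all GPUs", `cpu` forces CPU, and an index list like `0,1` restricts the visible GPUs via `CUDA_VISIBLE_DEVICES`:

```python
import os

def select_device(device=""):
    """Hypothetical sketch of parsing a --device style string.

    '' -> use all GPUs, 'cpu' -> CPU only,
    '0' or '0,1' -> restrict the GPUs PyTorch can see.
    """
    device = str(device).strip().lower()
    if device == "cpu":
        return "cpu"
    if device:
        # Must be set before CUDA is initialized to take effect.
        os.environ["CUDA_VISIBLE_DEVICES"] = device
    return "cuda"  # torch then sees only the requested GPUs
```

Because `CUDA_VISIBLE_DEVICES` is read once at CUDA initialization, a helper like this has to run before the first tensor is moved to a GPU.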

@HaxThePlanet
Author

Excellent, thanks for the fast response and hard work. This thing is amazing!

@AIFAN-Lab

When I type the command:
python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16
it shows the following:
{'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.58, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.014, 'hsv_s': 0.68, 'hsv_v': 0.36, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='./data/coco.yaml', device='', epochs=300, evolve=False, img_size=[640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA Apex device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device1 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device2 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device3 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device4 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device5 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device6 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device7 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
Optimizer groups: 54 .bias, 60 conv.weight, 51 other

and then the error report below:
/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:303: UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES. NB: There is a known issue in nn.parallel.replicate that prevents a single DDP instance to operate on multiple model replicas.
"Single-Process Multi-GPU is not the recommended mode for "
Traceback (most recent call last):
  File "train.py", line 400, in <module>
    train(hyp)
  File "train.py", line 152, in train
    model = torch.nn.parallel.DistributedDataParallel(model)
  File "/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 287, in __init__
    self._ddp_init_helper()
  File "/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 380, in _ddp_init_helper
    expect_sparse_gradient)
RuntimeError: Model replicas must have an equal number of parameters.
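The UserWarning above already points at the recommended fix: launch one DDP process per GPU instead of a single process spanning all eight devices. In PyTorch 1.5 that is usually done through the `torch.distributed.launch` utility, assuming the training script parses the `--local_rank` argument the launcher passes in. A small hypothetical helper (not part of this repo) that builds such a launch command:

```python
def ddp_launch_cmd(num_gpus, script="train.py", extra_args=()):
    """Build a one-process-per-GPU launch command, as the DDP
    UserWarning recommends (PyTorch 1.x torch.distributed.launch)."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(num_gpus),  # one worker process per GPU
        script,
    ]
    cmd.extend(extra_args)
    return cmd
```

For example, `ddp_launch_cmd(8, extra_args=["--batch-size", "16"])` would spawn eight workers, each owning a single GPU, which avoids the per-forward-pass scatter/gather overhead the warning describes.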

@glenn-jocher
Member

glenn-jocher commented Jun 17, 2020

@AIFAN-Lab thanks for the bug report. I tested on two GPUs today and everything worked well. Can you try to reproduce this in our docker image to see if it's an environment issue?

@AIFAN-Lab

OK, I will test in Docker and report back later.

@HaxThePlanet
Author

Is it still necessary to train the first 1000 or so iterations on a single GPU?

@glenn-jocher
Member

@HaxThePlanet that's never been necessary.

@liangshi036

> @HaxThePlanet good news: yolov5 supports multi-gpu out of the box. [...] detect.py accepts a --device argument, but is limited to 1 gpu.

Would you please support multiple GPUs in detect.py?

@glenn-jocher
Member

@liangshi036 we don't have the resources to implement suggestions, but you can do this yourself and submit a PR!
