Yolo v3 take a lot of time to train on custom data #1458

FlorianRuen · 2020-08-19T17:13:38Z

❔Question

Hello everyone,

I'm using the code from this repo to train my model on around 12k images (labelled with Labelbox, in the correct format), with around 17 classes.

I'm training my model on an AWS EC2 instance (instance type g3s.xlarge, with a Tesla M60 GPU and almost 8 GiB of video memory), but the training takes a lot of time, and it's very hard to find out why.

To explain: I'm trying to run 500 epochs, and one epoch takes around 25-30 minutes on this kind of instance. That seems very long to me (my model isn't big enough to justify this training time).

The hyperparameters are the defaults. I'm using batch size = 4 (anything above 4 seems to cause a CUDA out of memory error), and my test set is 20% of my 12k images.

What do you think about this? Is it normal or too long? If it's too long, is there any way to find out why?

Don't hesitate to ask if I've left out any information that could help.

Kind regards,
Florian

glenn-jocher · 2020-08-19T18:44:33Z

@FlorianRuen Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.

** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from [google/automl](https://github.com/google/automl) at batch size 8.

August 13, 2020: v3.0 release: nn.Hardswish() activations, data autodownload, native AMP.
July 23, 2020: v2.0 release: improved model definition, training and mAP.
June 22, 2020: PANet updates: new heads, reduced parameters, improved speed and mAP 364fcfd.
June 19, 2020: FP16 as new default for smaller checkpoints and faster inference d4c6674.
June 9, 2020: CSP updates: improved speed, size, and accuracy (credit to @WongKinYiu for CSP).
May 27, 2020: Public release. YOLOv5 models are SOTA among all known YOLO implementations.
April 1, 2020: Start development of future compound-scaled YOLOv3/YOLOv4-based PyTorch models.

Pretrained Checkpoints

| Model | AP^val | AP^test | AP₅₀ | Speed_GPU | FPS_GPU | params | FLOPS |
|---|---|---|---|---|---|---|---|
| YOLOv5s | 37.0 | 37.0 | 56.2 | 2.4ms | 416 | 7.5M | 13.2B |
| YOLOv5m | 44.3 | 44.3 | 63.2 | 3.4ms | 294 | 21.8M | 39.4B |
| YOLOv5l | 47.7 | 47.7 | 66.5 | 4.4ms | 227 | 47.8M | 88.1B |
| YOLOv5x | 49.2 | 49.2 | 67.7 | 6.9ms | 145 | 89.0M | 166.4B |
| YOLOv5x + TTA | 50.8 | 50.8 | 68.9 | 25.5ms | 39 | 89.0M | 354.3B |
| YOLOv3-SPP | 45.6 | 45.5 | 65.2 | 4.5ms | 222 | 63.0M | 118.0B |

** AP^test denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.001
** Speed_GPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.1
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
** Test Time Augmentation (TTA) runs at 3 image sizes. Reproduce by python test.py --data coco.yaml --img 832 --augment

For more information and to get started with YOLOv5 please visit https://github.com/ultralytics/yolov5. Thank you!

FlorianRuen · 2020-08-20T06:45:04Z

Thanks for the link @glenn-jocher, I'm currently running a training with YOLOv5 on the same dataset.
I will wait 1 or 2 hours to see the training speed, and then I'll come back to tell you whether it's better or not.

Thanks for your help

FlorianRuen · 2020-08-20T10:00:51Z

@glenn-jocher A quick update on this topic: the training completes around 10 epochs in 1 hour and 10 minutes.

glenn-jocher · 2020-08-20T17:43:53Z

@FlorianRuen sure, sounds fine.

FlorianRuen · 2020-08-20T18:32:39Z

@glenn-jocher Do you think the time taken is normal on this kind of machine? So far it has reached epoch 78 in 9 hours and 48 minutes, so if the time per epoch stays stable, it should take around 40 hours for 300 epochs.
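As a rough check of this extrapolation, here is a minimal sketch that scales the elapsed time linearly to the full run. The function name is illustrative and not part of the repo, and the estimate assumes the time per epoch stays constant, which may not hold if validation or data loading time changes.

```python
def extrapolate_training_hours(epochs_done, elapsed_hours, total_epochs):
    """Linearly extrapolate total training time from the progress so far."""
    hours_per_epoch = elapsed_hours / epochs_done
    return hours_per_epoch * total_epochs


# Figures from the post above: epoch 78 reached after 9 h 48 min, target 300 epochs.
elapsed = 9 + 48 / 60
total = extrapolate_training_hours(78, elapsed, 300)
remaining = total - elapsed

print(f"{elapsed / 78 * 60:.1f} min/epoch, "
      f"{total:.1f} h total, {remaining:.1f} h remaining")
```

This gives about 7.5 minutes per epoch and roughly 37.7 hours in total, which is in line with the "around 40 hours" estimate.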

Here are the charts from TensorBoard (epoch 78 at 9 h 48 min): https://ibb.co/rw43zm1

Maybe I need a bigger machine (maybe with 16 GB of video memory) to get it done faster (2x faster if the performance is 2x?)

Thanks for your help

glenn-jocher · 2020-08-20T19:23:28Z

@FlorianRuen this is not a question for me, just compare to publicly available environments like Google Colab.

FlorianRuen · 2020-08-21T06:43:37Z

@glenn-jocher I will search again, but all the results I found run only 3 epochs of the COCO dataset on only 8 or 128 images, so the epochs are very fast in that case. (I have around 700 images per epoch on my side, so by comparison with the public results, 8 images in 9 seconds would be around 10 minutes per epoch.)

But if we use the results on your page, which describe training on the full COCO dataset:

Download COCO and run command below. Training times for YOLOv5s/m/l/x are 2/4/6/8 days on a single V100 (multi-GPU times faster). Use the largest --batch-size your GPU allows (batch sizes shown for 16 GB devices).

As COCO has 118k images for training and 5k for validation, my training is very slow for just 12k images (even though I use an 8 GiB GPU instead of a 16 GiB one).
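To put numbers on this comparison, the sketch below converts both runs to training images processed per second. It rests on several assumptions that the thread does not confirm: that the quoted 2 days for YOLOv5s covers 300 epochs over the 118k COCO training images, that the custom run trains on 80% of the 12k images (9,600), and that both runs use a comparable model. The function name is illustrative only.

```python
def images_per_second(train_images, epochs, elapsed_seconds):
    """Average number of training images processed per second of wall-clock time."""
    return train_images * epochs / elapsed_seconds


DAY = 24 * 3600

# Quoted reference: YOLOv5s on COCO, 2 days on one V100 (300 epochs assumed).
coco_rate = images_per_second(118_000, 300, 2 * DAY)

# Custom run: 78 epochs in 9 h 48 min on a Tesla M60.
# Assumption: 80% of the 12k images are used for training.
custom_train_images = 12_000 * 80 // 100
custom_rate = images_per_second(custom_train_images, 78, (9 * 60 + 48) * 60)

ratio = coco_rate / custom_rate
print(f"COCO on V100: {coco_rate:.0f} img/s")
print(f"Custom on M60: {custom_rate:.0f} img/s")
print(f"Ratio: {ratio:.1f}x")
```

Under these assumptions the custom run processes roughly a tenth of the images per second of the quoted V100 figure. The gap combines an older GPU, a batch size of 4, and possibly a different model size, so it does not by itself point to a single cause.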

harshdhamecha · 2022-11-08T11:32:37Z

Hey @FlorianRuen, I am facing the same problem with YOLOv3. Have you found a solution yet?

Thanks

glenn-jocher · 2022-11-08T21:55:40Z

👋 Hello! Thanks for asking about training speed issues. YOLOv5 🚀 can be trained on CPU (slowest), single-GPU, or multi-GPU (fastest). If you would like to increase your training speed some options are:

- Increase --batch-size
- Reduce --img-size
- Reduce model size, i.e. from YOLOv5x -> YOLOv5l -> YOLOv5m -> YOLOv5s
- Train with multi-GPU DDP at larger --batch-size
- Train on cached data: python train.py --cache (RAM caching) or --cache disk (disk caching)
- Train on faster GPUs, i.e.: P100 -> V100 -> A100
- Train on free GPU backends with up to 16GB of CUDA memory
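To illustrate why the first two options matter, here is a minimal sketch of how batch size and image size change the amount of work per epoch. The helper names, the 9,600-image training set (80% of the 12k images mentioned earlier in the thread), and the 416 image size are assumptions for illustration. Fewer iterations or fewer pixels do not guarantee a proportional speed-up, since data loading and GPU utilization also play a role.

```python
import math


def iterations_per_epoch(train_images, batch_size):
    """Number of optimizer steps needed to see every training image once."""
    return math.ceil(train_images / batch_size)


def relative_pixels(img_size, reference=640):
    """Pixels per square image relative to a reference size."""
    return (img_size / reference) ** 2


train_images = 9_600  # assumption: 80% of the 12k images

for batch_size in (4, 16, 32):
    steps = iterations_per_epoch(train_images, batch_size)
    print(f"batch size {batch_size:>2}: {steps} iterations per epoch")

print(f"416 vs 640 image size: {relative_pixels(416):.2f}x the pixels per image")
```

With a batch size of 4, each epoch takes 2,400 iterations, against 600 at a batch size of 16. Reducing the image size from 640 to 416 leaves about 42% of the pixels per image.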

Good luck 🍀 and let us know if you have any other questions!

FlorianRuen added the question Further information is requested label Aug 19, 2020

FlorianRuen closed this as completed Aug 31, 2020
