the speed of training my custom data #1196

Closed
alicera opened this issue Oct 23, 2020 · 8 comments
Labels: question (Further information is requested), Stale

Comments

@alicera

alicera commented Oct 23, 2020

❔Question

I am trying to follow #475 and https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
I use this command:
python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 128 --data coco.yaml --cfg yolov5s.yaml --weights ''

Is this speed normal, or is there a problem?
The log is:
 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 6/299      7.2G   0.08226    0.2537         0    0.3359      2253       640: 100%|█████████████████████| 389/389 [58:10<00:00,  4.97s/it]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|███████| 389/389 [1:01:00<00:00,  3.66s/it]
             all    4.97e+04    5.03e+06       0.339       0.169       0.106      0.0322

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 7/299      7.2G   0.08159    0.2491         0    0.3307       884       640: 100%|█████████████████████| 389/389 [56:26<00:00,  7.13s/it]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█████████| 389/389 [47:50<00:00,  2.80s/it]
             all    4.97e+04    5.03e+06       0.296       0.179       0.112      0.0349

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 8/299      7.2G   0.08105    0.2429         0     0.324      2672       640: 100%|█████████████████████| 389/389 [56:50<00:00,  6.04s/it]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|███████| 389/389 [1:38:33<00:00,  3.02s/it]
             all    4.97e+04    5.03e+06       0.348       0.173       0.119      0.0372

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 9/299      7.2G   0.08098    0.2434         0    0.3244       405       640: 100%|█████████████████████| 389/389 [57:47<00:00,  6.28s/it]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█████████| 389/389 [27:41<00:00,  2.88s/it]
             all    4.97e+04    5.03e+06       0.328       0.159      0.0994      0.0317

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
10/299      7.2G   0.08078    0.2418         0    0.3226      3567       640: 100%|█████████████████████| 389/389 [56:52<00:00,  5.90s/it]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|███████| 389/389 [1:02:21<00:00,  3.00s/it]
             all    4.97e+04    5.03e+06       0.358       0.165       0.108       0.035

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
11/299      7.2G   0.08084    0.2403         0    0.3212      1459       640: 100%|█████████████████████| 389/389 [52:53<00:00,  5.02s/it]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█████████| 389/389 [26:24<00:00,  3.12s/it]
@alicera alicera added the question Further information is requested label Oct 23, 2020
@glenn-jocher
Member

@alicera with no details on your hardware there can be no answer to your question.

@glenn-jocher
Member

@alicera also I see you are using a custom dataset despite your yaml being called coco.yaml. So you have an unknown dataset with unknown hardware asking people if your training time is correct.
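
For anyone comparing numbers, the progress lines in the log above can be turned into an epoch-level throughput figure. The following is a minimal sketch, not part of YOLOv5: the helper names `elapsed_seconds` and `epoch_throughput` are hypothetical, and it assumes each iteration processes the 128 images given by `--batch-size` in the command (the last batch may be smaller). The s/it value that tqdm prints is a smoothed recent rate, so it can differ from the whole-epoch average computed here.

```python
import re

# Matches a tqdm suffix such as "389/389 [58:10<00:00, 4.97s/it]" and captures
# the completed iteration count and the elapsed time (MM:SS or H:MM:SS).
TQDM_RE = re.compile(r"(\d+)/\d+ \[(\d+(?::\d+){1,2})<")


def elapsed_seconds(clock: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def epoch_throughput(log_line: str, batch_size: int) -> dict:
    """Average seconds per iteration and images per second for one epoch line."""
    match = TQDM_RE.search(log_line)
    if match is None:
        raise ValueError("no tqdm progress suffix found")
    iterations = int(match.group(1))
    seconds = elapsed_seconds(match.group(2))
    return {
        "seconds": seconds,
        "sec_per_iter": seconds / iterations,
        # Assumption: every iteration holds a full batch.
        "images_per_sec": iterations * batch_size / seconds,
    }


# Training line for epoch 6, copied from the log (progress bar shortened).
line = "6/299 7.2G 0.08226 0.2537 0 0.3359 2253 640: 100%|███| 389/389 [58:10<00:00, 4.97s/it]"
stats = epoch_throughput(line, batch_size=128)
print(stats)
```

Posting a figure like this together with the GPU model and the dataset size makes it possible for others to judge whether the speed is reasonable.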

@alicera
Author

alicera commented Oct 26, 2020

The problem is related to the dataset.
When I train and validate on the 30,000 images I prepared, the speed is OK.
But when I train and validate on the 50,000 images I prepared, the speed is much slower than with COCO, which has more than 60,000 images.
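
One thing worth checking when two datasets of similar size train at very different speeds is the number of objects per image. The validation lines in the original log report 4.97e+04 images and 5.03e+06 targets, which is roughly 101 targets per image, whereas COCO averages about 7. Whether that explains the slowdown here is not confirmed in this thread. The sketch below is a hypothetical helper (`label_stats` is not a YOLOv5 function) that assumes YOLO-format labels: one `.txt` file per image, one row per object.

```python
import tempfile
from pathlib import Path


def label_stats(label_dir: str) -> dict:
    """Count label files and object rows in a directory of YOLO-format labels.

    Assumption: every image has a matching .txt file, so the number of label
    files stands in for the number of images.
    """
    files = sorted(Path(label_dir).glob("*.txt"))
    targets = 0
    for label_file in files:
        # One non-empty row per object: "class x_center y_center width height"
        rows = label_file.read_text().splitlines()
        targets += sum(1 for row in rows if row.strip())
    images = len(files)
    return {
        "images": images,
        "targets": targets,
        "targets_per_image": targets / images if images else 0.0,
    }


# Tiny self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "img_0001.txt").write_text("0 0.5 0.5 0.2 0.2\n1 0.3 0.3 0.1 0.1\n")
    Path(tmp, "img_0002.txt").write_text("")  # background image, no objects
    demo = label_stats(tmp)
print(demo)
```

Running the same function on the 30,000-image and 50,000-image label directories would show whether the slower set is also much denser.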

@alicera
Author

alicera commented Oct 26, 2020

python test.py --weights yolov5x.pt --data coco.yaml --img 640

Output:

           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 
             all    2.24e+04    4.81e+06       0.373       0.207       0.148      0.0486

Speed: 9.2/8.6/17.8 ms inference/NMS/total per 640x640 image at batch-size 32

The output does not include the following pycocotools results:
COCO mAP with pycocotools... saving detections_val2017__results.json...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.492 < ---------- baseline mAP
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.676
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.534
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.541
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.376
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.616
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.670 < ---------- baseline mAR
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.493
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.723
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.812

Do you know the reason?
https://docs.ultralytics.com/yolov5/tutorials/test_time_augmentation

@glenn-jocher
Member

@alicera pycocotools mAP only runs on the COCO dataset.
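
As background, pycocotools scores detections against ground truth stored in COCO JSON format, where each box is `[x_min, y_min, width, height]` in pixels. A custom dataset with YOLO-format labels (normalized `class x_center y_center width height`) would need converting before such a tool could read it. The sketch below shows only that box conversion. The function names are hypothetical, using the class index directly as `category_id` is an assumption, and nothing here implies that test.py supports this workflow.

```python
def yolo_to_coco_bbox(xc, yc, w, h, img_w, img_h):
    """Normalized center-based box -> absolute [x_min, y_min, width, height]."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = xc * img_w - box_w / 2
    y_min = yc * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


def yolo_line_to_coco(line, image_id, ann_id, img_w, img_h):
    """Turn one YOLO label row into a COCO-style annotation dict."""
    cls, xc, yc, w, h = line.split()
    bbox = yolo_to_coco_bbox(float(xc), float(yc), float(w), float(h), img_w, img_h)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": int(cls),  # assumption: class index reused as category id
        "bbox": bbox,
        "area": bbox[2] * bbox[3],
        "iscrowd": 0,
    }


# A box covering the central half of a 640x480 image.
ann = yolo_line_to_coco("2 0.5 0.5 0.5 0.5", image_id=1, ann_id=1, img_w=640, img_h=480)
print(ann)
```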

@dongjuns

dongjuns commented Oct 31, 2020

Hi, @alicera
Here is an example from my own setup.

In road.yaml:

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ../road/CZ_train.txt
val: ../road/CZ_validation.txt

# number of classes
nc: 4

# class names
names: ['D00', 'D10', 'D20', 'D40']

and I changed only 'nc' in models/yolov5s.yaml:

# parameters
nc: 4  # number of classes

and the training command would be:

python train.py --data data/road.yaml --cfg models/yolov5s.yaml --weights yolov5s.pt --batch-size 16
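
A quick consistency check on the dataset config can catch mismatches before a long run starts. This is a minimal sketch under stated assumptions: `check_data_config` is a hypothetical helper, and the config is written as a plain dict mirroring road.yaml rather than loaded from the file.

```python
def check_data_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a dataset config."""
    problems = []
    # The four keys used in the road.yaml example above.
    for key in ("train", "val", "nc", "names"):
        if key not in cfg:
            problems.append(f"missing key: {key}")
    names = cfg.get("names", [])
    nc = cfg.get("nc")
    # The class count must agree with the number of class names.
    if nc is not None and nc != len(names):
        problems.append(f"nc={nc} but {len(names)} class names given")
    return problems


# Dict form of the road.yaml example.
road = {
    "train": "../road/CZ_train.txt",
    "val": "../road/CZ_validation.txt",
    "nc": 4,
    "names": ["D00", "D10", "D20", "D40"],
}
print(check_data_config(road))

# A deliberately inconsistent config for comparison.
broken = dict(road, nc=5)
print(check_data_config(broken))
```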

@glenn-jocher
Member

@dongjuns yes this is good advice! We've updated the training commands to make them even simpler. Now you only need to specify your --data and your pretrained --weights.

python train.py --data road.yaml --weights yolov5s.pt --batch-size 16

@github-actions
Contributor

github-actions bot commented Dec 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Dec 1, 2020
@github-actions github-actions bot closed this as completed Dec 6, 2020