[Help needed] Test-time augmentation (TTA) #503

mingxingtan · 2020-06-12T05:29:22Z

Anyone interested in helping implement test-time augmentation (multi-scale testing, flipping)?

This is a follow-up to #491: TTA seems like an easy way to boost AP (although I don't know how helpful it is in real products). If you are interested, feel free to assign it to yourself!

glenn-jocher · 2020-07-05T18:14:52Z

@mingxingtan I was reviewing your repo and saw this. I might be able to help. I designed the Ultralytics TTA strategy for yolov3 (and now https://github.com/ultralytics/yolov5) to increase mAP while minimizing the added FLOPs. For yolov5x we see an mAP increase from 48.4 to 50.0 after applying TTA, with about a 2-3X slowdown I think. I'll try to run the numbers and post here.

glenn-jocher · 2020-07-05T18:46:12Z

Ok, here are the results. I've also created a documentation issue on our yolov5 repo to help everyone understand:
ultralytics/yolov5#303

Before You Start

Clone the YOLOv5 repo and install the requirements.txt dependencies, including Python>=3.7 and PyTorch>=1.5.

git clone https://github.com/ultralytics/yolov5 # clone repo
cd yolov5
pip install -r requirements.txt # install requirements.txt

Test Normally

This command tests YOLOv5x on COCO val2017 at an image size of 672 pixels. yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training on a custom dataset, such as ./weights/best.pt. For details on all available models please see our README table.

python test.py --weights yolov5x.pt --data coco.yaml --img 672

Output:

Namespace(augment=False, batch_size=32, conf_thres=0.001, data='./data/coco.yaml', device='', img_size=672, iou_thres=0.65, merge=False, save_json=True, single_cls=False, task='val', verbose=False, weights='yolov5x.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)

Model Summary: 407 layers, 8.89652e+07 parameters, 8.89652e+07 gradients
Fusing layers...
Model Summary: 284 layers, 8.89222e+07 parameters, 8.89222e+07 gradients
Caching labels ../coco/labels/val2017.npy (4952 found, 0 missing, 48 empty, 0 duplicate, for 5000 images): 100% 5000/5000 [00:00<00:00, 13153.65it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 157/157 [03:04<00:00,  1.17s/it]
                 all       5e+03    3.63e+04       0.426       0.746        0.66       0.469
Speed: 22.9/2.1/25.0 ms inference/NMS/total per 672x672 image at batch-size 32

COCO mAP with pycocotools... saving detections_val2017_yolov5x_results.json...
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.08s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=90.56s).
Accumulating evaluation results...
DONE (t=11.87s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.484
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.668
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.528
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.535
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.371
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.609
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.663
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.715
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.807

Test with TTA

Append --augment to any existing test.py command to enable TTA, and increase the image size by about 30% for improved results. Note that inference with TTA enabled will typically take about 3X the time of normal inference, because the images are left-right flipped and processed at 3 different resolutions, with the outputs merged before NMS.

python test.py --weights yolov5x.pt --data coco.yaml --img 832 --augment

Output:

Namespace(augment=True, batch_size=32, conf_thres=0.001, data='./data/coco.yaml', device='', img_size=832, iou_thres=0.65, merge=False, save_json=True, single_cls=False, task='val', verbose=False, weights='yolov5x.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)

Model Summary: 407 layers, 8.89652e+07 parameters, 8.89652e+07 gradients
Fusing layers...
Model Summary: 284 layers, 8.89222e+07 parameters, 8.89222e+07 gradients
Caching labels ../coco/labels/val2017.npy (4952 found, 0 missing, 48 empty, 0 duplicate, for 5000 images): 100% 5000/5000 [00:00<00:00, 14939.35it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 157/157 [07:48<00:00,  2.98s/it]
                 all       5e+03    3.63e+04       0.313       0.794       0.671       0.483
Speed: 77.5/3.2/80.7 ms inference/NMS/total per 832x832 image at batch-size 32  < ---------- slower

COCO mAP with pycocotools... saving detections_val2017_yolov5x_results.json...
loading annotations into memory...
Done (t=0.43s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=110.23s).
Accumulating evaluation results...
DONE (t=14.43s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500  < ---------- increased AP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.678
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.546
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.336
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.545
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689  < ---------- increased AR
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.534
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.734
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.826
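
The TTA procedure described above (left-right flips, several resolutions, outputs merged before NMS) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the YOLOv5 implementation: the helper names (`hflip`, `resize_nearest`, `tta_detect`, `toy_detect`), the choice of scales, and the use of every scale/flip combination are all assumptions made for the example, and the toy detector is a hypothetical stand-in for a real model. Only the `iou_thres` default of 0.65 is taken from the logs above.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]                       # H x W grayscale image
Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def hflip(img: Image) -> Image:
    """Left-right flip."""
    return [row[::-1] for row in img]


def resize_nearest(img: Image, scale: float) -> Image:
    """Nearest-neighbour resize by a scale factor."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(nw)] for y in range(nh)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], iou_thres: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps."""
    keep: List[Box] = []
    for b in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(b, k) <= iou_thres for k in keep):
            keep.append(b)
    return keep


def tta_detect(img: Image, detect: Callable[[Image], List[Box]],
               scales: Sequence[float] = (1.0, 0.5),  # illustrative scales only
               iou_thres: float = 0.65) -> List[Box]:
    """Run the detector on flipped/rescaled views, merge, then apply NMS once."""
    w = len(img[0])
    merged: List[Box] = []
    for s in scales:
        for flip in (False, True):
            view = resize_nearest(hflip(img) if flip else img, s)
            for x1, y1, x2, y2, score in detect(view):
                x1, y1, x2, y2 = x1 / s, y1 / s, x2 / s, y2 / s  # undo scaling
                if flip:
                    x1, x2 = w - x2, w - x1  # undo left-right flip
                merged.append((x1, y1, x2, y2, score))
    return nms(merged, iou_thres)  # outputs are merged before NMS


def toy_detect(img: Image) -> List[Box]:
    """Hypothetical stand-in detector: one box around all pixels > 0.5."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v > 0.5]
    if not pts:
        return []
    xs, ys = zip(*pts)
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1, 0.9)]


# 16x16 image with a bright 4x4 square at x in [4, 8), y in [2, 6).
img = [[0.0] * 16 for _ in range(16)]
for y in range(2, 6):
    for x in range(4, 8):
        img[y][x] = 1.0

print(tta_detect(img, toy_detect))
```

In this toy case every view yields the same box once it is mapped back to the original frame, so NMS keeps a single detection. With a real model the views disagree slightly, and that disagreement is where the extra recall comes from.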

mingxingtan · 2020-07-06T05:41:38Z

@glenn-jocher Thanks for the information!

This is very nice: +1.2 AP at about 3x slower. Could you share the main idea of this fast TTA? (Or is there any doc for it?)

glenn-jocher · 2020-07-06T19:51:10Z

@mingxingtan yes, +1.6AP actually :)
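
For reference, the gain and the slowdown can be read straight off the two logs earlier in the thread. The short check below only uses numbers copied from those logs (the pycocotools AP and the total ms per image); the variable names are just for illustration. Note that the TTA run also uses a larger image size (832 vs 672), so the slowdown reflects both changes together.

```python
# Figures copied from the "Test Normally" and "Test with TTA" logs.
baseline = {"img": 672, "ap": 0.484, "ms_total": 25.0}
tta = {"img": 832, "ap": 0.500, "ms_total": 80.7}

ap_gain = (tta["ap"] - baseline["ap"]) * 100        # in AP points
slowdown = tta["ms_total"] / baseline["ms_total"]   # total time per image
pixel_ratio = (tta["img"] / baseline["img"]) ** 2   # pixels per image

print(f"AP gain: +{ap_gain:.1f}")
print(f"Slowdown: {slowdown:.2f}x")
print(f"Pixels per image: {pixel_ratio:.2f}x")
```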

Honestly though, single-model inference improvements will probably always beat TTA in terms of mAP gained per unit of time or FLOPs, but TTA is one final step you can take to boost your single-model results. For EfficientDet it may push AP above 55.0 (!), but EfficientDet may also benefit less than yolov5: it already has 5 output maps rather than yolov5's 3, so it already exploits a wider range of multi-scale features. It will see the same improvement from left-right flips though.
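
To make the "5 output maps vs 3" point concrete, here is a rough sketch of the grid sizes each detector would produce for one input size. The stride values are assumptions that are not stated in this thread (strides 8/16/32 for the 3 yolov5 outputs and 8 through 128 for the 5 EfficientDet levels), and `grid_sizes` is a hypothetical helper.

```python
import math
from typing import List, Sequence

YOLOV5_STRIDES = (8, 16, 32)                 # assumed: 3 output maps
EFFICIENTDET_STRIDES = (8, 16, 32, 64, 128)  # assumed: 5 output maps


def grid_sizes(img_size: int, strides: Sequence[int]) -> List[int]:
    """Side length of each square output map for a square input image."""
    return [math.ceil(img_size / s) for s in strides]


print(grid_sizes(640, YOLOV5_STRIDES))
print(grid_sizes(640, EFFICIENTDET_STRIDES))
```

Under these assumptions the two extra, coarser maps already cover some of the range that multi-scale TTA would otherwise add by shrinking the image.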

I don't have any documentation beyond the tutorials in the yolov5 repo. Lots of people have been asking for an arxiv paper, but I simply have not had time. I am aiming to get a paper out by the end of the year after some more experiments.

Maybe we could do a video call to discuss? You can email me or send a Google Calendar invite to glenn.jocher@ultralytics.com. I'm on California time.

mingxingtan · 2021-05-12T07:01:34Z

obsolete issues.

mingxingtan added the "help wanted" (Extra attention is needed) label on Jun 12, 2020

mingxingtan closed this as completed May 12, 2021
