Using different batch size leads to different results when running test.py #819

pprp · 2020-01-31T11:15:40Z

🐛 Bug

@glenn-jocher Thank you for your excellent work, but I encountered a bug.

I only have 39 pictures for testing.

If batch size > total number of pictures, the F1 score is 86.4% and the mAP is 78.2%.

If batch size < total number of pictures, the F1 score is 85% and the mAP is 77.7%.

To Reproduce

Steps to reproduce the behavior:

batch size = 64

python test.py --cfg cfg/yolov3.cfg --weights weights/best.pt --batch-size 64

I got:

Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16276MB)
           device1 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16276MB)

               Class    Images   Targets         P         R   mAP@0.5        F1: 100%|█████████████████████████████████████████████████████████████████████████████| 1/1 [00:11<00:00, 11.50s/it]
                 all        39        65     0.962     0.785     0.782     0.864

batch size = 4

python test.py --cfg cfg/yolov3.cfg --weights weights/best.pt --batch-size 4

I got:

Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16276MB)
           device1 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16276MB)

               Class    Images   Targets         P         R   mAP@0.5        F1: 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:10<00:00,  1.09s/it]
                 all        39        65     0.927     0.785     0.777      0.85

Do you know where the problem is? Thank you very much!


glenn-jocher · 2020-01-31T16:49:27Z

@pprp good investigation! This is actually not a bug; it is an effect of the rectangular inference that test.py runs. It orders images by aspect ratio, groups images of similar shape into a batch, and applies the minimum letterbox required to that batch. That means each batch has a different letterboxed shape, and changing the batch size changes which images share a batch. See rectangular inference #232.

With --batch-size 64, all 39 images fall into a single batch, so a square letterbox is likely applied. In any case, the differences should be very minor, as you see. If you had a larger group of test images, I believe your results would be even closer.
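To make the effect concrete, here is a simplified sketch of how per-batch letterbox shapes could be derived. It is not the repository's actual code: the function name `batch_shapes`, the 416 image size, the stride of 32, and the example image dimensions are all assumptions chosen for illustration.

```python
import math


def batch_shapes(image_sizes, batch_size, img_size=416, stride=32):
    """Return one (height, width) letterbox shape per batch.

    image_sizes is a list of (width, height) tuples. All names and
    defaults here are hypothetical, chosen only for illustration.
    """
    # Sort by aspect ratio (height / width) so similar shapes share a batch.
    ratios = sorted(h / w for w, h in image_sizes)
    shapes = []
    for start in range(0, len(ratios), batch_size):
        batch = ratios[start:start + batch_size]
        lo, hi = batch[0], batch[-1]
        if hi < 1:      # every image in the batch is wider than tall
            rel = (hi, 1.0)
        elif lo > 1:    # every image in the batch is taller than wide
            rel = (1.0, 1.0 / lo)
        else:           # mixed orientations: fall back to a square
            rel = (1.0, 1.0)
        # Round each side up to a multiple of the network stride.
        shapes.append(
            tuple(math.ceil(r * img_size / stride) * stride for r in rel)
        )
    return shapes


# Hypothetical test set: 20 landscape and 19 portrait images (39 total).
sizes = [(640, 480)] * 20 + [(480, 640)] * 19

print(batch_shapes(sizes, batch_size=64))     # one batch, square letterbox
print(batch_shapes(sizes, batch_size=4)[0])   # first of 10 batches, landscape
print(batch_shapes(sizes, batch_size=4)[-1])  # last of 10 batches, portrait
```

Under these assumptions, a batch size of 64 yields a single 416x416 batch, while a batch size of 4 yields ten batches whose shapes are 320x416 or 416x320. The network therefore sees differently padded inputs for the same images, which is enough to shift P, mAP, and F1 slightly.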

pprp · 2020-02-01T14:10:51Z

Thank you very much for your explanation!

glenn-jocher · 2023-11-15T08:41:36Z

@pprp you're welcome! If you have any more questions or need further assistance, feel free to ask. Keep up the good work!

pprp added the bug label Jan 31, 2020

pprp closed this as completed Feb 1, 2020
