Fix Bug when reporting Recall / Precision / mAP #8686

pourmand1376 · 2022-07-23T08:30:18Z

There is a bug which I fully described in

Impressive Results when changing conf-thres and iou-thres #8669

This PR fixes the bug, which is explained in full in the issue. I have added a list called target_cls that keeps track of all labels. It must not be populated inside the for loop that iterates over out. Afterwards, target_cls is combined into stats again, both for code consistency and to make sure nothing breaks anywhere else.

The core of the problem comes from the fact that we are looping over filtered results, which makes our metrics wrong.
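
A toy sketch of the failure mode described above, assuming labels are only counted for images that keep at least one prediction after confidence filtering. This is not the val.py code: the function names and the data layout are hypothetical, and matching is simplified to class equality with no IoU check.

```python
# Toy illustration only; hypothetical names, not the actual val.py logic.
def recall_over_filtered(images, conf_thres):
    """Count labels only for images that keep a prediction (the suspected bug)."""
    tp = n_labels = 0
    for preds, labels in images:
        kept = [p for p in preds if p["conf"] >= conf_thres]
        if not kept:
            continue  # the labels of this image are silently dropped
        n_labels += len(labels)
        tp += sum(1 for p in kept if p["cls"] in labels)
    return tp / n_labels if n_labels else 0.0


def recall_over_all_labels(images, conf_thres):
    """Count every label up front, independent of which predictions survive."""
    n_labels = sum(len(labels) for _, labels in images)
    tp = 0
    for preds, labels in images:
        kept = [p for p in preds if p["conf"] >= conf_thres]
        tp += sum(1 for p in kept if p["cls"] in labels)
    return tp / n_labels if n_labels else 0.0


# Two images, each with one ground-truth object of class 0.
images = [
    ([{"cls": 0, "conf": 0.9}], [0]),  # confident, correct detection
    ([{"cls": 0, "conf": 0.1}], [0]),  # weak detection, filtered out at 0.5
]
print(recall_over_filtered(images, 0.5))    # 1.0: the missed object is not counted
print(recall_over_all_labels(images, 0.5))  # 0.5: one of two objects is found
```

In this toy setup the two functions agree at a very low threshold and diverge only once filtering empties an image, which is the kind of threshold-dependent inflation the issue reports.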

If you want more explanation, do not hesitate to tell me.

Also fixes:

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced validation metrics logging for better model evaluation.

📊 Key Changes

Added collection of target class IDs during model validation to obtain per-class metrics.
Updated statistics calculation to include the newly collected target class information.
Refined recorded statistics by removing a redundant element (the target class was recorded twice).

🎯 Purpose & Impact

✅ The changes aim to improve the model evaluation process by providing more detailed information on class-wise performance, which can be critical for fine-tuning and understanding model behavior.
🚀 Potential impact includes more informed decision-making for model improvements and clearer insights into how well the model is performing across different classes. This is especially useful when dealing with datasets that have a large number of classes or imbalanced classes.

glenn-jocher · 2022-07-23T15:33:44Z

@pourmand1376 thanks for the PR! Have you quantified the effect of this change on COCO mAP?

glenn-jocher · 2022-07-23T15:48:31Z

@pourmand1376 I tested this PR against master with the following two commands and observed the exact same results. Are you sure this PR is changing any results? Can you provide reproducible code that illustrates the effect of the PR please? Thanks!

# Download COCO val
import torch
torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip')
!unzip -q tmp.zip -d ../datasets && rm tmp.zip

# master
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.2

# PR
%cd ..
!git clone https://github.com/pourmand1376/yolov5 -b fix_bug_validation yolov5-pr  # clone
%cd yolov5-pr
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.2

pourmand1376 · 2022-07-23T16:05:13Z

If you test with a higher conf-thres, such as 0.5, you can see my point.

Also, I think you have set the parameters incorrectly. They should be set according to this:

yolov5/val.py

Lines 336 to 337 in 1c5e92a

    
    parser.add_argument('--conf-thres', type=float, default=0.001, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.6, help='NMS IoU threshold')

# Download COCO val
import torch
torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip')
!unzip -q tmp.zip -d ../datasets && rm tmp.zip

# master
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.2

# PR
%cd ..
!git clone https://github.com/pourmand1376/yolov5 -b fix_bug_validation yolov5-pr  # clone
%cd yolov5-pr
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.2

I will publish the COCO results soon ...

glenn-jocher · 2022-07-23T16:31:53Z

@pourmand1376 argparse arguments can be abbreviated to any unambiguous prefix, so --conf and --conf-thres are the same. I'll retest at a higher --conf, but if moving to 0.2 has no effect, I doubt that moving to 0.5 will have one.
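
A minimal sketch of this prefix matching, reusing only the two add_argument calls quoted from val.py above. The rest of the real parser is omitted, so this assumes no other option starts with --conf or --iou.

```python
import argparse

# Only the two options quoted from val.py; the real parser defines many more.
parser = argparse.ArgumentParser()
parser.add_argument('--conf-thres', type=float, default=0.001, help='confidence threshold')
parser.add_argument('--iou-thres', type=float, default=0.6, help='NMS IoU threshold')

# argparse accepts any unambiguous prefix of a long option (allow_abbrev=True by default).
short = parser.parse_args(['--conf', '0.5', '--iou', '0.65'])
full = parser.parse_args(['--conf-thres', '0.5', '--iou-thres', '0.65'])

assert short == full  # both spellings populate the same attributes
print(short.conf_thres, short.iou_thres)  # 0.5 0.65
```

The val: header lines in the logs of this thread show the same thing: the runs launched with --conf and --iou report conf_thres and iou_thres with the requested values.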

glenn-jocher · 2022-07-23T16:47:33Z

@pourmand1376 updated test also shows no change:

Input

# install
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install

# Download COCO val
import torch
torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip')
!unzip -q tmp.zip -d ../datasets && rm tmp.zip

# master
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.5

# PR
%cd ..
!git clone https://github.com/pourmand1376/yolov5 -b fix_bug_validation yolov5-pr  # clone
%cd yolov5-pr
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.5

Output

Cloning into 'yolov5'...
remote: Enumerating objects: 13039, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (101/101), done.
remote: Total 13039 (delta 133), reused 187 (delta 113), pack-reused 12825
Receiving objects: 100% (13039/13039), 12.46 MiB | 11.13 MiB/s, done.
Resolving deltas: 100% (8959/8959), done.
/content/yolov5
     |████████████████████████████████| 596 kB 14.8 MB/s 
100%
780M/780M [00:02<00:00, 308MB/s]
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.1-314-g7f7bd6f Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5x.pt to yolov5x.pt...
100% 166M/166M [00:08<00:00, 21.4MB/s]

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
100% 755k/755k [00:00<00:00, 46.8MB/s]
val: Scanning '/content/datasets/coco/val2017' images and labels...4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<00:00, 10859.23it/s]
val: New cache created: /content/datasets/coco/val2017.cache
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:10<00:00,  2.23it/s]
                 all       5000      36335      0.743      0.625      0.683      0.504
Speed: 0.1ms pre-process, 4.7ms inference, 1.2ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.81s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=74.53s).
Accumulating evaluation results...
DONE (t=16.56s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.688
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.549
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.340
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.382
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.631
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.528
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.737
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.833
Results saved to runs/val/exp
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.5, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
WARNING: confidence threshold 0.5 > 0.001 produces invalid results ⚠️
YOLOv5 🚀 v6.1-314-g7f7bd6f Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [00:59<00:00,  2.65it/s]
                 all       5000      36335      0.803      0.582      0.707      0.572
Speed: 0.1ms pre-process, 4.7ms inference, 0.8ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.69s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.26s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.74s).
Accumulating evaluation results...
DONE (t=2.22s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.424
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.548
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.466
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.483
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.249
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.532
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655
Results saved to runs/val/exp2
/content
Cloning into 'yolov5-pr'...
remote: Enumerating objects: 14479, done.
remote: Counting objects: 100% (144/144), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 14479 (delta 89), reused 114 (delta 67), pack-reused 14335
Receiving objects: 100% (14479/14479), 12.68 MiB | 21.46 MiB/s, done.
Resolving deltas: 100% (10050/10050), done.
/content/yolov5-pr
val: data=/content/yolov5-pr/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.1-326-g1a956c0 Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5x.pt to yolov5x.pt...
100% 166M/166M [00:00<00:00, 350MB/s]

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:09<00:00,  2.26it/s]
                 all       5000      36335      0.743      0.625      0.683      0.504
Speed: 0.1ms pre-process, 4.6ms inference, 1.2ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=5.78s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=74.49s).
Accumulating evaluation results...
DONE (t=14.65s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.688
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.549
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.340
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.382
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.631
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.528
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.737
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.833
Results saved to runs/val/exp
val: data=/content/yolov5-pr/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.5, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
WARNING: confidence threshold 0.5 > 0.001 produces invalid results ⚠️
YOLOv5 🚀 v6.1-326-g1a956c0 Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [00:59<00:00,  2.65it/s]
                 all       5000      36335      0.803      0.582      0.707      0.572
Speed: 0.1ms pre-process, 4.7ms inference, 0.8ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.68s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.25s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.61s).
Accumulating evaluation results...
DONE (t=2.20s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.424
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.548
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.466
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.483
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.249
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.532
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655
Results saved to runs/val/exp2

pourmand1376 · 2022-07-23T16:58:30Z

We cannot see this on COCO because it has many classes: however we try, at least one class will make it into the for loop, and from there the result will be correct. I am talking about datasets that have only one class, or maybe two.

Do you know any?

I am thinking about trying the COCO dataset with only one label ...

pourmand1376 · 2022-07-23T19:27:03Z

I can confirm that the error cannot be reproduced. The problem was that I hadn't updated my forked repo for two months. In the meantime, this problem has been fixed.

Now I think we should tell users to update their repo. Currently, the check only compares against the master branch, but that's not enough. We should add a remote for https://github.com/ultralytics/yolov5, if one doesn't already exist, and compare against the master of that remote.
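
A rough sketch of what such a check could look like, not the repository's existing update check. The remote name upstream and both function names are hypothetical; the git subcommands themselves (remote add, fetch, rev-list --count) are standard.

```python
import subprocess

UPSTREAM_URL = "https://github.com/ultralytics/yolov5"


def upstream_check_commands(remote="upstream", branch="master", url=UPSTREAM_URL):
    """Return the git commands used to compare HEAD with the upstream branch."""
    return [
        ["git", "remote", "add", remote, url],  # errors if the remote already exists
        ["git", "fetch", remote, branch],
        ["git", "rev-list", "--count", f"HEAD..{remote}/{branch}"],  # commits HEAD lacks
    ]


def commits_behind(run=subprocess.run, **kwargs):
    """Run the commands and return how many upstream commits HEAD is missing."""
    add, fetch, count = upstream_check_commands(**kwargs)
    run(add, capture_output=True, text=True)  # failure is ignored: remote may exist
    run(fetch, capture_output=True, text=True, check=True)
    result = run(count, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    behind = commits_behind()
    if behind:
        print(f"This checkout is {behind} commits behind {UPSTREAM_URL}, consider updating.")
```

The run parameter is injectable so the logic can be exercised without a real git checkout. Comparing against the original repository rather than the fork's own remote is what would have caught a fork that had not been synced.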

You can still merge the result if you want, but that doesn't change anything!

glenn-jocher · 2022-07-23T19:31:26Z

@pourmand1376 awesome! Thanks for submitting the PR regardless and thank you for confirming everything is working correctly now.

pourmand1376 and others added 2 commits July 23, 2022 12:55

fix bug

9b53466

[pre-commit.ci] auto fixes from pre-commit.com hooks

c6712a2

for more information, see https://pre-commit.ci

pourmand1376 changed the title ~~Fix Bug when reporting Recall/Precision/mAP~~ Fix Bug when reporting Recall / Precision / mAP Jul 23, 2022

pourmand1376 and others added 9 commits July 23, 2022 13:02

fix bug

78ed6c5

Merge branch 'fix_bug_validation' of https://github.com/pourmand1376/…

03ca75d

…yolov5 into fix_bug_validation

fix bug in stats

0acd785

[pre-commit.ci] auto fixes from pre-commit.com hooks

f90ac9b

for more information, see https://pre-commit.ci

add only target labels

0ec7286

Merge branch 'fix_bug_validation' of https://github.com/pourmand1376/…

c5d95ae

…yolov5 into fix_bug_validation

[pre-commit.ci] auto fixes from pre-commit.com hooks

17aa224

for more information, see https://pre-commit.ci

fix bug

ee2e9f0

add stack instead of cat

b0def6e

pourmand1376 marked this pull request as ready for review July 23, 2022 09:15

pourmand1376 marked this pull request as draft July 23, 2022 09:39

add unsqueeze

9831fdf

pourmand1376 marked this pull request as ready for review July 23, 2022 10:11

pourmand1376 mentioned this pull request Jul 23, 2022

Impressive Results when changing conf-thres and iou-thres #8669

Closed

1 task

Delete added empty lines

1a956c0

pourmand1376 closed this Jul 23, 2022
