Impressive Results when changing conf-thres and iou-thres #8669
Comments
If you mean the conf-thres in val.py, yes, it's wrong to change that. The reason is exactly what the warning says.
In my specific application, I only care about precision/recall. Also, see this comment, which says that changing conf-thres would produce an inaccurate mAP but says nothing about the other metrics.
As far as I can see, this is the only place where conf-thres is used (line 217 in 4c1784b). So this is basically filtering out the results: detections whose confidence is lower than conf-thres are dropped. This is where AP for each class is calculated (lines 263 to 271 in 4c1784b).
And this is the actual code for calculating average precision (lines 29 to 93 in 4c1784b), which calls this function (lines 96 to 121 in 4c1784b).
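For readers following along, the AP computation referenced above follows the usual envelope-and-integrate pattern. Here is a hedged, self-contained sketch of that pattern (not the repo's exact code; the function name is mine):

```python
import numpy as np

def compute_ap_sketch(recall, precision):
    """AP from recall/precision arrays: pad with sentinels, take the
    monotonically decreasing precision envelope, then sum the area
    wherever recall changes."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    # Precision envelope: make precision non-increasing in recall.
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    # Integrate the area under the curve where recall changes.
    i = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]))

print(compute_ap_sketch(np.array([0.5, 1.0]), np.array([1.0, 0.5])))  # 0.75
```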
Only the first stage is affected by conf-thres. Am I missing something?
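A minimal sketch of that first-stage filtering (a hypothetical helper, not the actual YOLOv5 code): predictions whose confidence falls below conf_thres are discarded before NMS and metric computation.

```python
import numpy as np

def filter_by_confidence(pred, conf_thres=0.001):
    """pred: (n, 6) array of [x1, y1, x2, y2, conf, cls]."""
    return pred[pred[:, 4] > conf_thres]

preds = np.array([
    [0.0, 0.0, 10.0, 10.0, 0.95, 0],
    [5.0, 5.0, 20.0, 20.0, 0.40, 1],
    [8.0, 8.0, 30.0, 30.0, 0.02, 0],
])
print(len(filter_by_confidence(preds, 0.001)))  # 3 boxes survive
print(len(filter_by_confidence(preds, 0.85)))   # 1 box survives
```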
I have the same question.
I think this may be a bug. I checked the results: recall should decrease as we increase conf-thres.
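The property asserted above can be sketched with made-up numbers: with the target set fixed, raising conf_thres can only remove detections, so recall is non-increasing in the threshold.

```python
import numpy as np

def recall_at(conf, is_tp, n_targets, conf_thres):
    # Keep only detections above the threshold, count surviving TPs.
    keep = conf > conf_thres
    return is_tp[keep].sum() / n_targets

conf  = np.array([0.9, 0.7, 0.4, 0.1])  # detection confidences
is_tp = np.array([1, 1, 0, 1])          # 1 = matched a ground-truth box
recalls = [recall_at(conf, is_tp, 3, t) for t in (0.001, 0.5, 0.8)]
print(recalls)  # monotonically non-increasing
```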
I think the bug is here. In line 217, we filter the predictions with non_max_suppression. Then we enumerate over the filtered results and use filtered targets to calculate mAP. This should not be the case: we should always use the unfiltered targets. (Lines 214 to 247 in 1c5e92a)
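A hypothetical illustration of the bug described above: target_cls must be built from the ground-truth labels, never from the conf-filtered predictions, so its length must be identical for every conf_thres.

```python
import numpy as np

gt_classes = np.array([0, 0, 1, 2])           # ground truth: 4 labels
pred_conf  = np.array([0.9, 0.6, 0.3, 0.05])  # one prediction per label

for conf_thres in (0.001, 0.5, 0.85):
    kept = pred_conf > conf_thres
    target_cls = gt_classes              # correct: independent of threshold
    buggy_target_cls = gt_classes[kept]  # buggy: shrinks with the threshold
    print(conf_thres, len(target_cls), len(buggy_target_cls))
```

Shrinking the target set deflates the recall denominator, which silently inflates the reported recall at high thresholds.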
I've also printed the shapes inside ap_per_class:

```python
def ap_per_class(tp, conf, pred_cls, target_cls, plot=False, save_dir='.', names=(), eps=1e-16):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp: True positives (nparray, nx1 or nx10).
        conf: Objectness value from 0-1 (nparray).
        pred_cls: Predicted object classes (nparray).
        target_cls: True object classes (nparray).
        plot: Plot precision-recall curve at mAP@0.5
        save_dir: Plot save directory
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    print(f"TP Shape: {tp.shape}")
    print(f"Conf Shape : {conf.shape}")
    print(f"predicted_cls: {pred_cls.shape}")
    print(f"target_cls: {target_cls.shape}")
```

```
# using conf-thres 0.85
TP Shape: (33, 10)
Conf Shape : (33,)
predicted_cls: (33,)
target_cls: (30,)
# -------------
# using conf-thres 0.001
TP Shape: (25239, 10)
Conf Shape : (25239,)
predicted_cls: (25239,)
target_cls: (281,)
# -------------
# using conf-thres 0.001
TP Shape: (24777, 10)
Conf Shape : (24777,)
predicted_cls: (24777,)
target_cls: (266,)
```

Even with the same recommended conf-thres (0.001), we end up calculating the result wrong! target_cls should always be the same, no matter what conf-thres is, because the number of target labels per class never changes. So even setting conf-thres to 0.001 gives wrong results. If you look at the labels count column at every threshold above, you can see that this is wrong: the labels count shouldn't change between runs! I will do a PR soon.
Now with PR #8686, my results are consistent and everything is fine. You can also see that the labels count is consistent in each run and equals the actual number of labels I have. However, my model is not doing well, so I cannot publish the results! Before the fix, I was actually happy with the results. Here are the results after fixing the bug (result screenshots for Conf-thres = 0.001, 0.2, 0.5, 0.7, 0.85, 0.9).
We cannot just say that changing conf-thres is wrong. See this paper, which demonstrates the importance of the confidence threshold. Interestingly, it actually runs some experiments using YOLOv5: Confidence Score: The Forgotten Dimension of Object Detection Performance Evaluation
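The paper's broader point, that precision, recall, and F1 all depend on the operating confidence threshold, can be illustrated with a toy sweep (all numbers below are made up; the helper name is mine):

```python
import numpy as np

def prf1(conf, is_tp, n_gt, t):
    """Precision, recall, and F1 of detections with confidence above t."""
    keep = conf > t
    tp = int(is_tp[keep].sum())
    fp = int(keep.sum()) - tp
    precision = tp / max(tp + fp, 1)
    recall = tp / n_gt
    f1 = 2 * precision * recall / max(precision + recall, 1e-16)
    return precision, recall, f1

conf  = np.array([0.95, 0.9, 0.8, 0.6, 0.4, 0.2])
is_tp = np.array([1, 1, 0, 1, 0, 0])
for t in (0.1, 0.5, 0.85):
    print(t, prf1(conf, is_tp, 4, t))
```

In this toy data the best F1 sits at an intermediate threshold, which is why the operating threshold deserves explicit reporting rather than being fixed at a near-zero default.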
This PR is not actually required. If you have this problem, just update your forked repo to the latest version; this was fixed a little earlier.
I was having the same issues before updating my repo. Thank you for your analysis!
@kaminetzky you're welcome! It's great to hear that your issues have been addressed. Feel free to reach out if you need further assistance. Good luck with your work!
Search before asking
Question
I have actually searched yolov5 and read these related issues:
Still, with my dataset (which is medical images) I get impressive results when increasing conf-thres. I am not sure if my model is doing well or if changing conf-thres is basically wrong. Also, the PR curves for each conf-thres are different. Is this normal? I am worried about the results because of the warning that is issued when conf-thres is more than 0.001.
[Default] Conf-thres = 0.001
Conf-thres = 0.5
Conf-thres = 0.7
Conf-thres = 0.8
Conf-thres = 0.85
Conf-thres = 0.86
Conf-thres = 0.88
Conf-thres = 0.9
Additional
No response