Diverged metrics in PatchCore: dropping from 0.99 to 0.44 and 0.03 is rather critical? #74

Closed
samet-akcay opened this issue Jan 17, 2022 · 6 comments · Fixed by #73

Comments

@samet-akcay
Contributor

diverged metrics: dropping from 0.99 to 0.44 and 0.03 is rather critical?
log images: nice :)
PaDiM also dropped in performance, but not as drastically.
Here's a PatchCore result on "good" parts:
[image: PatchCore result on sample 008]

This is PaDiM:
DATALOADER:0 TEST RESULTS
{'image_AUROC': 0.7589669823646545,
'image_F1': 0.8787878751754761,
'pixel_AUROC': 0.9781586527824402,
'pixel_F1': 0.22379672527313232}
[image: PaDiM result on sample 008]

Originally posted by @sequoiagrove in #67 (comment)

@samet-akcay
Contributor Author

@sequoiagrove, @blakshma worked on the PatchCore results and managed to improve the performance to the following:

DATALOADER:0 TEST RESULTS
{'image_AUROC': 0.9524492025375366,
 'image_F1': 0.9551020860671997,
 'pixel_AUROC': 0.9894225597381592,
 'pixel_F1': 0.3562552034854889}

To reproduce the numbers, you could use this branch, which will be merged into development soon after this PR.

Here are the qualitative results for the ones you shared above (screw/test/good/009.png):
[image: qualitative PatchCore result on screw/test/good/009]

@dk-teknologisk-mlnn

I get those numbers too now, but I wonder whether the computation of the metrics is also wrong, because when I look through the result images, it only detects about 1/4 to 1/2 of the defects in the different categories. The pixel F1 = 0.35 seems to describe the actual performance best, though I know it is low partly because correctly detected defects don't have to overlap the ground truth perfectly to still be a good result.

@dk-teknologisk-mlnn commented Jan 17, 2022

How can AUC/F1 be 1.0 when it finds defects in only 3 out of 5 bad examples?

DATALOADER:0 TEST RESULTS
{'image_AUROC': 1.0,
'image_F1': 1.0,
'pixel_AUROC': 0.967517614364624,
'pixel_F1': 0.6376267075538635}


@blakshma
Contributor

@sequoiagrove unfortunately, the classification results are independent of the segmentation results. Hence, the algorithm might have a very good classification result while the segmentation results are poor in some cases, as you have pointed out. We will investigate this.
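As an illustration of why this can happen (a minimal sketch on made-up data, not anomalib's actual code): if the image-level score is taken as the maximum of the pixel anomaly map, an image can be flagged correctly (perfect image AUROC/F1) even when the predicted mask barely overlaps the ground-truth defect.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Toy example with two images: one normal, one containing a 4x4 defect.
gt_masks = np.zeros((2, 8, 8), dtype=int)
gt_masks[1, 2:6, 2:6] = 1

# The model "lights up" only a single pixel inside the defect.
pred_maps = np.zeros((2, 8, 8))
pred_maps[1, 5, 5] = 0.9

# Image-level score = max of the anomaly map (the usual convention in PatchCore/PaDiM-style models).
image_labels = gt_masks.reshape(2, -1).max(axis=1)
image_scores = pred_maps.reshape(2, -1).max(axis=1)
print("image AUROC:", roc_auc_score(image_labels, image_scores))  # 1.0

# Pixel-level F1 after thresholding: poor, because the mask barely overlaps the defect.
pred_masks = (pred_maps > 0.5).astype(int)
print("pixel F1:", f1_score(gt_masks.ravel(), pred_masks.ravel()))  # ~0.12
```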

@dk-teknologisk-mlnn

A pixel AUROC of 0.97 sounds like good performance, but looking at the masks it is really not useful in a real system, and AUC is a bad metric for quantifying performance.
So I guess I should always look at pixel F1. I think that's about the same as Dice, right? I use Dice in segmentation tasks.
Too bad this leaderboard only shows the unrealistically positive AUROC numbers:
https://paperswithcode.com/sota/anomaly-detection-on-mvtec-ad
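For binary masks, pixel F1 and the Dice coefficient are indeed the same quantity, 2·TP / (2·TP + FP + FN). A quick sanity check on toy data:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy binary masks (flattened) to check that pixel F1 equals the Dice coefficient.
gt = np.array([1, 1, 0, 0, 1, 0, 1, 1])
pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

dice = 2 * np.logical_and(gt, pred).sum() / (gt.sum() + pred.sum())
print(dice, f1_score(gt, pred))  # both ~0.667
```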

@samet-akcay
Contributor Author

Yeah, AUC is widely used in academia, but it is usually not a good metric for industrial applications since it can be misleading. Finding the best threshold from the AUC is not easy, even though we implemented an adaptive thresholding mechanism. This is why we also added the F1 score to our evaluations.

Regarding always looking at the pixel F1 score for evaluation, there is room for improvement there. We haven't optimised the heatmaps from which the predicted masks are generated. Once we do, I agree that pixel F1 would become the standard metric for evaluating performance.
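For context, the adaptive threshold can be thought of as sweeping the precision-recall curve and picking the threshold that maximises F1. A rough sketch of the idea on hypothetical scores (not necessarily the exact anomalib implementation):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def adaptive_threshold(labels: np.ndarray, scores: np.ndarray) -> float:
    """Return the score threshold that maximises F1 on the given labels/scores."""
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / (precision + recall + 1e-10)
    # precision_recall_curve returns one more (precision, recall) pair than thresholds,
    # so drop the final entry before indexing into `thresholds`.
    return float(thresholds[np.argmax(f1[:-1])])

# Hypothetical image-level labels and anomaly scores.
labels = np.array([0, 0, 1, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.90])
print(adaptive_threshold(labels, scores))  # 0.35: catches every defect at the cost of one false positive
```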
