
question about AdaptiveThreshold, possible bug? #457

Closed · davidebaltieri31 opened this issue Jul 27, 2022 · 7 comments · Fixed by #527

davidebaltieri31 commented Jul 27, 2022

I'm running anomalib on a custom dataset (around 20k samples) and I've noticed constant crashes after a few epochs/validation steps (`DefaultCPUAllocator: can't allocate memory...`), almost independently of the model I choose.

It seems to be related to AdaptiveThreshold.
Internally it uses PrecisionRecallCurve, and after every validation step new data is added to it, but the old data is never deleted.
Why?
Shouldn't the data from previous validation steps be deleted?

ORippler (Contributor) commented Aug 2, 2022

Since PrecisionRecallCurve works on all data, i.e. it is computed over the complete data by default in torchmetrics, you have to hold all your datapoints in memory. For large datasets, you can look at the binned PR curve, which torchmetrics implements with a constant memory requirement.
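For reference, a minimal sketch of the binned variant, assuming a torchmetrics version of that era (~0.9) which ships `BinnedPrecisionRecallCurve`; batch sizes and the bin count are illustrative:

```python
import torch
from torchmetrics import BinnedPrecisionRecallCurve

# Memory is bounded by the number of bins, not the dataset size: only
# per-threshold TP/FP/FN counters are kept between updates.
pr_curve = BinnedPrecisionRecallCurve(num_classes=1, thresholds=100)

for _ in range(1000):                     # e.g. many validation batches
    preds = torch.rand(512)               # anomaly scores, must lie in [0, 1]
    target = torch.randint(0, 2, (512,))  # binary ground-truth labels
    pr_curve.update(preds, target)

precision, recall, thresholds = pr_curve.compute()
```

Note that the binned variant requires scores normalized to [0, 1] and trades exactness for the fixed bin resolution.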

davidebaltieri31 (Author) commented:

I think my post wasn't clear and I used the term "step" incorrectly.
I understand that PrecisionRecallCurve works on all data, but why does it keep data from previous epochs?
I.e., I run training and validation after each epoch:
Epoch 1 -> validation 1 -> epoch 2 -> validation 2 -> etc.,
which I think is the default setting in the repo config files. My question is: why does AdaptiveThreshold use PrecisionRecallCurve in a way that keeps data from both validation 1 and validation 2? Shouldn't it use data from validation 2 only (i.e. validation of the latest trained model)?

ORippler (Contributor) commented Aug 3, 2022

> I think my post wasn't clear and I used the term "step" incorrectly. I understand that PrecisionRecallCurve works on all data, but why does it keep data from previous epochs? I.e., I run training and validation after each epoch: Epoch 1 -> validation 1 -> epoch 2 -> validation 2 -> etc., which I think is the default setting in the repo config files. My question is: why does AdaptiveThreshold use PrecisionRecallCurve in a way that keeps data from both validation 1 and validation 2? Shouldn't it use data from validation 2 only (i.e. validation of the latest trained model)?

Ahh I see. At a quick glance it seems as if the reset method is missing from AdaptiveThreshold.

Can you try adding:

```python
def reset(self) -> None:
    """Reset the accumulated state of the wrapped precision-recall curve."""
    self.precision_recall_curve.reset()
```

to the class and see whether that fixes the problem? The class can be found here; overall, it should look as follows:

```python
class AdaptiveThreshold(Metric):
    """Optimal F1 Metric.

    Compute the optimal F1 score at the adaptive threshold, based on the F1 metric of the true labels and the
    predicted anomaly scores.
    """

    def __init__(self, default_value: float = 0.5, **kwargs):
        super().__init__(**kwargs)

        self.precision_recall_curve = PrecisionRecallCurve(num_classes=1)
        self.add_state("value", default=torch.tensor(default_value), persistent=True)  # pylint: disable=not-callable
        self.value = torch.tensor(default_value)  # pylint: disable=not-callable

    # pylint: disable=arguments-differ
    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:  # type: ignore
        """Update the precision-recall curve metric."""
        self.precision_recall_curve.update(preds, target)

    def compute(self) -> torch.Tensor:
        """Compute the threshold that yields the optimal F1 score.

        Compute the F1 scores while varying the threshold. Store the optimal
        threshold as attribute and return the maximum value of the F1 score.

        Returns:
            Value of the F1 score at the optimal threshold.
        """
        precision: torch.Tensor
        recall: torch.Tensor
        thresholds: torch.Tensor

        precision, recall, thresholds = self.precision_recall_curve.compute()
        f1_score = (2 * precision * recall) / (precision + recall + 1e-10)
        if thresholds.dim() == 0:
            # special case where recall is 1.0 even for the highest threshold.
            # In this case 'thresholds' will be scalar.
            self.value = thresholds
        else:
            self.value = thresholds[torch.argmax(f1_score)]
        return self.value

    # ADDED METHOD
    def reset(self) -> None:
        """Reset the accumulated state of the wrapped precision-recall curve."""
        self.precision_recall_curve.reset()
```

Note: the same will also be necessary for OptimalF1.
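Assuming OptimalF1 also stores its curve under a `precision_recall_curve` attribute (an assumption on my part, mirroring AdaptiveThreshold), the addition would look analogous:

```python
from torchmetrics import Metric

class OptimalF1(Metric):
    ...

    def reset(self) -> None:
        # Hypothetical sketch: assumes OptimalF1 wraps a PrecisionRecallCurve
        # under the same attribute name as AdaptiveThreshold.
        self.precision_recall_curve.reset()
```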

davidebaltieri31 (Author) commented:

Hi,
I added the reset code to both, but it's never called. I think there are probably other changes that need to be made, perhaps in anomaly_module or somewhere else in anomalib?

ORippler (Contributor) commented Aug 4, 2022

> Hi, I added the reset code to both, but it's never called. I think there are probably other changes that need to be made, perhaps in anomaly_module or somewhere else in anomalib?

Did you install anomalib with the -e flag, i.e. via `pip install -e .`?

If so, please provide an MWE so that the bug can be localized.
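For illustration, the unbounded accumulation itself can be reproduced with torchmetrics alone; a minimal sketch (batch sizes and epoch counts are arbitrary):

```python
import torch
from torchmetrics import PrecisionRecallCurve

curve = PrecisionRecallCurve(num_classes=1)

for epoch in range(3):
    for _ in range(10):  # one "validation epoch" worth of batches
        curve.update(torch.rand(256), torch.randint(0, 2, (256,)))
    curve.compute()
    # torchmetrics appends every update to the `preds`/`target` state lists,
    # so without an explicit reset() memory grows with every epoch:
    print(f"epoch {epoch}: {sum(p.numel() for p in curve.preds)} stored scores")
    # curve.reset()  # uncomment to clear the state and keep memory constant
```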

jpcbertoldo (Contributor) commented:

Shouldn't this one be closed?

djdameln (Contributor) commented:

I believe that the original issue is not yet resolved. While the reset method has been added to the AdaptiveThreshold metric class, the method is never called. This means that for models that need to be trained for multiple epochs, the data from previous epochs is never cleared, which in turn may lead to out-of-memory errors. It could also lead to inaccurate computation of the adaptive threshold, since the threshold would be computed over predictions from older model states as well. We will investigate this a bit further and work on a fix.
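One possible direction (a sketch only, not the final fix) would be to clear the thresholding metrics at the start of every validation epoch via a PyTorch Lightning callback; the `image_threshold`/`pixel_threshold` attribute names below are assumptions for illustration:

```python
from pytorch_lightning import Callback

class MetricsResetCallback(Callback):
    """Reset accumulated metric state before each validation epoch starts."""

    def on_validation_epoch_start(self, trainer, pl_module) -> None:
        # Attribute names are hypothetical; adapt to whatever attributes of
        # the anomaly module actually hold the adaptive-threshold metrics.
        for name in ("image_threshold", "pixel_threshold"):
            metric = getattr(pl_module, name, None)
            if metric is not None:
                metric.reset()
```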
