Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIMO #1726

Open
wants to merge 54 commits into
base: main
Choose a base branch
from
Open

PIMO #1726

Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
c3944eb
update
jpcbertoldo Dec 19, 2023
b61c47f
test binclf curves numpy and numba and fixes
jpcbertoldo Dec 20, 2023
e6006b4
correct som docstrings
jpcbertoldo Dec 21, 2023
101d646
torch interface and tests
jpcbertoldo Dec 21, 2023
62d5480
torch interface and tests
jpcbertoldo Dec 21, 2023
0f0b424
constants regrouped in dataclass as class vars
jpcbertoldo Dec 21, 2023
8df211e
result class was unneccesary for per_image_binclf_curve
jpcbertoldo Dec 21, 2023
9e74226
factorize function _get_threshs_minmax_linspace
jpcbertoldo Dec 21, 2023
283d704
small docs fixes
jpcbertoldo Dec 21, 2023
2bc3c06
add pimo numpy version and test
jpcbertoldo Dec 21, 2023
c864b54
move validation
jpcbertoldo Dec 22, 2023
a3d1060
add `shared_fpr_metric` option
jpcbertoldo Dec 22, 2023
bfc287e
add pimo torch functional version and test
jpcbertoldo Dec 22, 2023
60483fc
add torchmetrics interface and test
jpcbertoldo Dec 22, 2023
4bfe3da
renames and put things in init
jpcbertoldo Dec 23, 2023
1a7398b
validate inputs in result objects
jpcbertoldo Dec 23, 2023
403b4ae
result objects to from dict and tests
jpcbertoldo Dec 23, 2023
b12fb86
add save and load methods to result objects and test
jpcbertoldo Dec 23, 2023
b7e5439
refactor validations and minor changes
jpcbertoldo Dec 24, 2023
3808de8
test result objects' properties
jpcbertoldo Dec 24, 2023
dfa8dc3
minor refactors
jpcbertoldo Dec 24, 2023
2cefa2c
add missing docstrings
jpcbertoldo Dec 24, 2023
adc14fd
minore vocabulary fix for consistency
jpcbertoldo Dec 24, 2023
c30c4ea
add per image scores statistics and test it
jpcbertoldo Dec 26, 2023
408fb2b
refactor constants notation
jpcbertoldo Dec 26, 2023
43c6eb2
add stats tests and test it
jpcbertoldo Dec 26, 2023
980c972
change the meaning of AUPIMO.num_thresh
jpcbertoldo Dec 27, 2023
a052f6a
interface to format pairwise test results
jpcbertoldo Dec 28, 2023
dabba4a
improve doc
jpcbertoldo Dec 28, 2023
215847b
add optional `paths` to result objects and some minor fixes and refac…
jpcbertoldo Dec 28, 2023
a6404fc
remove frozen from dataclasses and some done todos
jpcbertoldo Jan 4, 2024
14c97fa
review headers
jpcbertoldo Jan 4, 2024
5bc3b2b
doc modifs
jpcbertoldo Jan 22, 2024
b8e0ddf
refactor `score_less_than_thresh` in `_binclf_one_curve_python`
jpcbertoldo Jan 22, 2024
943c1a7
correct license comments
jpcbertoldo Jan 22, 2024
2e565d1
fix doc
jpcbertoldo Feb 9, 2024
103b6db
numba as extra requirement
jpcbertoldo May 28, 2024
08b85ef
refactor copyrights from jpcbertoldo
jpcbertoldo May 28, 2024
6165768
remove from __future__ import annotations
jpcbertoldo May 28, 2024
fbdf8b6
refactor validations names
jpcbertoldo May 28, 2024
f558904
dedupe file path validation
jpcbertoldo May 28, 2024
581b35b
fix tests
jpcbertoldo May 28, 2024
5837c0d
Add todo
jpcbertoldo May 28, 2024
68a30aa
refactor enums
jpcbertoldo May 28, 2024
d4071ad
only logger.warning
jpcbertoldo May 28, 2024
2f65040
refactor test imports
jpcbertoldo May 28, 2024
0d0863f
refactor docs
jpcbertoldo May 28, 2024
fdd797a
refactor some docs
jpcbertoldo May 29, 2024
012e8e2
correct pre commit errors
jpcbertoldo May 29, 2024
f11b4a9
remove author tag
jpcbertoldo Jun 26, 2024
fa448f2
Merge branch 'main' into pimo3
jpcbertoldo Jun 26, 2024
dd0bd50
Merge branch 'pimo3' of github.com:jpcbertoldo/anomalib into pimo3
jpcbertoldo Jun 26, 2024
c92a6a9
add thrid party program
jpcbertoldo Jun 26, 2024
1cde9ce
Update src/anomalib/metrics/per_image/pimo.py
jpcbertoldo Jul 8, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,8 @@ test = [
"coverage[toml]",
"tox",
]
full = ["anomalib[core,openvino,loggers,notebooks]"]
extra = ["numba>=0.58.1"]
full = ["anomalib[core,openvino,loggers,notebooks,extra]"]
dev = ["anomalib[full,docs,test]"]

[project.scripts]
Expand Down
8 changes: 8 additions & 0 deletions src/anomalib/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,14 @@
__version__ = "1.2.0dev"


try:
import numba # noqa: F401
except ImportError:
HAS_NUMBA = False
else:
HAS_NUMBA = True
Comment on lines +11 to +16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would be the best place to define this, as PIMO is the only module that uses Numba

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont personally dont see the difference, it's kind of arbitrary, no?
should i move it there?



class LearningType(str, Enum):
"""Learning type defining how the model learns from the dataset samples."""

Expand Down
15 changes: 13 additions & 2 deletions src/anomalib/data/utils/path.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,6 @@
# Copyright (C) 2022-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


import os
import re
from enum import Enum
Expand Down Expand Up @@ -143,13 +142,20 @@ def contains_non_printable_characters(path: str | Path) -> bool:
return not printable_pattern.match(str(path))


def validate_path(path: str | Path, base_dir: str | Path | None = None, should_exist: bool = True) -> Path:
def validate_path(
path: str | Path,
base_dir: str | Path | None = None,
should_exist: bool = True,
accepted_extensions: tuple[str, ...] | None = None,
) -> Path:
"""Validate the path.

Args:
path (str | Path): Path to validate.
base_dir (str | Path): Base directory to restrict file access.
should_exist (bool): If True, do not raise an exception if the path does not exist.
accepted_extensions (tuple[str, ...] | None): Accepted extensions for the path. An exception is raised if the
path does not have one of the accepted extensions. If None, no check is performed. Defaults to None.

Returns:
Path: Validated path.
Expand Down Expand Up @@ -214,6 +220,11 @@ def validate_path(path: str | Path, base_dir: str | Path | None = None, should_e
msg = f"Read or execute permissions denied for the path: {path}"
raise PermissionError(msg)

# Check if the path has one of the accepted extensions
if accepted_extensions is not None and path.suffix not in accepted_extensions:
msg = f"Path extension is not accepted. Accepted extensions: {accepted_extensions}. Path: {path}"
raise ValueError(msg)

return path


Expand Down
7 changes: 7 additions & 0 deletions src/anomalib/metrics/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import torchmetrics
from omegaconf import DictConfig, ListConfig

from . import per_image
from .anomaly_score_distribution import AnomalyScoreDistribution
from .aupr import AUPR
from .aupro import AUPRO
Expand All @@ -19,6 +20,7 @@
from .f1_max import F1Max
from .f1_score import F1Score
from .min_max import MinMax
from .per_image import AUPIMO, PIMO, aupimo_scores, pimo_curves
from .precision_recall_curve import BinaryPrecisionRecallCurve
from .pro import PRO
from .threshold import F1AdaptiveThreshold, ManualThreshold
Expand All @@ -35,6 +37,11 @@
"ManualThreshold",
"MinMax",
"PRO",
"per_image",
"pimo_curves",
"aupimo_scores",
"PIMO",
"AUPIMO",
]

logger = logging.getLogger(__name__)
Expand Down
46 changes: 46 additions & 0 deletions src/anomalib/metrics/per_image/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
"""Per-Image Metrics."""

# Original Code
# https://github.com/jpcbertoldo/aupimo
#
# Modified
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .binclf_curve import per_image_binclf_curve, per_image_fpr, per_image_tpr
from .binclf_curve_numpy import BinclfAlgorithm, BinclfThreshsChoice
from .pimo import AUPIMO, PIMO, AUPIMOResult, PIMOResult, aupimo_scores, pimo_curves
from .pimo_numpy import PIMOSharedFPRMetric
from .utils import (
compare_models_pairwise_ttest_rel,
compare_models_pairwise_wilcoxon,
format_pairwise_tests_results,
per_image_scores_stats,
)
from .utils_numpy import StatsOutliersPolicy, StatsRepeatedPolicy

__all__ = [
# constants
"BinclfAlgorithm",
"BinclfThreshsChoice",
"StatsOutliersPolicy",
"StatsRepeatedPolicy",
"PIMOSharedFPRMetric",
# result classes
"PIMOResult",
"AUPIMOResult",
# functional interfaces
"per_image_binclf_curve",
"per_image_fpr",
"per_image_tpr",
"pimo_curves",
"aupimo_scores",
# torchmetrics interfaces
"PIMO",
"AUPIMO",
# utils
"compare_models_pairwise_ttest_rel",
"compare_models_pairwise_wilcoxon",
"format_pairwise_tests_results",
"per_image_scores_stats",
]
115 changes: 115 additions & 0 deletions src/anomalib/metrics/per_image/_binclf_curve_numba.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
"""Binary classification matrix curve (NUMBA implementation of low level functions).
jpcbertoldo marked this conversation as resolved.
Show resolved Hide resolved

Details: `.binclf_curve`.
"""

# Original Code
# https://github.com/jpcbertoldo/aupimo
#
# Modified
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import numba
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still on the fence regarding the numba requirement. While it is a good feature to have, it increases complexity. I would recommend dropping it from Anomalib. Any user who wants to use it can refer to the original repo. Any thoughts @samet-akcay @djdameln ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a bit hesitant to add the numba requirement. I can see the benefit that it brings, but at the same time it adds an unnecessary dependency and it increases the complexity of the code. Without Numba we could have a pure pytorch implementation of the metric, which would be much cleaner and more in line with the rest of the library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pytorch-only was slower than the one with numpy (which is slower than numba)
i see this is becoming a long issue, i can just drop numba, but can you confirm this is the decision to make? (just to avoid going back later)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpcbertoldo would it be possible to use the pytorch-only implementation by default, and use the Numba implementation if the user has Numba installed in their environment? This way we don't force casual users to install another dependency, and advanced users who care more about speed can install Numba to speed up the computation. We could log a warning to notify the users that the computation can be made faster by installing Numba.

import numpy as np
from numpy import ndarray


@numba.jit(nopython=True)
def binclf_one_curve_numba(scores: ndarray, gts: ndarray, threshs: ndarray) -> ndarray:
jpcbertoldo marked this conversation as resolved.
Show resolved Hide resolved
"""One binary classification matrix at each threshold (NUMBA implementation).

This does the same as `_binclf_one_curve_python` but with numba using just-in-time compilation.

Note: VALIDATION IS NOT DONE HERE! Make sure to validate the arguments before calling this function.

Args:
scores (ndarray): Anomaly scores (D,).
gts (ndarray): Binary (bool) ground truth of shape (D,).
threshs (ndarray): Sequence of thresholds in ascending order (K,).

Returns:
ndarray: Binary classification matrix curve (K, 2, 2)

Details: `anomalib.metrics.per_image.binclf_curve_numpy.binclf_multiple_curves`.
"""
num_th = len(threshs)

# POSITIVES
scores_pos = scores[gts]
# the sorting is very important for the algorithm to work and the speedup
scores_pos = np.sort(scores_pos)
# start counting with lowest th, so everything is predicted as positive (this variable is updated in the loop)
num_pos = current_count_tp = len(scores_pos)

tps = np.empty((num_th,), dtype=np.int64)

# NEGATIVES
# same thing but for the negative samples
scores_neg = scores[~gts]
scores_neg = np.sort(scores_neg)
num_neg = current_count_fp = len(scores_neg)

fps = np.empty((num_th,), dtype=np.int64)

# it will progressively drop the scores that are below the current th
for thidx, th in enumerate(threshs):
num_drop = 0
num_scores = len(scores_pos)
while num_drop < num_scores and scores_pos[num_drop] < th: # ! scores_pos !
num_drop += 1
# ---
scores_pos = scores_pos[num_drop:]
current_count_tp -= num_drop
tps[thidx] = current_count_tp

# same with the negatives
num_drop = 0
num_scores = len(scores_neg)
while num_drop < num_scores and scores_neg[num_drop] < th: # ! scores_neg !
num_drop += 1
# ---
scores_neg = scores_neg[num_drop:]
current_count_fp -= num_drop
fps[thidx] = current_count_fp

fns = num_pos * np.ones((num_th,), dtype=np.int64) - tps
tns = num_neg * np.ones((num_th,), dtype=np.int64) - fps

# sequence of dimensions is (threshs, true class, predicted class) (see docstring)
return np.stack(
(
np.stack((tns, fps), axis=-1),
np.stack((fns, tps), axis=-1),
),
axis=-1,
).transpose(0, 2, 1)


@numba.jit(nopython=True, parallel=True)
def binclf_multiple_curves_numba(scores_batch: ndarray, gts_batch: ndarray, threshs: ndarray) -> ndarray:
"""Multiple binary classification matrix at each threshold (NUMBA implementation).

This does the same as `_binclf_multiple_curves_python` but with numba,
using parallelization and just-in-time compilation.

Note: VALIDATION IS NOT DONE HERE. Make sure to validate the arguments before calling this function.

Args:
scores_batch (ndarray): Anomaly scores (N, D,).
gts_batch (ndarray): Binary (bool) ground truth of shape (N, D,).
threshs (ndarray): Sequence of thresholds in ascending order (K,).

Returns:
ndarray: Binary classification matrix curves (N, K, 2, 2)

Details: `anomalib.metrics.per_image.binclf_curve_numpy.binclf_multiple_curves`.
"""
num_imgs = scores_batch.shape[0]
num_th = len(threshs)
ret = np.empty((num_imgs, num_th, 2, 2), dtype=np.int64)
for imgidx in numba.prange(num_imgs):
scoremap = scores_batch[imgidx]
mask = gts_batch[imgidx]
ret[imgidx] = binclf_one_curve_numba(scoremap, mask, threshs)
return ret
Loading