
📝 Add a technical blog post to explain how to run anomalib. #359

Merged · 23 commits · Jun 14, 2022 (changes shown from 19 commits)

Commits:
- b8dfd8e ➕ Added anomalib logo to blog images (samet-akcay, Apr 27, 2022)
- 7605e03 ➕ Added logger images to blog images (samet-akcay, Apr 27, 2022)
- 3980136 ➕ Added HPO figure to blog images (samet-akcay, Apr 27, 2022)
- efb6963 📝 Added the first draft of the technical blog post. (samet-akcay, Apr 27, 2022)
- b309588 🚚 Move technical blog post to a subdirectory (samet-akcay, Apr 28, 2022)
- 2b7f28a ➕ Added biz directory (samet-akcay, Apr 28, 2022)
- 8e77a0a 🏷️ Rename the subdirectories (samet-akcay, Apr 28, 2022)
- d8694bf ➕ Add biz draft (samet-akcay, Apr 28, 2022)
- 04254f7 Update README.md (djdameln, Apr 28, 2022)
- d737ae3 Update README.md (djdameln, Apr 28, 2022)
- a47fe99 Add gif of results on toy dataset (ashwinvaidya17, Apr 29, 2022)
- 8b9265b Minor edits (ashwinvaidya17, Apr 29, 2022)
- 10cfd1e Update images with new visualization (May 4, 2022)
- b5f1139 Replace example dataset with hazelnut (ashwinvaidya17, May 11, 2022)
- 5304d24 Update README.md (samet-akcay, Jun 6, 2022)
- be05aaf Update README.md (mehtas73, Jun 6, 2022)
- 3acc6db Add gifs (ashwinvaidya17, Jun 9, 2022)
- 3a2ffc9 Add link to toy dataset (Jun 10, 2022)
- a3b2fcf Merge branch 'development' into blog/train-custom-dataset-with-anomalib (Jun 10, 2022)
- ff81288 📢 Address reply comments. (samet-akcay, Jun 13, 2022)
- 34e8b3a Merge branch 'development' of github.com:openvinotoolkit/anomalib int… (samet-akcay, Jun 13, 2022)
- fc0bf5c Merge branch 'development' of github.com:openvinotoolkit/anomalib int… (samet-akcay, Jun 14, 2022)
- fe43e22 Address PR reviewer comments (samet-akcay, Jun 14, 2022)
45 changes: 45 additions & 0 deletions docs/blog/biz/README.md
@@ -0,0 +1,45 @@
# Anomalib
- Talk about use cases where anomaly-based tasks are useful.
  - The value and merits of anomaly-based tasks in solving the problem
  - Less data on the defects; far more normal samples are present
  - Example use cases in the industry:
    - Defect detection, security, health care
  - Talk about how that translates to value derived for business
- How is it solved today? What are the alternatives?
  - Alternative approaches
  - How do anomaly-based tasks make solving this problem more valuable for businesses?
- What role does or can Anomalib play here?
- How can one use Anomalib?
- Summary

## Background

Anomaly detection is the process of identifying anomalous items in a stream of input data. An example of a real-world anomaly detection problem is industrial defect detection, where the aim is to identify anomalous products in the output of a production line. Anomaly detection problems are characterized by two main challenges:

First of all, the anomalous items in the dataset can be extremely scarce. Take, for example, an industrial process that manufactures an LED part at a 98% yield rate: for every 2 defective data points, there are 98 images in which no defect occurs. Given today's high manufacturing yields, generating a balanced dataset is an arduous task that could require collecting tens of thousands of images just to obtain a hundred images of defects.

Second, there is usually no clear definition of the anomalous class. Instead, anything that deviates from the normal class should be considered anomalous. Two given defect types might look very different, but in the context of the anomaly detection problem both fall within the anomalous category. This heterogeneous nature of the anomalous class makes it challenging for a machine learning model to learn an implicit representation of abnormality. To make things even worse, not all defect types may be known beforehand: during the deployment of our model we might encounter a new defect type that we have never seen before. How do we teach the model to identify a defect type of which we have no examples and which we might not even know exists?

Over the years, AI researchers have created a family of machine learning algorithms to address these inherent challenges of anomaly detection datasets. The gist of these algorithms is that they require only good images for the training dataset, while the bad images are used in the validation dataset to help quantify the accuracy of the model. By learning normality, Anomalib can detect abnormalities in domains where the defects are unknown. Anomalib implements the most recent anomaly detection techniques and is continuously updated with the latest state-of-the-art algorithms.

## Draft

Anomaly detection stands in contrast to a standard classification task, which depends on both normal and abnormal images to achieve good performance. In addition, classification requires both classes to be present in roughly equal proportion. This requirement does not always hold, and is not even desirable in certain scenarios: a good manufacturing unit produces defects only rarely, data points for certain illnesses are rare compared to those of the healthy, and it is preferable not to have examples of certain security threats when training a model to detect such threats.

Anomaly detection aims to address these challenges by providing algorithms that work well on such imbalanced data and can be trained in an unsupervised manner, so that they can identify previously unseen anomalies.

There are a few approaches used to detect anomalies.

## TODO


Anomalib provides a comprehensive solution to address these needs. It ships with 7 state-of-the-art algorithms and provides utilities such as hyperparameter optimization to tune them to any business's needs. It is modular and can easily be extended to include more algorithms. It supports OpenVINO export and inference, so that the trained models can be deployed on Intel hardware.

By ensuring the presence of top algorithms in the library, businesses can be assured that their applications are being solved by the best models. Due to Anomalib's open-source nature and public contributions, customers can expect new features at a higher velocity, and be assured that bugs and issues are resolved quickly. Its open-source nature also ensures transparency, and thus will increase the trust of our customers.

To get started with Anomalib, all one needs to do is clone the GitHub repository or install it directly via pip.

```bash
# Install the latest release from PyPI
pip install anomalib

# ...or clone the repository for an editable install
git clone https://github.com/openvinotoolkit/anomalib.git
cd anomalib
pip install -e .
```
Binary file added docs/blog/biz/images/anomalib-wide-blue.png
245 changes: 245 additions & 0 deletions docs/blog/tec/README.md
@@ -0,0 +1,245 @@
# How to Train Your Custom Dataset with Anomalib
<div align="center">
<img src="docs/blog/../../images/anomalib-wide-blue.png" width="600px">
</div>

## Introducing Anomalib
In early 2022, a group of Intel AI researchers released a ground-breaking deep learning library named Anomalib. The library provides state-of-the-art algorithms for anomaly detection on image datasets. Anomalib aims to be a thorough end-to-end solution, offering many features to achieve the highest-accuracy model while also providing inference deployment code with Intel's OpenVINO Toolkit.


## Why a dedicated library for Anomaly Detection?

Anomaly detection is the process of identifying anomalous items in a stream of input data. An example of a real-world anomaly detection problem is industrial defect detection, where the aim is to identify anomalous products in the output of a production line. In this type of application, anomalous items occur much less frequently than normal items, which makes it challenging to collect sufficient representative samples of the anomalous class. On top of that, the anomalous class is not well defined, and can contain a wide range of different defect types. These characteristics make it difficult to solve anomaly detection problems with classical, supervised methods. Instead, anomaly detection algorithms usually rely on unsupervised techniques to learn an implicit representation of normality, the normality model. During inference, new samples are compared against this normality model to determine if they belong to the normal or anomalous category.
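To make the idea of a normality model concrete, here is a minimal, self-contained sketch; it is not any specific Anomalib algorithm. It fits a simple Gaussian model on feature vectors from normal samples only, then scores new samples by their Mahalanobis distance to it. The random features below are stand-ins for real image features.

```python
import numpy as np

# Toy sketch of the "normality model" idea: fit a Gaussian to features of
# normal samples only, then score new samples by Mahalanobis distance.
rng = np.random.default_rng(seed=0)
normal_features = rng.normal(loc=0.0, scale=1.0, size=(500, 16))  # stand-in for CNN features

mean = normal_features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_features, rowvar=False))

def anomaly_score(x: np.ndarray) -> float:
    """Distance to the learned model of normality; larger means more anomalous."""
    delta = x - mean
    return float(np.sqrt(delta @ cov_inv @ delta))

test_sample = rng.normal(loc=3.0, scale=1.0, size=16)  # shifted distribution -> anomalous
print(anomaly_score(test_sample))  # large distance => likely anomalous
```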

Anomalib aims to collect the most recent deep-learning based anomaly detection algorithms with the purpose of providing a tool that makes it easy to benchmark different anomaly detection algorithms on public and custom datasets. Anomalib is continuously updated with the latest State-of-the-Art algorithms, and contains several tools and interfaces that can be used to run experiments and create and test new algorithms.

## How to train a Custom dataset with Anomalib

Anomalib supports a number of datasets in various formats, including state-of-the-art anomaly detection benchmarks such as MVTec AD and BeanTech. For those who would like to use the library on their custom datasets, anomalib also provides a `FolderDataModule` that can load datasets from a folder on a file system. The scope of this post is to train anomalib models on a custom dataset using the `FolderDataModule`.

### Step 1: Install Anomalib
#### Option - 1 : PyPI
Anomalib can be installed from PyPI via the following:

```bash
pip install anomalib
```

#### Option - 2: Editable Install
Alternatively, it is also possible to do an editable install:
```bash
git clone https://github.com/openvinotoolkit/anomalib.git
cd anomalib
pip install -e .
```

### Step 2: Collect Your Custom Dataset
Anomalib supports multiple image extensions such as `".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", and ".webp"`. A dataset can be collected from images that have any of these extensions.
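As a quick sanity check before training, you could scan your dataset folder for files that fall outside this list. Below is a small sketch using only the standard library; the dataset path is a placeholder.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp"}

dataset_root = Path("datasets/Hazelnut_toy")  # placeholder path to your dataset

# Collect any files whose extension Anomalib would not load.
unsupported = [
    path for path in dataset_root.rglob("*")
    if path.is_file() and path.suffix.lower() not in SUPPORTED_EXTENSIONS
]
print(f"Found {len(unsupported)} file(s) with unsupported extensions.")
```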

### Step 3: Format your dataset
Depending on the use-case and collection, custom datasets can have different formats, some of which are listed below:
- A dataset with good and bad images.
- A dataset with good and bad images as well as mask ground-truths for pixel-wise evaluation.
- A dataset with good and bad images that is already split into training and testing sets.

Each of these use-cases is addressed by anomalib's `FolderDataModule`. Let's focus on the first use-case as an example of end-to-end model training and inference. In this post, we will use a toy dataset which you can download from [here](https://openvinotoolkit.github.io/anomalib/_downloads/3f2af1d7748194b18c2177a34c03a2c4/hazelnut_toy.zip). The dataset consists of several folders, each containing a set of images. The _colour_ and the _crack_ folders represent two kinds of defects. We can ignore the _masks_ folder for now.

Load your data into the following directory structure. Anomalib will use all the images in the _colour_ folder as the anomalous part of the test set, and will randomly split the good images between the training and test sets.
```
Hazelnut_toy
├── colour
└── good
```
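Assuming a Unix-like environment with `wget` and `unzip` available, one way to fetch the toy dataset and create this structure is:

```bash
# Download the toy dataset (link from this post) and extract it into
# anomalib's datasets folder; rename the extracted folder if needed so
# that it matches the path used in the config file.
wget https://openvinotoolkit.github.io/anomalib/_downloads/3f2af1d7748194b18c2177a34c03a2c4/hazelnut_toy.zip
unzip hazelnut_toy.zip -d datasets/
```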

### Step 4: Modify Config File
A YAML configuration file is necessary to run training for Anomalib. The training configuration parameters are categorized into 5 sections: `dataset`, `model`, `project`, `logging`, `trainer`.

To get Anomalib functionally working with a custom dataset, one only needs to change the `dataset` section of the configuration file.

Below is an example of what the dataset parameters would look like for the `Hazelnut_toy` folder prepared in [Step 3](#step-3-format-your-dataset).

Let's choose the [PaDiM algorithm](https://arxiv.org/pdf/2011.08785.pdf), copy the sample config, and modify the dataset section.

```bash
cp anomalib/models/padim/config.yaml custom_padim.yaml
```

```yaml
# Replace the dataset configs with the following.
dataset:
  name: hazelnut
  format: folder
  path: ./datasets/Hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  task: classification # classification or segmentation
  mask: null # optional
  normal_test_dir: null # optional
  extensions: null
  split_ratio: 0.2 # ratio of normal images used to create the test split
  seed: 0
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  layers:
    - layer1
  ...
```

### Step 5: Run training
As specified in the config file, move `Hazelnut_toy` into the `datasets` folder in the root directory of anomalib, and then run

```bash
python tools/train.py --config custom_padim.yaml
```
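If you prefer to drive training from Python instead of the CLI, `tools/train.py` is a thin wrapper around a handful of helpers. The sketch below assumes these helpers behave as in the Anomalib release current at the time of writing; treat it as an illustration rather than a stable API.

```python
from pytorch_lightning import Trainer

from anomalib.config import get_configurable_parameters
from anomalib.data import get_datamodule
from anomalib.models import get_model
from anomalib.utils.callbacks import get_callbacks

# Load the merged configuration, then build the datamodule and model from it.
config = get_configurable_parameters(config_path="custom_padim.yaml")
datamodule = get_datamodule(config)  # folder datamodule built from the dataset section
model = get_model(config)            # PaDiM built from the model section

trainer = Trainer(**config.trainer, callbacks=get_callbacks(config))
trainer.fit(model=model, datamodule=datamodule)
```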

### Step 6: Interpret Results
Anomalib will print out the results of the trained model on the validation dataset. The printed metrics depend on the task mode chosen. The classification example in this tutorial prints two scores: F1 and AUROC. The F1 score is a metric that values both precision and recall; more information on its calculation can be found in this [blog](https://towardsdatascience.com/understanding-accuracy-recall-precision-f1-scores-and-confusion-matrices-561e0f5e328c).
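For reference, F1 is the harmonic mean of precision and recall, so a weak value on either side drags the score down. A one-line illustration:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.9, 0.6))  # 0.72 -- penalizes the weaker of the two
```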

**Additional Info**

Not only does Anomalib classify whether a part is defective, it can also be used to segment the defects. To do this, simply add a folder called _mask_ at the same directory level as the _good_ and _colour_ folders. This folder should contain binary images for the defects in the _colour_ folder, in which the white pixels represent the location of the defect. Populate the `mask` field in the config file and change the task to `segmentation` to see Anomalib segment defects.
```
Hazelnut_toy
├── colour
│ ├── 00.jpg
│ ├── 01.jpg
│ ...
├── good
│ ├── 00.jpg
│ ├── 01.jpg
└── mask
├── 00.jpg
├── 01.jpg
...
```
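With that structure in place, the relevant changes to the dataset section would look roughly like this (the mask path below is an assumption based on the layout shown above and the `datasets` folder from Step 5):

```yaml
dataset:
  # ... other fields unchanged ...
  task: segmentation
  mask: ./datasets/Hazelnut_toy/mask
```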

Here is an example of the generated results for the toy dataset containing hazelnuts with colour defects.

<div align="center">
<img src="docs/blog/../../images/hazelnut_results.gif">
</div>

## Logging and Experiment Management
While it is delightful to know how well your model performed on your preferred metric, it is even more exciting to see the predicted outputs. Anomalib provides a few ways to log and track experiments, which can be used individually or in combination. As of the current release, you can save images to a local folder, or upload them to Weights & Biases or TensorBoard.

To select where you would like to save the images, change the `log_images_to` parameter in the `project` section in the config file.

For example, setting `log_images_to: ["local"]` will save the images in the results folder, as shown in the tree structure below:
```
results
└── padim
└── Hazelnut_toy
├── images
│ ├── colour
│ │ ├── 00.jpg
│ │ ├── 01.jpg
│ │ └── ...
│ └── good
│ ├── 00.jpg
│ ├── 01.jpg
│ └── ...
└── weights
└── model.ckpt
```

### Logging to Tensorboard and/or W&B
To use TensorBoard and/or W&B logger, ensure that the logger parameter is set to `tensorboard`, `wandb` or `[tensorboard, wandb]` in the `logging` section of the config file.

An example configuration for saving images to TensorBoard is shown below. Similarly, after setting the logger to `wandb`, you will see the images on your wandb project dashboard.
```yaml
logging:
  log_images_to: [tensorboard]
  logger: tensorboard # options: [tensorboard, wandb, csv] or combinations.
```
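Once training has written TensorBoard logs, you can browse them with the standard TensorBoard CLI. The log directory below is an assumption based on the results tree shown earlier; adjust it to wherever your run stores its logs.

```bash
tensorboard --logdir results/padim/Hazelnut_toy
```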

<div align="center">
<img src="docs/blogs/../../images/logging.gif">
</div>

### Hyper-Parameter Optimization
It is very rare to find a model that works out of the box for a particular dataset. Fortunately, Anomalib ships with tools that work out of the box to help tune its models to your particular dataset. As of the publication of this blog post, Anomalib supports [Weights & Biases](https://wandb.ai/) for hyperparameter optimization. To get started, have a look at `sweep.yaml`, located at `tools/hpo`. It provides a sample of how one can define a hyperparameter sweep.

```yaml
observation_budget: 10
method: bayes
metric:
  name: pixel_AUROC
  goal: maximize
parameters:
  dataset:
    category: hazelnut
    image_size:
      values: [128, 256]
  model:
    backbone:
      values: [resnet18, wide_resnet50_2]
```

The `observation_budget` field informs wandb about the number of experiments to run. The `method` section defines the kind of method to use for the HPO search. For other available methods, have a look at the [Weights & Biases](https://docs.wandb.ai/guides/sweeps/quickstart) documentation. The `parameters` section contains dataset and model parameters; any parameter defined here overrides the corresponding parameter in the original model configuration.

To run a sweep, you can just call:

```bash
python tools/hpo/wandb_sweep.py --model padim --config ./path_to_config.yaml --sweep_config tools/hpo/sweep.yaml
```

In case `--config` is not provided, the script looks at the default config location for that model. Note that you will need to be logged in to a wandb account to use the HPO search and view the results.

A sample run is visible in the screenshot below.
<div align="center">
<img src="docs/blog/../../images/hpo.gif">
</div>

## Benchmarking
To add to the suite of experiment tracking and optimization tools, anomalib also includes a benchmarking script for gathering results across different combinations of models, their parameters, and dataset categories. The model performance and throughput are logged into a CSV file, which can also serve as a means to track model drift. Optionally, the same results can be logged to Weights & Biases and TensorBoard. A sample configuration file is shown below.

```yaml
seed: 42
compute_openvino: false
hardware:
  - cpu
  - gpu
writer:
  - wandb
  - tensorboard
grid_search:
  dataset:
    category:
      - colour
      - crack
    image_size: [128, 256]
  model_name:
    - padim
    - stfpm
```

This configuration computes the throughput and performance metrics on CPU and GPU for two categories of the toy dataset, for the PaDiM and STFPM models. The dataset can be configured in the respective model configuration files. By default, `compute_openvino` is set to `false` to support instances where the OpenVINO requirements are not installed in the environment. Once installed, this flag can be set to `true` to also get the throughput of the OpenVINO-optimized models. The `writer` parameter is optional and can be set to `writer: []` in case the user only requires the CSV file without logging to TensorBoard or Weights & Biases. It is also good practice to set a seed value to ensure reproducibility across runs; it is therefore set to a non-zero value by default.

Once a configuration is decided, benchmarking can easily be performed by calling

```bash
python tools/benchmarking/benchmark.py --config tools/benchmarking/benchmark_params.yaml
```

A nice feature of the provided benchmarking script is that if the host system has multiple GPUs, the runs are parallelized over all the available GPUs for faster collection of results.

**Call to Action**

The Anomalib repository is actively maintained by some of Intel's top researchers. Their goal is to provide the AI community with best-in-class performance and accuracy while also providing a great user experience for developers. Check out the repo and install anomalib today!
Binary file added docs/blog/tec/images/anomalib-wide-blue.png
Binary file added docs/blog/tec/images/hazelnut_results.gif
Binary file added docs/blog/tec/images/hpo.gif
Binary file added docs/blog/tec/images/logging.gif