Add Custom Dataset Training Support #154

Merged
merged 30 commits into from
Mar 24, 2022
Changes from 22 commits
Commits
f175a24
renamed download-progress-bar as download
samet-akcay Feb 24, 2022
f841f51
added new download functions to init
samet-akcay Feb 24, 2022
12cd8ee
Added Btech data module
samet-akcay Feb 25, 2022
7bc453f
Added btech tests
samet-akcay Feb 25, 2022
3a32443
Move split functions into a util module
samet-akcay Feb 25, 2022
132ceb1
Modified mvtec
samet-akcay Feb 25, 2022
907281f
added btech to get-datamodule
samet-akcay Feb 25, 2022
16de223
fix typo in btech docstring
samet-akcay Feb 25, 2022
c2353db
update docstring
samet-akcay Feb 25, 2022
287c974
cleanedup dataset download utils
samet-akcay Feb 25, 2022
df8b655
Address mypy
samet-akcay Feb 25, 2022
966ad94
modify config files and update readme.md
samet-akcay Feb 25, 2022
97d98fa
Fix dataset path
samet-akcay Feb 25, 2022
1e78a31
Merge branch 'development' into feature/data/btad
samet-akcay Mar 6, 2022
f6cba9a
Resolved merge conflicts
samet-akcay Mar 15, 2022
9513723
Merge branch 'feature/data/btad' of github.com:openvinotoolkit/anomal…
samet-akcay Mar 15, 2022
b71f4d3
WiP: Created make_dataset function
samet-akcay Mar 15, 2022
28f7d3e
Renamed folder dataset into custom
samet-akcay Mar 22, 2022
83c1384
Added custom dataset tests
samet-akcay Mar 22, 2022
09908b0
updated config.yaml file to show custom dataset is available
samet-akcay Mar 22, 2022
215df46
Added custom dataset to get_datamodule
samet-akcay Mar 22, 2022
ee12a7a
Resolve merge conflicts
samet-akcay Mar 22, 2022
cf22594
Address PR comments
samet-akcay Mar 23, 2022
8b827d4
Merge branch 'development' of github.com:openvinotoolkit/anomalib int…
samet-akcay Mar 23, 2022
2d24d16
Merge branch 'development' of github.com:openvinotoolkit/anomalib int…
samet-akcay Mar 23, 2022
6646c3b
fix dataset path
samet-akcay Mar 23, 2022
b3cf100
Debugging the ci
samet-akcay Mar 24, 2022
00e8020
Fixed folder dataset tests
samet-akcay Mar 24, 2022
8e47bd3
Added code quality checks back to the ci
samet-akcay Mar 24, 2022
314b164
Added code coverage back to pre-merge tests
samet-akcay Mar 24, 2022
27 changes: 27 additions & 0 deletions README.md
@@ -102,6 +102,33 @@ where the currently available models are:
- [DFKDE](anomalib/models/dfkde)
- [GANomaly](anomalib/models/ganomaly)

### Custom Dataset
It is also possible to train on a custom dataset. To do so, `data` section in `config.yaml` is to be modified as follows:
```yaml
dataset:
  name: custom
  path: <path/to/custom/dataset>
  normal: normal # name of the folder containing normal images.
  abnormal: abnormal # name of the folder containing abnormal images.
  task: segmentation # classification or segmentation
  mask: <path/to/mask/annotations> #optional
  extensions: null
  split_ratio: 0.2
  seed: 0
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  transform_config: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
```

Review comment (Contributor) on `name: custom`: Maybe we should use `format` here instead of `name`. For MVTec we also have a `format` field in addition to `name`. The way I see it, `format` determines which dataset class is used under the hood, while `name` can be anything that identifies the specific dataset that is used.

Review comment (Contributor) on `split_ratio: 0.2`: Maybe add some comments here to the parameters that may be hard to understand, e.g. `split_ratio: 0.2 # ratio of the normal images that will be used to create a test split`

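The reviewer describes `split_ratio` as the ratio of normal images that will be used to create a test split, with `seed` presumably making that split reproducible. A minimal sketch of that idea follows. The helper `split_normal_images` and the file names are hypothetical and are not taken from the PR; the actual split logic in the library may differ.

```python
import random
from typing import List, Sequence, Tuple


def split_normal_images(
    paths: Sequence[str], split_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: hold out a fraction of the normal images for testing."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    # Sort first so the result does not depend on the order of the input.
    shuffled = sorted(paths)
    # A seeded, local random generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    num_test = int(len(shuffled) * split_ratio)
    # The first `num_test` shuffled items become the test split, the rest train.
    return shuffled[num_test:], shuffled[:num_test]


# Made-up file names, for illustration only.
normal_images = [f"normal/{i:03d}.png" for i in range(10)]
train, test = split_normal_images(normal_images, split_ratio=0.2, seed=0)
print(len(train), len(test))  # 8 2
```

With ten normal images and `split_ratio: 0.2`, two images are held out for testing and eight remain for training; calling the helper again with the same seed returns the same split.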
## Inference

Anomalib contains several tools that can be used to perform inference with a trained model. The script in [`tools/inference`](tools/inference.py) contains an example of how the inference tools can be used to generate a prediction for an input image.
3 changes: 2 additions & 1 deletion anomalib/config/config.py
@@ -177,7 +177,8 @@ def get_configurable_parameters(
     config = update_input_size_config(config)
 
     # Project Configs
-    project_path = Path(config.project.path) / config.model.name / config.dataset.name / config.dataset.category
+    category = config.dataset.category if "category" in config.dataset.keys() else ""

Review comment (Contributor): Maybe it would be a bit more clear if we check the dataset type here, and only add the category to the path if the type is MVTec.

+    project_path = Path(config.project.path) / config.model.name / config.dataset.name / category
     (project_path / "weights").mkdir(parents=True, exist_ok=True)
     (project_path / "images").mkdir(parents=True, exist_ok=True)
     config.project.path = str(project_path)
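The change above works because joining a `pathlib.Path` with an empty string leaves the path unchanged, so datasets without a `category` simply get a shorter project path. A small sketch of that behavior, using a plain dictionary in place of the real config object; the function `build_project_path` and the example values are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Dict


def build_project_path(root: str, model_name: str, dataset: Dict[str, Any]) -> Path:
    """Hypothetical stand-in for the path logic in the diff above."""
    # Fall back to an empty string when the dataset has no category.
    category = dataset["category"] if "category" in dataset else ""
    # Joining with "" adds no extra path component.
    return Path(root) / model_name / dataset["name"] / category


with_category = build_project_path("results", "padim", {"name": "mvtec", "category": "bottle"})
without_category = build_project_path("results", "padim", {"name": "custom"})
print(with_category.as_posix())     # results/padim/mvtec/bottle
print(without_category.as_posix())  # results/padim/custom
```

The reviewer's alternative, checking the dataset type explicitly, would make the intent more visible at the cost of one more branch; the sketch only mirrors the merged behavior.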
20 changes: 19 additions & 1 deletion anomalib/data/__init__.py
@@ -20,6 +20,7 @@
 from pytorch_lightning import LightningDataModule
 
 from .btech import BTechDataModule
+from .custom import CustomDataModule
 from .inference import InferenceDataset
 from .mvtec import MVTecDataModule

@@ -51,12 +52,29 @@ def get_datamodule(config: Union[DictConfig, ListConfig]) -> LightningDataModule
             # TODO: Remove config values. IAAALD-211
             root=config.dataset.path,
             category=config.dataset.category,
-            image_size=(config.dataset.image_size[0], config.dataset.image_size[0]),
+            image_size=(config.dataset.image_size[0], config.dataset.image_size[1]),
             train_batch_size=config.dataset.train_batch_size,
             test_batch_size=config.dataset.test_batch_size,
             num_workers=config.dataset.num_workers,
             seed=config.project.seed,
         )
+    elif config.dataset.name.lower() == "custom":
+        datamodule = CustomDataModule(
+            root=config.dataset.path,
+            normal=config.dataset.normal,
+            abnormal=config.dataset.abnormal,
+            task=config.dataset.task,
+            mask_dir=config.dataset.mask,
+            extensions=config.dataset.extensions,
+            split_ratio=config.dataset.split_ratio,
+            seed=config.dataset.seed,
+            image_size=(config.dataset.image_size[0], config.dataset.image_size[1]),
+            train_batch_size=config.dataset.train_batch_size,
+            test_batch_size=config.dataset.test_batch_size,
+            num_workers=config.dataset.num_workers,
+            transform_config=config.dataset.transform_config,
+            create_validation_set=config.dataset.create_validation_set,
+        )
     else:
         raise ValueError(
             "Unknown dataset! \n"
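The `get_datamodule` change selects a data module from `config.dataset.name` with an `if`/`elif` chain, and the same hunk corrects `image_size` so that it uses both entries rather than index 0 twice. The sketch below shows the same dispatch written as a lookup table, which is a different technique from the one in the PR. `StubDataModule`, `BUILDERS`, and `get_datamodule_sketch` are hypothetical names, and the example assumes `image_size` has already been expanded to a (height, width) pair.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class StubDataModule:
    """Stand-in for a real data module; it only records its arguments."""

    kind: str
    root: str
    image_size: Tuple[int, int]


def _image_size(dataset: Dict[str, Any]) -> Tuple[int, int]:
    # Use both entries; reading index 0 twice would force a square image.
    return (dataset["image_size"][0], dataset["image_size"][1])


# Lookup table from dataset name to a builder function.
BUILDERS: Dict[str, Callable[[Dict[str, Any]], StubDataModule]] = {
    "mvtec": lambda d: StubDataModule("mvtec", d["path"], _image_size(d)),
    "btech": lambda d: StubDataModule("btech", d["path"], _image_size(d)),
    "custom": lambda d: StubDataModule("custom", d["path"], _image_size(d)),
}


def get_datamodule_sketch(dataset: Dict[str, Any]) -> StubDataModule:
    """Pick a builder by lower-cased dataset name, or fail with a clear error."""
    key = dataset["name"].lower()
    if key not in BUILDERS:
        raise ValueError(f"Unknown dataset {dataset['name']!r}. Supported: {sorted(BUILDERS)}")
    return BUILDERS[key](dataset)


datamodule = get_datamodule_sketch(
    {"name": "Custom", "path": "datasets/my_data", "image_size": [256, 512]}
)
print(datamodule.kind, datamodule.image_size)  # custom (256, 512)
```

A table like this would also accommodate the reviewer's suggestion of dispatching on a separate `format` field: only the key used for the lookup changes.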