Add Custom Dataset Training Support #154

Merged: 30 commits merged on Mar 24, 2022


@samet-akcay (Contributor) commented on Mar 22, 2022

Description

Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist

  • My code follows the pre-commit style and check guidelines of this project.
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes

@samet-akcay changed the title from "Feature/data/custom dataset" to "Add Custom Dataset Training Support" on Mar 22, 2022
@djdameln (Contributor) left a comment:

Thanks, great addition! I haven't manually tested the custom dataset format yet, but I'll do that and will post here if I run into any issues.

README.md (outdated)
```yaml
task: segmentation # classification or segmentation
mask: <path/to/mask/annotations> # optional
extensions: null
split_ratio: 0.2
```
Contributor:

Maybe add comments to the parameters that may be hard to understand, e.g.:
`split_ratio: 0.2 # ratio of the normal images that will be used to create a test split`
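The `split_ratio` behaviour suggested in that comment (hold out a fraction of the normal images as a test split) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `split_normal_images`, the fixed-seed shuffle, and rounding the test count down are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_normal_images(
    normal_images: Sequence[str], split_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: hold out `split_ratio` of the normal images for testing.

    Returns (train_images, test_images); the test count is rounded down.
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError(f"split_ratio must be in [0, 1], got {split_ratio}")
    images = list(normal_images)
    # Shuffle with a fixed seed so the split is reproducible between runs.
    random.Random(seed).shuffle(images)
    num_test = int(len(images) * split_ratio)
    return images[num_test:], images[:num_test]


train, test = split_normal_images([f"normal_{i:03d}.png" for i in range(10)], split_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Abnormal images would then be added to the test split only, since they are not used for training.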

README.md (outdated)
It is also possible to train on a custom dataset. To do so, modify the `dataset` section in `config.yaml` as follows:
```yaml
dataset:
  name: custom
```
Contributor:

Maybe we should use `format` here instead of `name`. For MVTec we also have a `format` field in addition to `name`. The way I see it, `format` determines which dataset class is used under the hood, while `name` can be anything that identifies the specific dataset being used.
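A minimal sketch of the split the reviewer describes: `format` selects the class and `name` is only an identifier. The registry `DATASET_CLASSES`, the placeholder classes, the `"folder"` format value, and `get_dataset_class` are all hypothetical, not the project's actual API.

```python
from types import SimpleNamespace


class MVTecDataset:
    """Placeholder for the MVTec-specific dataset class."""


class FolderDataset:
    """Placeholder for a dataset read from a fixed folder structure."""


# Hypothetical registry: `format` picks the class; `name` is never used for dispatch.
DATASET_CLASSES = {"mvtec": MVTecDataset, "folder": FolderDataset}


def get_dataset_class(dataset_config) -> type:
    """Return the dataset class selected by `dataset_config.format`."""
    dataset_format = dataset_config.format.lower()
    if dataset_format not in DATASET_CLASSES:
        raise ValueError(f"Unknown dataset format: {dataset_config.format!r}")
    return DATASET_CLASSES[dataset_format]


config = SimpleNamespace(name="my_bottles", format="folder")
print(get_dataset_class(config).__name__)  # FolderDataset
```

Keeping `name` free-form means two users can train on differently named datasets that share one loader.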

`@@ -177,7 +177,8 @@ def get_configurable_parameters(`
```python
config = update_input_size_config(config)

# Project Configs
project_path = Path(config.project.path) / config.model.name / config.dataset.name / config.dataset.category
category = config.dataset.category if "category" in config.dataset.keys() else ""
```
Contributor:

Maybe it would be a bit clearer to check the dataset type here and only add the category to the path if the type is MVTec.
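The suggested check could look roughly like this. It is a sketch that assumes a `dataset.format` field (as proposed in the earlier review comment) and uses `types.SimpleNamespace` to stand in for the real config object; `get_project_path` is a hypothetical helper.

```python
from pathlib import Path
from types import SimpleNamespace


def get_project_path(config) -> Path:
    """Build the results directory; append the category only for MVTec-format datasets."""
    project_path = Path(config.project.path) / config.model.name / config.dataset.name
    if config.dataset.format.lower() == "mvtec":
        project_path = project_path / config.dataset.category
    return project_path


mvtec_config = SimpleNamespace(
    project=SimpleNamespace(path="./results"),
    model=SimpleNamespace(name="padim"),
    dataset=SimpleNamespace(name="mvtec", format="mvtec", category="bottle"),
)
folder_config = SimpleNamespace(
    project=SimpleNamespace(path="./results"),
    model=SimpleNamespace(name="padim"),
    dataset=SimpleNamespace(name="my_dataset", format="folder"),  # no category needed
)
print(get_project_path(mvtec_config))  # results/padim/mvtec/bottle (on POSIX)
print(get_project_path(folder_config))  # results/padim/my_dataset (on POSIX)
```

Because the category is only read inside the MVTec branch, folder-style configs do not need a `category` key at all.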

```python
    return samples


class CustomDataset(Dataset):
```
Contributor:

I'm not sure about the naming. Maybe `FolderDataset` would be more appropriate? "Custom" sounds a bit like users can choose their own format, but this class represents a dataset that follows a fixed format based on the folder structure of the data.

```
The dataset expects mask annotation filenames to be the same as the original image filenames.
To show an example, we therefore need to modify the mask filenames in the MVTec dataset.

>>> # Rename MVTec mask annotations so that they are the same as image filenames
```
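In MVTec AD, a mask such as `ground_truth/broken_large/000_mask.png` annotates the image `test/broken_large/000.png`, so the rename amounts to stripping the `_mask` suffix. The sketch below illustrates that step; `image_filename_for_mask` and `rename_masks` are hypothetical helpers, not code from the PR.

```python
from pathlib import Path


def image_filename_for_mask(mask_filename: str, suffix: str = "_mask") -> str:
    """Map an MVTec-style mask name (e.g. '000_mask.png') to its image name ('000.png')."""
    path = Path(mask_filename)
    # Guard against an empty suffix, which would otherwise slice the stem to ''.
    stem = path.stem[: -len(suffix)] if suffix and path.stem.endswith(suffix) else path.stem
    return stem + path.suffix


def rename_masks(mask_dir: Path, suffix: str = "_mask") -> None:
    """Rename every '*<suffix>.png' file under `mask_dir` (recursively) to match its image."""
    # sorted() materializes the file list before any file is renamed.
    for mask_path in sorted(mask_dir.rglob(f"*{suffix}.png")):
        mask_path.rename(mask_path.with_name(image_filename_for_mask(mask_path.name, suffix)))


print(image_filename_for_mask("000_mask.png"))  # 000.png
```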
Contributor:

I'm afraid the example in the docstring might confuse users (why use the custom dataset class for MVTec when there is a dedicated MVTec dataset class?). Maybe we could keep it simple: start the example by assuming that the user has a folder of normal images and a folder of abnormal images, and state this explicitly at the beginning of the example.
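To illustrate the setup the reviewer proposes (one folder of normal images, one folder of abnormal images, and optionally a mask folder whose files share the abnormal images' filenames), here is a minimal sketch of how samples could be collected. `make_folder_samples` and the default `IMG_EXTENSIONS` tuple are assumptions for illustration, not the PR's `CustomDataset` code.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Sequence

# Assumed default image extensions when the config's `extensions` is null/None.
IMG_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def make_folder_samples(
    normal_dir: Path,
    abnormal_dir: Path,
    mask_dir: Optional[Path] = None,
    extensions: Optional[Sequence[str]] = None,
) -> List[Dict[str, object]]:
    """Hypothetical helper: label normal images 0 and abnormal images 1.

    A mask path is derived from the image filename; it is computed, not checked for existence.
    """
    allowed = tuple(ext.lower() for ext in (extensions or IMG_EXTENSIONS))
    samples: List[Dict[str, object]] = []
    for folder, label in ((normal_dir, 0), (abnormal_dir, 1)):
        for image_path in sorted(folder.iterdir()):
            if image_path.suffix.lower() not in allowed:
                continue  # skip non-image files such as notes or metadata
            mask_path = mask_dir / image_path.name if label == 1 and mask_dir else None
            samples.append({"image_path": image_path, "label": label, "mask_path": mask_path})
    return samples


# Usage: build a tiny throwaway folder layout and collect its samples.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("good/001.png", "good/002.png", "good/notes.txt", "defect/003.png"):
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    samples = make_folder_samples(root / "good", root / "defect", mask_dir=root / "masks")
    for sample in samples:
        print(sample["image_path"].name, sample["label"])  # 001.png 0 / 002.png 0 / 003.png 1
```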

@djdameln (Contributor) left a comment:

Thanks!

@samet-akcay merged commit b03fb32 into development on Mar 24, 2022
@samet-akcay deleted the feature/data/custom-dataset branch on March 24, 2022, 10:44
Development

Successfully merging this pull request may close these issues.

Support training with custom MVTec like dataset but without masks (ground truths)
2 participants