Add Support for Loading CSV datasets #1050

Open
wants to merge 22 commits into
base: feature/add-csv-data

@shenghann shenghann commented Apr 26, 2023

Description

PR stemming from this discussion: #1042
In summary, this PR adds CSV file loading to anomalib for custom datasets, alongside the existing loading from folders.

The CSV dataset format:

    |   | split | label*   | image_path*  | mask_path                             | label_index |
    |---|-------|----------|--------------|---------------------------------------|-------------|
    | 0 | test  | abnormal | filename.png | ground_truth/defect/filename_mask.png | 1           |
  • A single CSV file containing the following columns:
    • Required columns (marked * above):
      • image_path: Path to the image file (joined with the root from config.dataset.root)
      • label: normal, abnormal or normal_test
    • Optional columns:
      • split: If the split column is not defined in the CSV, train and test splits are generated from the labels, the same way the folder dataset handles this: normal samples = train, all abnormal and normal_test samples = test. The train split should contain only normal samples, so this needs a check: any abnormal samples found in it are dropped and ignored.
      • label_index: A function of label, where normal = 0 and abnormal = 1
      • mask_path: Path to the ground-truth mask file
  • The CSV file is set through a single new config key, csv_file, which holds the path to the CSV file
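
The split and label_index rules above can be sketched roughly as follows. This is only an illustration, not the code in this PR: it uses the standard library csv module instead of pandas to stay dependency-free, assign_splits is a hypothetical helper name, and mapping normal_test to index 0 is an assumption that the description does not state. The sketch also treats an empty split cell the same as a missing split column.

```python
import csv
import io

# Label-to-index mapping from the description: normal = 0, abnormal = 1.
# Mapping normal_test to 0 is an assumption; the description does not say.
LABEL_INDEX = {"normal": 0, "normal_test": 0, "abnormal": 1}


def assign_splits(rows):
    """Fill in split and label_index for each row (hypothetical helper)."""
    result = []
    for row in rows:
        row = dict(row)
        label = row["label"]
        if not row.get("split"):
            # No split given: normal -> train, abnormal / normal_test -> test.
            row["split"] = "train" if label == "normal" else "test"
        if row["split"] == "train" and label != "normal":
            # The train split may only hold normal samples: drop and ignore.
            continue
        row["label_index"] = LABEL_INDEX[label]
        result.append(row)
    return result


# Made-up example data; the last row is an abnormal sample wrongly marked train.
CSV_TEXT = """label,image_path,split
normal,good/000.png,
abnormal,defect/000.png,
normal_test,good/001.png,
abnormal,defect/001.png,train
"""

rows = assign_splits(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    print(row["split"], row["label"], row["image_path"], row["label_index"])
```

In this sketch the abnormal row that was explicitly marked as train is dropped, so three rows remain: one in train and two in test.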

Code changes:

  • Add CSVDataset and CSV DataModule
  • Add a _prepare_filemeta_from_csv function in path.py that uses pandas to read the CSV contents and checks that the required columns are defined in the CSV file and that the files have the correct extensions.
  • Handle the required columns for DataModule _setup in make_csv_dataset
  • Add support for a test CSV file during inference too: CSV loading added in data/utils/image.py and lightning_inference.py
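
The kind of checks described for _prepare_filemeta_from_csv might look something like the sketch below. It is not the PR's implementation: it uses the standard library csv module rather than pandas, read_filemeta is a hypothetical name, and the IMAGE_EXTENSIONS set is an assumed list rather than the one anomalib actually uses.

```python
import csv
import io
from pathlib import Path

REQUIRED_COLUMNS = {"label", "image_path"}
# Assumed set of accepted extensions; anomalib's real list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def read_filemeta(csv_file, root):
    """Read CSV rows, validate them, and join image paths with root."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        suffix = Path(row["image_path"]).suffix.lower()
        if suffix not in IMAGE_EXTENSIONS:
            raise ValueError(f"Line {line_no}: unsupported extension {suffix!r}")
        # image_path in the CSV is relative to the dataset root.
        row["image_path"] = str(Path(root) / row["image_path"])
        rows.append(row)
    return rows


# Made-up example: one valid row, resolved against a "datasets/custom" root.
example = io.StringIO("label,image_path\nabnormal,filename.png\n")
for row in read_filemeta(example, "datasets/custom"):
    print(row["label"], row["image_path"])
```

Failing early with a ValueError that names the missing column or the offending line is one possible design; the PR may report problems differently.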

Also addresses #1072

Changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change which refactors the code base)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist

  • My code follows the pre-commit style and check guidelines of this project.
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • I have added a summary of my changes to the CHANGELOG (not for minor changes, docs and tests).


@samet-akcay samet-akcay left a comment

Thanks for creating this PR. I've got a few comments.

shenghann and others added 2 commits April 27, 2023 22:58
Co-authored-by: Samet Akcay <samet.akcay@intel.com>
Co-authored-by: Samet Akcay <samet.akcay@intel.com>

@djdameln djdameln left a comment

Thanks, this is a nice feature to have. I have a few comments:

@samet-akcay samet-akcay removed the T2 label Apr 1, 2024
@samet-akcay samet-akcay changed the base branch from main to feature/add-csv-data-support July 12, 2024 12:36
@samet-akcay samet-akcay changed the base branch from feature/add-csv-data-support to feature/add-csv-data July 13, 2024 06:55