[Bug]: Some data used for both train and test when using folder dataset format with random seed #746

Closed
yesjuhyeong opened this issue Nov 30, 2022 · 2 comments

@yesjuhyeong

Describe the bug

I'm using Anomalib for anomaly detection.
My custom dataset is very small and is not pre-split into train/validation (test) sets,
so I split it with the split_normal_images_in_train_set function in anomalib/data/utils/split.py.

However, after self.train_data and self.test_data are created in anomalib/data/folder.py,
some images appear in both self.train_data and self.test_data.

I think data used for training is reused for validation under the seed=0 (random seed) condition,
because self.train_data and self.test_data are created independently.

self.train_data
https://github.com/openvinotoolkit/anomalib/blob/main/anomalib/data/folder.py#L483

self.test_data
https://github.com/openvinotoolkit/anomalib/blob/main/anomalib/data/folder.py#L512

Please check this issue.
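
For illustration only, a minimal Python sketch (not the actual anomalib code): when two subsets are sampled independently from the same pool of images, nothing guarantees that they are disjoint, so the same file can end up in both splits.

import random

# Two splits sampled independently from the same pool are not
# guaranteed to be disjoint.
pool = [f"image_{i:03d}.png" for i in range(10)]

train = random.sample(pool, 6)  # hypothetical "train" split
test = random.sample(pool, 4)   # hypothetical "test" split, drawn independently

print(sorted(set(train) & set(test)))  # usually non-empty: these images leak into both splits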

Dataset

Folder

Model

PatchCore

Steps to reproduce the behavior

  1. Install Anomalib
  2. Prepare a dataset in the folder format (a small dataset is recommended)
  3. Create a config file for the folder format
  4. Debug anomalib/data/folder.py
  5. Compare self.train_data and self.test_data (see the sketch below)
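
A quick way to perform step 5 is to intersect the image paths of the two splits. This is only a sketch and assumes self.train_data and self.test_data are pandas DataFrames with an image_path column, as in Anomalib 0.3.x:

import pandas as pd

def find_overlap(train_data: pd.DataFrame, test_data: pd.DataFrame) -> set:
    # Image paths that appear in both the train and the test split.
    return set(train_data["image_path"]) & set(test_data["image_path"])

# At a breakpoint inside anomalib/data/folder.py:
# overlap = find_overlap(self.train_data, self.test_data)
# print(f"{len(overlap)} images are shared between train and test")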

OS information


  • Ubuntu 20.04
  • Python version: [e.g. 3.8.10]
  • Anomalib version: 0.3.3
  • PyTorch version: 1.11.0
  • CUDA/cuDNN version: 11.4
  • GPU models and configuration: GeForce RTX 2080
  • Any other relevant information: I'm using a custom dataset

Expected behavior

I would like to know whether this issue is a real bug or a mistake on my side.
If it is a bug, please share the plan for fixing it.

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

Anomalib version: 0.3.3

Configuration YAML

dataset:
  name: private_data
  format: folder
  path: /private_data
  task: segmentation
  category: bottle
  image_size: 224
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: resnet18
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.

Logs

Don't need logs for this issue.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@djdameln djdameln self-assigned this Dec 1, 2022
@djdameln
Contributor

djdameln commented Dec 1, 2022

Hi, this was a reported bug in version 0.3.3, which was fixed in v0.3.4. Upgrading your installation of Anomalib to v0.3.4 or higher should resolve your issue.

I'm closing this issue as a duplicate but feel free to re-open if your problems persist after upgrading.
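
One way to confirm the upgrade took effect (a small sketch, assuming the anomalib package exposes __version__ as it does in the 0.3.x releases):

import anomalib

# Should report 0.3.4 or higher after upgrading.
print(anomalib.__version__)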

@djdameln djdameln closed this as not planned (duplicate) Dec 1, 2022
@yesjuhyeong
Author

Thank you @djdameln
I'll upgrade my Anomalib version.
