
PatchCore validation takes long #583

Closed · Jia-Baos opened this issue Sep 25, 2022 · 3 comments

Comments

Jia-Baos commented Sep 25, 2022

Describe the bug

When I train PatchCore on the MVTec bottle dataset, the run stops at "Validation: 0it [00:00, ?it/s]" and the process does not continue.

To Reproduce

Steps to reproduce the behavior:

nothing

Expected behavior

C:\Users\fx50j.conda\envs\anomalib_env\python.exe D:/PythonProject/anomalib/tools/MyTest.py


1.12.0+cpu
None
None
False
0


dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: mvtec
  path: D:/PythonProject/anomalib/datasets/MVTec
  task: segmentation
  category: bottle
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 0
    pixel_default: 0
    adaptive: true

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.

Transform configs has not been provided. Images will be normalized using ImageNet statistics.
Transform configs has not been provided. Images will be normalized using ImageNet statistics.
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torch\utils\data\dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4 (cpuset is not taken into account), which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
dict_keys(['image', 'image_path', 'label', 'mask_path', 'mask'])
torch.Size([1, 3, 224, 224])
torch.Size([1, 224, 224])
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: Metric PrecisionRecallCurve will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
D:\PythonProject\anomalib\anomalib\utils\callbacks\__init__.py:133: UserWarning: Export option: None not found. Defaulting to no model export
warnings.warn(f"Export option: {config.optimization.export_mode} not found. Defaulting to no model export")
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
Trainer(limit_test_batches=1.0) was configured so 100% of the batches will be used..
Trainer(limit_predict_batches=1.0) was configured so 100% of the batches will be used..
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch..
Missing logger folder: results\patchcore\mvtec\bottle\lightning_logs
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: Metric ROC will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\pytorch_lightning\core\optimizer.py:183: UserWarning: LightningModule.configure_optimizers returned None, this fit will run with no optimizer
rank_zero_warn(

  | Name                  | Type                     | Params
--------------------------------------------------------------
0 | image_threshold       | AdaptiveThreshold        | 0
1 | pixel_threshold       | AdaptiveThreshold        | 0
2 | model                 | PatchcoreModel           | 24.9 M
3 | image_metrics         | AnomalibMetricCollection | 0
4 | pixel_metrics         | AnomalibMetricCollection | 0
5 | normalization_metrics | MinMax                   | 0
--------------------------------------------------------------
24.9 M Trainable params
0 Non-trainable params
24.9 M Total params
99.450 Total estimated model params size (MB)
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torch\utils\data\dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4 (cpuset is not taken into account), which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py:1933: PossibleUserWarning: The number of training batches (7) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 0: 1%| | 1/90 [01:07<1:40:34, 67.80s/it, loss=nan, v_num=0]C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py:137: UserWarning: training_step returned None. If this was on purpose, ignore this warning...
self.warning_cache.warn("training_step returned None. If this was on purpose, ignore this warning...")
Epoch 0: 8%|▊ | 7/90 [01:51<22:00, 15.91s/it, loss=nan, v_num=0]
Validation: 0it [00:00, ?it/s]

Screenshots

  • If applicable, add screenshots to help explain your problem.

Hardware and Software Configuration

  • OS: [Ubuntu, OD]
  • NVIDIA Driver Version [470.57.02]
  • CUDA Version [e.g. 11.4]
  • CUDNN Version [e.g. v11.4.120]
  • OpenVINO Version [Optional e.g. v2021.4.2]

Additional context

  • Add any other context about the problem here.
samet-akcay (Contributor) commented Sep 26, 2022

@Jia-Baos, I cannot reproduce this issue. Here is what I get when I run PatchCore:

[screenshot: output of a successful PatchCore run, omitted]

Could it be that validation simply takes a long time on your hardware configuration?

To double-check this, you could change the model to

model:
  name: patchcore
  backbone: resnet18
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

or

model:
  name: patchcore
  backbone: resnet18
  pre_trained: true
  layers:
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

to make the model more lightweight.
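
For context, PatchCore scores each test patch by a nearest-neighbour search against the coreset memory bank, so per-image validation time grows with the bank size and the feature dimension. The toy sketch below (plain PyTorch, not anomalib's actual implementation; bank sizes and dimensions are only illustrative) shows why a smaller backbone, fewer layers, or a lower coreset_sampling_ratio speeds things up, especially on CPU:

# Toy illustration only; not anomalib's code. Sizes and dimensions are assumed.
import time

import torch

num_patches = 28 * 28  # patch embeddings per 224x224 test image (illustrative)

# (bank_size, feat_dim): roughly wide_resnet50_2 layer2+layer3 vs. resnet18 layer2+layer3,
# and then the same backbone with a smaller coreset
for bank_size, feat_dim in [(16_000, 1536), (16_000, 384), (4_000, 384)]:
    bank = torch.randn(bank_size, feat_dim)       # coreset memory bank
    patches = torch.randn(num_patches, feat_dim)  # one test image's patch features
    start = time.perf_counter()
    # distance to the nearest memory-bank entry, as in PatchCore anomaly scoring
    scores = torch.cdist(patches, bank).min(dim=1).values
    elapsed = time.perf_counter() - start
    print(f"bank={bank_size:>6}  dim={feat_dim:>4}  {elapsed:.3f}s per image")

Switching the backbone mainly shrinks the feature dimension (1536 for wide_resnet50_2 layer2+layer3 versus 384 for resnet18), while lowering coreset_sampling_ratio shrinks the bank itself; both reduce the per-image validation cost.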

Jia-Baos (Author) commented:

Thank you so much. I adopted your recommendations and changed the model; you're right, validation simply takes a long time.

samet-akcay changed the title from "training error" to "PatchCore validation takes long" Sep 26, 2022
samet-akcay (Contributor) commented:

We have just merged PR #580, which partially addresses this. See #268 and #533.

I'll be converting this to a Q&A in Discussions. Feel free to continue from there. Cheers!
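
For readers who want to try the lighter setup programmatically, here is a minimal sketch assuming the anomalib v0.3.x Python entry points that tools/train.py uses (get_configurable_parameters, get_datamodule, get_model, get_callbacks); the config path is illustrative and should point at a patchcore config edited as suggested above. Other versions of anomalib may expose a different API.

# Minimal sketch; assumes the anomalib v0.3.x API mirrored from tools/train.py.
from pytorch_lightning import Trainer, seed_everything

from anomalib.config import get_configurable_parameters
from anomalib.data import get_datamodule
from anomalib.models import get_model
from anomalib.utils.callbacks import get_callbacks

config = get_configurable_parameters(
    model_name="patchcore",
    config_path="anomalib/models/patchcore/config.yaml",  # illustrative path
)
seed_everything(config.project.seed)

datamodule = get_datamodule(config)   # MVTec datamodule built from the config
model = get_model(config)             # PatchCore Lightning module
callbacks = get_callbacks(config)     # thresholding, normalization, visualizer, etc.

trainer = Trainer(**config.trainer, callbacks=callbacks)
trainer.fit(model=model, datamodule=datamodule)
trainer.test(model=model, datamodule=datamodule)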

openvinotoolkit locked and limited conversation to collaborators Sep 26, 2022
samet-akcay converted this issue into discussion #586 Sep 26, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
