
PatchCore validation takes long #583

Closed · Jia-Baos opened this issue Sep 25, 2022 · 3 comments

Comments

Jia-Baos commented Sep 25, 2022

Describe the bug

When I train PatchCore on the MVTec bottle dataset, the run stops at "Validation: 0it [00:00, ?it/s]" and the process does not continue.

To Reproduce

Steps to reproduce the behavior:

nothing

Expected behavior

C:\Users\fx50j.conda\envs\anomalib_env\python.exe D:/PythonProject/anomalib/tools/MyTest.py


1.12.0+cpu
None
None
False
0


dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: mvtec
  path: D:/PythonProject/anomalib/datasets/MVTec
  task: segmentation
  category: bottle
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 0
    pixel_default: 0
    adaptive: true

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.

Transform configs has not been provided. Images will be normalized using ImageNet statistics.
Transform configs has not been provided. Images will be normalized using ImageNet statistics.
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torch\utils\data\dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4 (cpuset is not taken into account), which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
dict_keys(['image', 'image_path', 'label', 'mask_path', 'mask'])
torch.Size([1, 3, 224, 224])
torch.Size([1, 224, 224])
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: Metric PrecisionRecallCurve will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
D:\PythonProject\anomalib\anomalib\utils\callbacks\__init__.py:133: UserWarning: Export option: None not found. Defaulting to no model export
warnings.warn(f"Export option: {config.optimization.export_mode} not found. Defaulting to no model export")
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
Trainer(limit_test_batches=1.0) was configured so 100% of the batches will be used..
Trainer(limit_predict_batches=1.0) was configured so 100% of the batches will be used..
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch..
Missing logger folder: results\patchcore\mvtec\bottle\lightning_logs
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torchmetrics\utilities\prints.py:36: UserWarning: Metric ROC will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\pytorch_lightning\core\optimizer.py:183: UserWarning: LightningModule.configure_optimizers returned None, this fit will run with no optimizer
rank_zero_warn(

  | Name                  | Type                     | Params
--------------------------------------------------------------
0 | image_threshold       | AdaptiveThreshold        | 0
1 | pixel_threshold       | AdaptiveThreshold        | 0
2 | model                 | PatchcoreModel           | 24.9 M
3 | image_metrics         | AnomalibMetricCollection | 0
4 | pixel_metrics         | AnomalibMetricCollection | 0
5 | normalization_metrics | MinMax                   | 0
--------------------------------------------------------------
24.9 M Trainable params
0 Non-trainable params
24.9 M Total params
99.450 Total estimated model params size (MB)
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\torch\utils\data\dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4 (cpuset is not taken into account), which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py:1933: PossibleUserWarning: The number of training batches (7) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 0: 1%| | 1/90 [01:07<1:40:34, 67.80s/it, loss=nan, v_num=0]C:\Users\fx50j.conda\envs\anomalib_env\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py:137: UserWarning: training_step returned None. If this was on purpose, ignore this warning...
self.warning_cache.warn("training_step returned None. If this was on purpose, ignore this warning...")
Epoch 0: 8%|▊ | 7/90 [01:51<22:00, 15.91s/it, loss=nan, v_num=0]
Validation: 0it [00:00, ?it/s]

Screenshots

  • If applicable, add screenshots to help explain your problem.

Hardware and Software Configuration

  • OS: [Ubuntu, OD]
  • NVIDIA Driver Version [470.57.02]
  • CUDA Version [e.g. 11.4]
  • CUDNN Version [e.g. v11.4.120]
  • OpenVINO Version [Optional e.g. v2021.4.2]

Additional context

  • Add any other context about the problem here.
samet-akcay (Contributor) commented Sep 26, 2022

@Jia-Baos, I cannot reproduce this issue. Here is what I get when I run PatchCore:

[screenshot: output of a successful PatchCore run, omitted]

Could it be that validation simply takes a long time on your hardware configuration?

To double-check this, you could change the model to

model:
  name: patchcore
  backbone: resnet18
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

or

model:
  name: patchcore
  backbone: resnet18
  pre_trained: true
  layers:
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

to make the model more lightweight.
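
For context, PatchCore scores each test patch by a nearest-neighbour search against the coreset memory bank, so per-image validation time grows with the bank size and the feature dimension. The toy sketch below (plain PyTorch, not anomalib's actual implementation; bank sizes and dimensions are only illustrative) shows why a smaller backbone, fewer layers, or a lower coreset_sampling_ratio speeds things up, especially on CPU:

# Toy illustration only; not anomalib's code. Sizes and dimensions are assumed.
import time

import torch

num_patches = 28 * 28  # patch embeddings per 224x224 test image (illustrative)

# (bank_size, feat_dim): roughly wide_resnet50_2 layer2+layer3 vs. resnet18 layer2+layer3,
# and then the same backbone with a smaller coreset
for bank_size, feat_dim in [(16_000, 1536), (16_000, 384), (4_000, 384)]:
    bank = torch.randn(bank_size, feat_dim)       # coreset memory bank
    patches = torch.randn(num_patches, feat_dim)  # one test image's patch features
    start = time.perf_counter()
    # distance to the nearest memory-bank entry, as in PatchCore anomaly scoring
    scores = torch.cdist(patches, bank).min(dim=1).values
    elapsed = time.perf_counter() - start
    print(f"bank={bank_size:>6}  dim={feat_dim:>4}  {elapsed:.3f}s per image")

Switching the backbone mainly shrinks the feature dimension (1536 for wide_resnet50_2 layer2+layer3 versus 384 for resnet18), while lowering coreset_sampling_ratio shrinks the bank itself; both reduce the per-image validation cost.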

Jia-Baos (Author) commented:

Thank you so much. I adopted your recommendations and changed the model; you're right, validation simply takes a long time.

samet-akcay changed the title from "training error" to "PatchCore validation takes long" Sep 26, 2022
samet-akcay (Contributor) commented:

We have just merged PR #580, which partially addresses this. See #268 and #533.

I'll be converting this to a Q&A in Discussions. Feel free to continue from there. Cheers!
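
For readers who want to try the lighter setup programmatically, here is a minimal sketch assuming the anomalib v0.3.x Python entry points that tools/train.py uses (get_configurable_parameters, get_datamodule, get_model, get_callbacks); the config path is illustrative and should point at a patchcore config edited as suggested above. Other versions of anomalib may expose a different API.

# Minimal sketch; assumes the anomalib v0.3.x API mirrored from tools/train.py.
from pytorch_lightning import Trainer, seed_everything

from anomalib.config import get_configurable_parameters
from anomalib.data import get_datamodule
from anomalib.models import get_model
from anomalib.utils.callbacks import get_callbacks

config = get_configurable_parameters(
    model_name="patchcore",
    config_path="anomalib/models/patchcore/config.yaml",  # illustrative path
)
seed_everything(config.project.seed)

datamodule = get_datamodule(config)   # MVTec datamodule built from the config
model = get_model(config)             # PatchCore Lightning module
callbacks = get_callbacks(config)     # thresholding, normalization, visualizer, etc.

trainer = Trainer(**config.trainer, callbacks=callbacks)
trainer.fit(model=model, datamodule=datamodule)
trainer.test(model=model, datamodule=datamodule)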

openvinotoolkit locked and limited conversation to collaborators Sep 26, 2022
samet-akcay converted this issue into discussion #586 Sep 26, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
