
Patchcore multiple test batch size is not supported. #268

Closed
haobo827 opened this issue Apr 23, 2022 · 8 comments
Labels
Enhancement New feature or request Metrics Metric Component.



haobo827 commented Apr 23, 2022

I have another problem after dealing with #243
That is:
ValueError: Either preds and target both should have the (same) shape (N, ...), or target should be (N, ...) and preds should be (N, C, ...).
Epoch 0: 100%|██████████| 34/34 [09:17<00:00, 16.39s/it, loss=nan]

From:
File "/home/devadmin/haobo/anomalib_venv/lib/python3.8/site-packages/torchmetrics/utilities/checks.py", line 269, in _check_classification_inputs
    case, implied_classes = _check_shape_and_type_consistency(preds, target)
File "/home/devadmin/haobo/anomalib_venv/lib/python3.8/site-packages/torchmetrics/utilities/checks.py", line 115, in _check_shape_and_type_consistency

Then I print preds and target:
Epoch 0:  68%|████████████████ [2.13it/s, loss=nan]
Aggregating the embedding extracted from the training set.
Creating CoreSet Sampler via k-Center Greedy
Getting the coreset from the main embedding.
Assigning the coreset as the memory bank.
Epoch 0: 100%|█████████████████████████████████████████████████████| 34/34 [08:59<00:00, 15.85s/it, loss=nan]
preds is: tensor([1.4457])
target is: tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=torch.int32)
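The printed values show the mismatch: preds is a single image-level score of shape (1,), while target arrived as a batch of 16 labels stacked into shape (1, 16). torchmetrics requires preds and target to have the same shape, or preds to carry one extra class dimension, i.e. (N, C, ...) against (N, ...). A minimal sketch of that shape rule (a hypothetical helper for illustration, not the actual torchmetrics code):

```python
def shapes_consistent(preds_shape, target_shape):
    """Mimic torchmetrics' basic shape rule: preds and target must match
    exactly, or preds may carry one extra (class) dimension after N."""
    if preds_shape == target_shape:
        return True
    # preds (N, C, ...) vs target (N, ...): drop the class dim and compare
    if len(preds_shape) == len(target_shape) + 1:
        return preds_shape[:1] + preds_shape[2:] == target_shape
    return False

# The shapes from the log above: one score vs a batch of 16 labels
print(shapes_consistent((1,), (1, 16)))   # fails -> the ValueError above
print(shapes_consistent((16,), (16,)))    # a flattened batch would pass
```

This is why the error only surfaces with test_batch_size greater than 1: per-image scores and batched labels no longer line up.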

My PatchCore config.yaml is:

dataset:
  name: wafer_line #options: [mvtec, btech, folder]
  format: folder
  path: ../data/wafer_line/
  normal_dir: "train/Negative"
  abnormal_dir: "test/Positive"
  normal_test_dir: "test/Negative"
  task: segmentation
  mask: ../data/wafer_line/ground_truth/Positive
  extensions: ".jpg"
  split_ratio: 0.1
  seed: 0
  image_size: 256
  train_batch_size: 16
  test_batch_size: 16
  num_workers: 20
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
trainer:
  accelerator: "gpu" # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: true
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 0.05
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 10000
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: 1
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.

Thank you for your patience in reading and answering!

@haobo827
Author

should I set test_batch_size=1?

@samet-akcay
Contributor

Can you share the tree structure of your dataset please?

@haobo827
Author

Can you share the tree structure of your dataset please?

data
-wafer_line
--ground_truth
---Positive
--test
---Positive
---Negative
--train
---Negative

I also find that coreset sampling takes too much time, because it calculates the minimum distances on the CPU; it takes about an hour and a half on my machine. A lower coreset_sampling_ratio will reduce the computation time, but will it also reduce performance?

I want to know if there is a good way to solve this CPU computation problem. (I use a Tesla V100S-PCIE-32GB.)
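For context, the expensive part of k-center greedy coreset sampling is the repeated minimum-distance refresh over all embeddings after each selection. A minimal pure-Python sketch of that loop (illustrative only; anomalib's actual sampler operates on torch tensors, where moving this computation to the GPU is what removes the bottleneck):

```python
import math

def k_center_greedy(points, k):
    """Greedily pick k points that cover the set: each step adds the point
    farthest from the current selection, then refreshes min-distances."""
    selected = [0]  # seed with the first point
    # distance from every point to the (only) selected center
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < k:
        # the point farthest from all chosen centers joins the coreset
        idx = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(idx)
        # this O(N) refresh per step is the CPU-heavy part at scale
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return selected

embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(k_center_greedy(embeddings, 3))  # picks well-spread points
```

Because the selection loop runs once per coreset point, a lower coreset_sampling_ratio shrinks k and so cuts runtime roughly linearly, at the cost of a sparser memory bank.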

@samet-akcay
Contributor

Can you set accelerator to auto in the trainer config? It shouldn't train on the CPU; if it does, there is something wrong.

@samet-akcay
Contributor

should I set test_batch_size=1?

Yes, if you set test_batch_size: 1, it would work. We'll investigate why it doesn't work for multiple batch sizes.
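For reference, the workaround is a one-line change in the dataset section of the config shown earlier:

```yaml
dataset:
  test_batch_size: 1  # workaround until multi-batch evaluation is supported
```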

@samet-akcay samet-akcay changed the title ValueError: Either preds and target both should have the (same) shape (N, ...), or target should be (N, ...) and preds should be (N, C, ...) Multiple test batch size is not supported. Apr 23, 2022
@samet-akcay samet-akcay added Bug Something isn't working Metrics Metric Component. labels Apr 23, 2022
@haobo827
Author

should I set test_batch_size=1?

Yes, if you set test_batch_size: 1, it would work. We'll investigate why it doesn't work for multiple batch sizes.

You are right. Much appreciated!

@samet-akcay samet-akcay added Enhancement New feature or request and removed Bug Something isn't working labels Apr 24, 2022
@samet-akcay samet-akcay changed the title Multiple test batch size is not supported. Patchcore multiple test batch size is not supported. Jul 12, 2022
@fujikosu

Hi @samet-akcay, thanks for developing this amazing library!

We'll investigate why it doesn't work for multiple batch sizes.

Has any investigation been done since then? It would be great if you could share any information. I'm running inference over about 18,000 images; although there is still plenty of GPU memory available during inference, the batch-size-1 restriction means inference takes more than 10 hours to finish. So it would be great if we could set a batch size greater than 1.

@samet-akcay
Contributor

@fujikosu, we have just merged #580, which should address the multiple batch size problem. Let us know if you still encounter any issues. Thanks!
