
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. #532

Closed
alevangel opened this issue Aug 31, 2022 · 4 comments · Fixed by #562

alevangel commented Aug 31, 2022

Describe the bug
Trying to run inference on CPU with a model trained on GPU (Colab), but I get this error:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Where can I safely set the map_location?
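A minimal, self-contained sketch of what map_location does (the in-memory buffer here stands in for the .ckpt file produced by training; a real checkpoint is loaded the same way from its file path):

```python
import io

import torch

# Simulate a checkpoint in memory; in practice this is the .ckpt file
# written during training, whose top level has a "state_dict" key.
buffer = io.BytesIO()
torch.save({"state_dict": {"w": torch.zeros(2)}}, buffer)
buffer.seek(0)

# map_location remaps every storage to the device that is actually
# available, so CUDA-trained weights load cleanly on a CPU-only machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load(buffer, map_location=device)
state_dict = checkpoint["state_dict"]
```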

To Reproduce

model = get_model(config)
callbacks = get_callbacks(config)

trainer = Trainer(callbacks=callbacks, **config.trainer)

transform_config = config.dataset.transform_config.val if "transform_config" in config.dataset.keys() else None
dataset = InferenceDataset(
    my_args['input'], image_size=tuple(config.dataset.image_size), transform_config=transform_config
)
dataloader = DataLoader(dataset)
trainer.predict(model=model, dataloaders=[dataloader])

samet-akcay (Contributor) commented

@alevangel, do you set the device to cpu in your config file?


alevangel commented Sep 1, 2022

@samet-akcay, it is set to 'auto'.

This is the models/patchcore/config.yaml

dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: mvtec
  path: ./datasets/PRtg
  task: segmentation
  category: rocks
  image_size: 448
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 0
    pixel_default: 0
    adaptive: true

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.


alevangel commented Sep 1, 2022

The error arises from this line:

pl_module.load_state_dict(torch.load(self.weights_path)["state_dict"])

where it attempts to load the model from its weights, but they were trained on a GPU, I guess.

I solved this loading error by modifying the function like this:

    def on_predict_start(self, _trainer, pl_module: AnomalyModule) -> None:
        """Call when inference begins.

        Loads the model weights from ``weights_path`` into the PyTorch module.
        """
        device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
        logger.info("Loading the model from %s", self.weights_path)
        pl_module.load_state_dict(torch.load(self.weights_path, map_location=device)["state_dict"])

samet-akcay (Contributor) commented

@alevangel, thanks for your suggestion. We could perhaps add a fix using map_location=pl_module.device, which would ensure that the model and the device are always consistent.
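A free-standing sketch of that suggestion (the helper name is mine; in anomalib this logic would live in the checkpoint-loading callback, and pl_module.device is the attribute Lightning modules expose):

```python
import torch


def load_checkpoint_onto_module(pl_module, weights_path):
    """Load a Lightning-style checkpoint onto the device pl_module already occupies.

    Mapping storages to ``pl_module.device`` keeps the weights and the module
    on the same device without querying ``torch.cuda.is_available()`` at all.
    """
    checkpoint = torch.load(weights_path, map_location=pl_module.device)
    pl_module.load_state_dict(checkpoint["state_dict"])
```

The advantage over checking CUDA availability by hand is that the weights always follow the module, even if Lightning later moves the module to a different accelerator.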
