Checkpoint save fails with customized checkpoint_callback #2031

Closed
deng-cy opened this issue May 31, 2020 · 5 comments
Labels
help wanted Open to be worked on

Comments

@deng-cy
Contributor

deng-cy commented May 31, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Set a customized checkpoint_callback:
    checkpoint_callback = pl.callbacks.ModelCheckpoint(filepath=os.getcwd(), monitor='accuracy')
  2. Run the trainer:
    trainer = pl.Trainer.from_argparse_args(args, checkpoint_callback=checkpoint_callback)

Then it shows:

Traceback (most recent call last):
  File "main.py", line 122, in <module>
    main(args)
  File "main.py", line 102, in main
    trainer.fit(litmodel)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 347, in train
    self.run_training_epoch()
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 465, in run_training_epoch
    self.log_metrics(batch_step_metrics, grad_norm_dic)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/logging.py", line 74, in log_metrics
    self.logger.save()
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 10, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/loggers/tensorboard.py", line 161, in save
    save_hparams_to_yaml(hparams_file, self.hparams)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 151, in save_hparams_to_yaml
    yaml.dump(hparams, fp)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/__init__.py", line 290, in dump
    return dump_all([data], stream, Dumper=Dumper, **kwds)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/__init__.py", line 278, in dump_all
    dumper.represent(data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 27, in represent
    node = self.represent_data(data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 48, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 207, in represent_dict
    return self.represent_mapping('tag:yaml.org,2002:map', data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 346, in represent_object
    return self.represent_sequence(tag+function_name, args)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 92, in represent_sequence
    node_item = self.represent_data(item)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 48, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 207, in represent_dict
    return self.represent_mapping('tag:yaml.org,2002:map', data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 342, in represent_object
    return self.represent_mapping(
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/home/dengcy/conda/myenv/lib/python3.8/site-packages/yaml/representer.py", line 317, in represent_object
    reduce = data.__reduce_ex__(2)
TypeError: cannot pickle '_thread.lock' object
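
The last frame shows PyYAML calling `data.__reduce_ex__(2)` on a `_thread.lock` it reached while walking the hparams. As a minimal sketch using only the standard library, the same kind of TypeError can be triggered by any container that holds a `threading.Lock`. The `pickle_error` helper and the dictionary contents are hypothetical, and the sketch does not identify which Lightning object carries the lock in this report:

```python
import pickle
import threading


def pickle_error(obj):
    """Return the exception raised while pickling obj, or None if it pickles."""
    try:
        pickle.dumps(obj, protocol=2)
    except Exception as exc:
        return exc
    return None


if __name__ == "__main__":
    # A plain hparams-like dict serializes without problems.
    print(repr(pickle_error({"lr": 0.01, "batch_size": 32})))

    # The same dict with a lock nested inside fails with a TypeError.
    print(repr(pickle_error({"lr": 0.01, "lock": threading.Lock()})))
```

The exact wording of the error message depends on the Python version, so the sketch only inspects the exception object rather than matching the text.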

Environment

  • PyTorch Version (e.g., 1.0): 1.5
  • OS (e.g., Linux): CentOS, Windows (I tried on both computers)
  • How you installed PyTorch (conda, pip, source): pip
  • Python version: 3.7
  • CUDA/cuDNN version: 10.2
  • Lightning Version: 0.7.6
@github-actions
Contributor

Hi! Thanks for your contribution! Great first issue!

@gauravtanwar03

Hi!
I am getting the same error when I pass a normal callback:
trainer = Trainer.from_argparse_args(args, callbacks=[MyPrintingCallback()])

@acxz

acxz commented Jun 10, 2020

I believe this is a duplicate of #1714

@awaelchli
Member

Please try the latest PL version; I believe this was fixed.
Also, for @gauravtanwar03: make sure you define MyPrintingCallback outside the main function where you launch the Trainer, e.g.:

# Define the callback at module level, not inside main()
class MyPrintingCallback:
    ...

def main(args):
    Trainer(...)
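
The comment does not say why the location of the class matters. One plausible reason, stated here as an assumption rather than something confirmed in this thread, is that the standard `pickle` module cannot serialize instances of classes defined inside a function, while module-level classes are fine. A minimal standard-library sketch, in which `ModuleLevelCallback`, `make_local_callback`, and `is_picklable` are all hypothetical names:

```python
import pickle


class ModuleLevelCallback:
    """Hypothetical callback defined at module level."""


def make_local_callback():
    """Return an instance of a class defined inside a function."""

    class LocalCallback:
        pass

    return LocalCallback()


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


if __name__ == "__main__":
    print("module-level:", is_picklable(ModuleLevelCallback()))
    print("local:", is_picklable(make_local_callback()))
```

Run directly as a script, this should report the module-level instance as picklable and the local one as not, because pickle stores classes by importable name and a function-local class has none.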

@Borda
Member

Borda commented Aug 4, 2020

Closing as #1714 was resolved; please feel free to re-open if needed 🐰

@Borda Borda closed this as completed Aug 4, 2020