
Comet logger cannot be pickled after creating an experiment #1682

Closed · jeremyjordan opened this issue May 1, 2020 · 15 comments · Fixed by #2029
Labels: bug (Something isn't working) · help wanted (Open to be worked on) · logger (Related to the Loggers)

Comments

@jeremyjordan (Contributor)

🐛 Bug

The Comet logger cannot be pickled after an experiment (at least an OfflineExperiment) has been created.

To Reproduce

Steps to reproduce the behavior:

1. Initialize the logger object (works fine):

from pytorch_lightning.loggers import CometLogger
import tests.base.utils as tutils  # test helpers from the pytorch-lightning repo
from pytorch_lightning import Trainer
import pickle

model, _ = tutils.get_default_model()
logger = CometLogger(save_dir='test')
pickle.dumps(logger)

2. Initialize a Trainer object with the logger (works fine):

trainer = Trainer(
    max_epochs=1,
    logger=logger
)
pickle.dumps(logger)
pickle.dumps(trainer)

3. Access the experiment attribute, which creates the OfflineExperiment object (fails):

logger.experiment  # creates and caches the OfflineExperiment on the logger
pickle.dumps(logger)
>> TypeError: can't pickle _thread.lock objects

Expected behavior

We should be able to pickle loggers for distributed training.
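
For context, a common pattern for making a logger like this picklable (a minimal sketch only; the names below are illustrative and not necessarily what the actual fix in #2029 does) is to drop the live experiment handle in __getstate__ and recreate it lazily after unpickling:

# Sketch only: _experiment / _create_experiment are illustrative names,
# not the real CometLogger internals.
class PicklableLoggerSketch:
    def __init__(self):
        self._experiment = None  # created lazily; holds thread locks once live

    @property
    def experiment(self):
        if self._experiment is None:
            self._experiment = self._create_experiment()
        return self._experiment

    def _create_experiment(self):
        # e.g. return comet_ml.OfflineExperiment(...)
        raise NotImplementedError

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_experiment"] = None  # the live experiment object is not picklable
        return state

After unpickling, the next access to .experiment simply recreates the experiment in the new process.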

Environment

  • CUDA:
    - GPU:
    - available: False
    - version: None
  • Packages:
    - numpy: 1.18.1
    - pyTorch_debug: False
    - pyTorch_version: 1.4.0
    - pytorch-lightning: 0.7.5
    - tensorboard: 2.1.0
    - tqdm: 4.42.0
  • System:
    - OS: Darwin
    - architecture: 64bit
    - processor: i386
    - python: 3.7.6
    - version: Darwin Kernel Version 19.3.0: Thu Jan 9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64
@jeremyjordan added the bug (Something isn't working) and help wanted (Open to be worked on) labels on May 1, 2020
@Borda (Member) commented May 1, 2020

@ceyzaguirre4 pls ^^

@F-Barto commented May 18, 2020

I don't know if this helps or if it is the right place, but a similar error occurs when running in DDP mode with the WandB logger.

WandB uses a lambda function at some point (see the traceback below).

Does the logger have to be pickled? Couldn't it log only on rank 0 at epoch_end?

Traceback (most recent call last):
  File "../train.py", line 140, in <module>
    main(args.gpus, args.nodes, args.fast_dev_run, args.mixed_precision, project_config, hparams)
  File "../train.py", line 117, in main
    trainer.fit(model)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 751, in fit
    mp.spawn(self.ddp_train, nprocs=self.num_processes, args=(model,))
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 149, in start_processes
    process.start()
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/clear/fbartocc/miniconda3/envs/Depth_env/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TorchHistory.add_log_hooks_to_pytorch_module.<locals>.<lambda>'

Also related: #1704
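
A quick way to narrow down failures like this one, regardless of which logger is involved, is to try pickling each attribute of the logger individually; the attribute holding the live run/experiment object is usually the one that fails. A diagnostic sketch using only the standard library:

import pickle

def find_unpicklable_attrs(obj):
    """Return (attribute name, error) pairs for attributes that fail to pickle."""
    failures = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # e.g. _thread.lock objects, local lambdas
            failures.append((name, repr(exc)))
    return failures

# e.g. print(find_unpicklable_attrs(logger)) before calling trainer.fit(...)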

@joseluisvaz commented May 18, 2020

I had the same error as @jeremyjordan (can't pickle _thread.lock objects). It happened when I passed the logger and additional callbacks via from_argparse_args, as explained at https://pytorch-lightning.readthedocs.io/en/latest/hyperparameters.html:

trainer = pl.Trainer.from_argparse_args(hparams, logger=logger, callbacks=[PrinterCallback(), ])

I could make the problem go away by setting the corresponding Trainer attributes directly:

trainer = pl.Trainer.from_argparse_args(hparams)
trainer.logger = logger
trainer.callbacks.append(PrinterCallback())

@jacobdanovitch

Same issue as @F-Barto, using the WandB logger across 2 nodes with DDP.

@huyvnphan

Same issue when using the WandB logger with DDP.

@danielhomola commented May 26, 2020

Same here. @joseluisvaz, your workaround doesn't solve the callback issue: when I append a callback that way it is simply ignored, whereas passing it to the Trainer init normally works. So I'm fairly sure the error is thrown by the logger (I'm using TensorBoard), not by the callbacks.

@Brechard commented May 29, 2020

Same issue, using the WandB logger with 8 GPUs on an AWS p2.8xlarge machine.

@JSAustin

With CometLogger, I get this error only when the experiment name is declared; if it is not declared, there is no issue.
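
A plausible explanation (an assumption, not verified against the CometLogger source) is that declaring a name forces the experiment to be created eagerly, because setting the name needs a live experiment object. An illustrative contrast:

from pytorch_lightning.loggers import CometLogger
import pickle

logger_lazy = CometLogger(save_dir="logs")
pickle.dumps(logger_lazy)  # fine: no experiment object has been created yet

logger_named = CometLogger(save_dir="logs", experiment_name="my-run")
# If setting the name touches logger_named.experiment under the hood, the
# OfflineExperiment is created eagerly and the next line raises
# "TypeError: can't pickle _thread.lock objects".
pickle.dumps(logger_named)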

@Borda added the logger (Related to the Loggers) label on Aug 4, 2020
@Riccorl commented Feb 11, 2022

I still get this error with pytorch-lightning 1.5.10 on macOS:

Error executing job with overrides: ['train.pl_trainer.fast_dev_run=False', 'train.pl_trainer.gpus=0', 'train.pl_trainer.precision=32', 'logging.wandb_arg.mode=offline']
Traceback (most recent call last):
  File "/Users/ric/Documents/PhD/Projects/ed-experiments/src/train.py", line 78, in main
    train(conf)
  File "/Users/ric/Documents/PhD/Projects/ed-experiments/src/train.py", line 70, in train
    trainer.fit(pl_module, datamodule=pl_data_module)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 140, in run
    self.on_run_start(*args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 86, in on_run_start
    self._dataloader_iter = _update_dataloader_iter(data_fetcher, self.batch_progress.current.ready)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/loops/utilities.py", line 121, in _update_dataloader_iter
    dataloader_iter = enumerate(data_fetcher, batch_idx)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 198, in __iter__
    self._apply_patch()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 133, in _apply_patch
    apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 181, in loader_iters
    loader_iters = self.dataloader_iter.loader_iters
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 537, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 577, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py", line 104, in apply_to_collection
    v = apply_to_collection(
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py", line 96, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 177, in __iter__
    self._loader_iter = iter(self.loader)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/ric/mambaforge/envs/ed/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TorchHistory.add_log_hooks_to_pytorch_module.<locals>.<lambda>'

@mmcdermott

I still see this bug as well with the WandB logger.

@wrenparismoe

Currently having this issue with WandbLogger.

@ebalogun01

Having the same issue with WandB.

@akashsharma02

@ebalogun01 Were you able to solve this issue? I'm also seeing the same issue with WandbLogger.

@Borda (Member) commented Apr 22, 2023

> @ebalogun01 Were you able to solve this issue? I'm also seeing the same issue with WandbLogger.

What version are you using?

@akashsharma02

@Borda I'm using Lightning 2.1.0.post0. Another detail I'd like to add: I find that WandbLogger is "unpicklable" only when wandb is disabled from the terminal using wandb disabled.
