
DDP cannot start due to pickle problem #1640

Closed
cmpute opened this issue Apr 27, 2020 · 6 comments
Labels
bug Something isn't working help wanted Open to be worked on


@cmpute
Contributor

cmpute commented Apr 27, 2020

🐛 Bug

DDP cannot start; it fails with the following error. This started after I upgraded from 0.7.1 to 0.7.5.

Traceback (most recent call last):
  File "train.py", line 365, in <module>
    fire.Fire(train)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "train.py", line 348, in train
    trainer.fit(model)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 751, in fit
    mp.spawn(self.ddp_train, nprocs=self.num_processes, args=(model,))
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 162, in spawn
    process.start()
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object
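For context: `mp.spawn` in the traceback has to pickle its target and `args` (here the bound method `self.ddp_train`, which drags the whole Trainer along, plus `model`) through multiprocessing's `ForkingPickler`, a `pickle.Pickler` subclass. Any open text file reachable from those objects should trigger this `TypeError`. A minimal sketch of the same class of error with plain `pickle`, assuming a `SimpleNamespace` as a stand-in for whatever object holds the file (the `try_pickle` helper is hypothetical, not part of Lightning or PyTorch):

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for any object (trainer, profiler, callback, logger)
        # that keeps an open text file as an attribute.
        holder = SimpleNamespace(log_file=open(os.path.join(tmp, "log.txt"), "w"))
        print(try_pickle(holder))
        # Python 3.7 says "cannot serialize ...", Python 3.8+ says
        # "cannot pickle ..."; both name '_io.TextIOWrapper'.
        holder.log_file.close()
```

Note that pickling the model alone can succeed while this still fails, because the spawn call also pickles the Trainer through the bound method.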

To Reproduce

Sorry, I don't have a short example to reproduce this yet.

Environment

* CUDA:
        - GPU:
                - GeForce GTX 1080 Ti
        - available:         True
        - version:           10.1
* Packages:
        - numpy:             1.18.1
        - pyTorch_debug:     False
        - pyTorch_version:   1.4.0
        - pytorch-lightning: 0.7.5
        - tensorboard:       2.2.0
        - tqdm:              4.45.0
* System:
        - OS:                Linux
        - architecture:
                - 64bit
        - processor:         x86_64
        - python:            3.7.7
        - version:           #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020
@cmpute cmpute added bug Something isn't working help wanted Open to be worked on labels Apr 27, 2020
@cmpute
Contributor Author

cmpute commented Apr 27, 2020

Found a related issue, hyperopt/hyperopt-sklearn#74, but I'm sure there is no logger in my module.

Also, if I try to pickle my model:

import pickle
pickle.dumps(model)

no error occurs.

@quinor
Contributor

quinor commented Apr 28, 2020

@cmpute try pickling the Trainer; that's what usually fails. See #1628 for a similar error I debugged yesterday. It is probably something custom (or just non-default!) that you pass to the Trainer, i.e., in args.
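Building on the suggestion to pickle the Trainer, a small debugging helper can narrow down *which* attribute refuses to pickle, not just whether the whole object does. This is a sketch, not a Lightning API: `find_unpicklable` is a hypothetical helper, and the `SimpleNamespace` "trainer" in the demo is a stand-in for a real `Trainer`.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def find_unpicklable(obj, prefix="obj", max_depth=4, _seen=None):
    """Return attribute paths under `obj` whose values fail to pickle.

    Walks instance __dict__s and dict/list/tuple contents and reports the
    most specific failing paths it can find. Debugging aid only: it calls
    pickle.dumps repeatedly, so it can be slow on large objects.
    """
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:  # guard against reference cycles
        return []
    _seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return []  # obj and everything it references pickle fine
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        pass

    if max_depth == 0:
        return [prefix]

    if isinstance(obj, dict):
        children = [(f"{prefix}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{prefix}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []

    bad = []
    for path, child in children:
        bad.extend(find_unpicklable(child, path, max_depth - 1, _seen))
    # No single child is to blame: report this object itself.
    return bad or [prefix]


if __name__ == "__main__":
    # Hypothetical stand-in for a Trainer whose profiler holds a file handle.
    with tempfile.TemporaryDirectory() as tmp:
        fh = open(os.path.join(tmp, "profile.txt"), "w")
        trainer = SimpleNamespace(
            max_epochs=10,
            profiler=SimpleNamespace(output_file=fh),
        )
        print(find_unpicklable(trainer, "trainer"))
        fh.close()
```

In a real script, calling `find_unpicklable(trainer, "trainer")` just before `trainer.fit(model)` should list the attribute paths that hold the offending handle.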

@cmpute
Contributor Author

cmpute commented Apr 30, 2020

I haven't run into this problem for a while; it may have been fixed by recent commits. I'll close it for now.

@cmpute cmpute closed this as completed Apr 30, 2020
@sabetAI

sabetAI commented May 27, 2020

I'm also experiencing this now. Not fixed yet!

@rakhimovv

rakhimovv commented Jul 26, 2020

Same issue here, please reopen.

pl.__version__='0.9.0rc1'

In my case, it happens when I pass the output_filename parameter to pytorch_lightning.profiler.SimpleProfiler and run in ddp_spawn mode.
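This report fits the same pattern: with `output_filename` set, the profiler presumably keeps an open file handle, and since the spawn step pickles a bound method of the Trainer, everything the Trainer references (including its profiler) must pickle. A common workaround, sketched here with a hypothetical `FileBackedProfiler` class (not Lightning's `SimpleProfiler`), is to open the file lazily and drop the handle in `__getstate__` so that each process reopens it after unpickling.

```python
import os
import pickle
import tempfile


class FileBackedProfiler:
    """Hypothetical profiler-like object that writes its report to a file.

    The file handle is opened lazily and excluded from pickling, so the
    object survives the pickle round trip that spawn-based multiprocessing
    performs.
    """

    def __init__(self, output_filename=None):
        self.output_filename = output_filename
        self._output_file = None  # opened on first write; never pickled

    def _file(self):
        if self._output_file is None and self.output_filename is not None:
            # Append mode, so a re-opened copy in a child process does not
            # truncate what the parent already wrote.
            self._output_file = open(self.output_filename, "a")
        return self._output_file

    def describe(self, text):
        f = self._file()
        if f is None:
            print(text)
        else:
            f.write(text + "\n")
            f.flush()

    def close(self):
        if self._output_file is not None:
            self._output_file.close()
            self._output_file = None

    def __getstate__(self):
        # Copy the instance dict but drop the unpicklable handle.
        state = self.__dict__.copy()
        state["_output_file"] = None
        return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profiler = FileBackedProfiler(os.path.join(tmp, "profile.txt"))
        profiler.describe("written by parent")
        clone = pickle.loads(pickle.dumps(profiler))  # fails without __getstate__
        clone.describe("written by clone")
        profiler.close()
        clone.close()
```

With several DDP processes sharing one file, lines from different ranks can interleave; writing one file per rank is another option.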

@DeVriesMatt

I get the same issue.

5 participants