DDP breaks when python does not refer to the correct interpreter #2472

Closed
kl0211 opened this issue Jul 2, 2020 · 4 comments · Fixed by #2482
Labels
bug (Something isn't working), help wanted (Open to be worked on)

Comments

@kl0211

kl0211 commented Jul 2, 2020

🐛 Bug

With distributed_backend='ddp', the program breaks if the python command refers to python2 or does not exist.

To Reproduce

Steps to reproduce the behavior:

  1. Make sure the python command does not link to python3, such as on Ubuntu 18.04.
  2. Run Trainer().fit with distributed_backend='ddp'

Additional context

The problem lies at the following line, where the python command is hardcoded:
https://github.com/PyTorchLightning/pytorch-lightning/blob/0697dd306d578f0cd3c4d23b768da0106e54a095/pytorch_lightning/trainer/distrib_data_parallel.py#L422
On many systems, python is a symlink to python2 or does not exist.
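To check what a bare python command would launch on a given machine, a small sketch like the one below may help. The helper name describe_python_command is hypothetical and not part of Lightning; it only uses the standard library, and it assumes the resolved command accepts --version.

```python
import shutil
import subprocess
import sys


def describe_python_command(command="python"):
    """Return (resolved_path, version_string) for a command name.

    Returns (None, None) if the command is not found on PATH.
    Hypothetical helper for illustration only.
    """
    path = shutil.which(command)
    if path is None:
        return None, None
    # Python 2 prints its version to stderr, so merge stderr into stdout.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return path, result.stdout.strip()


if __name__ == "__main__":
    path, version = describe_python_command("python")
    print("bare 'python' resolves to:", path, version)
    print("current interpreter:      ", sys.executable)
```

If the first line prints None or a Python 2 version while the second points at a Python 3 interpreter, a child process launched through the bare python command would not run under the interpreter that started the training script.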

kl0211 added the bug and help wanted labels Jul 2, 2020
@github-actions
Contributor

github-actions bot commented Jul 2, 2020

Hi! Thanks for your contribution! Great first issue!

@RitchieAlpha

Same question!!!!!

@Borda
Member

Borda commented Jul 3, 2020

It is based on your default python... @kl0211 @RitchieAlpha, mind sending a PR that updates it to use the actual python interpreter?
cc: @williamFalcon

Borda mentioned this issue Jul 3, 2020
@kl0211
Author

kl0211 commented Jul 3, 2020

@Borda I just saw #2482. Using sys.executable is exactly what I was going to suggest. According to https://www.python.org/dev/peps/pep-0394/#for-python-script-publishers, this should ensure the same interpreter is used to spawn the child processes.

Thanks for the quick change!
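For anyone landing here later, a minimal sketch of the sys.executable approach is below. This is not the code from #2482; the function name child_python_version is made up for illustration, and it only shows that a child started through sys.executable runs under the same Python version as the parent.

```python
import subprocess
import sys


def child_python_version():
    """Start a child process with the current interpreter and return its version.

    Hypothetical helper for illustration; uses sys.executable instead of a
    hardcoded "python" so the child does not depend on what PATH resolves to.
    """
    command = [
        sys.executable,
        "-c",
        "import sys; print('.'.join(map(str, sys.version_info[:3])))",
    ]
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return result.stdout.strip()


if __name__ == "__main__":
    parent_version = ".".join(map(str, sys.version_info[:3]))
    print("parent:", parent_version)
    print("child: ", child_python_version())
```

With a hardcoded "python" in place of sys.executable, the two versions could differ, or the launch could fail outright on systems where no python command exists.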
