Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct CWD for ddp subprocesses when using Hydra #2719

Merged
merged 3 commits into from
Jul 28, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions pytorch_lightning/trainer/distrib_data_parallel.py
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,8 @@ def train_fx(trial_hparams, cluster_manager, _):


try:
from hydra.utils import to_absolute_path
from hydra.utils import to_absolute_path, get_original_cwd
from hydra.core.hydra_config import HydraConfig
except ImportError:
HYDRA_AVAILABLE = False
else:
Expand Down Expand Up @@ -464,7 +465,12 @@ def spawn_ddp_children(self, model):
env_copy['LOCAL_RANK'] = f'{local_rank}'

# start process
proc = subprocess.Popen(command, env=env_copy)
# if hydra is available and initialized, make sure to set the cwd correctly
cwd: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will break for non hydra users... no?
shouldn't cwd be set to something for non hydra people?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None is the default value for cwd (https://docs.python.org/3/library/subprocess.html#subprocess.Popen). If None, it just uses whatever the current value for cwd is.

If cwd is not None, the function changes the working directory to cwd before executing the child. cwd can be a string, bytes or path-like object. In particular, the function looks for executable (or for the first item in args) relative to cwd if the executable path is a relative path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently, we're setting cwd to None for everyone, hence the problem for Hydra scripts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pull the cwd correctly then and also use it for non hydra users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but what should we set it to for non hydra users? do you think we need to save the original cwd and then use that? I feel like that's going down the slippery slope b/c we can't possibly anticipate how a script will change the cwd... for hydra scripts we can be sure b/c of the way it operates...

if HYDRA_AVAILABLE:
if HydraConfig.initialized():
cwd = get_original_cwd()
proc = subprocess.Popen(command, env=env_copy, cwd=cwd)
self.interactive_ddp_procs.append(proc)

# starting all processes at once can cause issues
Expand Down