0.8.0-dev hydra changed working directory for other GPUs with DDP #2086

Closed
joe32140 opened this issue Jun 5, 2020 · 9 comments · Fixed by #2115
Labels
help wanted Open to be worked on

Comments

@joe32140

joe32140 commented Jun 5, 2020

🐛 Bug

When I moved from 0.7.6 to 0.8.0-dev to get DictConfig support for saving model hparams, I found that the working directory changed for the other GPUs in the DDP setting. My script is adapted from huggingface.

Code sample

python: can't open file '/home/joe/summarization/models/bart/outputs/2020-06-05/11-57-25/finetune.py': [Errno 2] No such file or directory
python: can't open file '/home/joe/summarization/models/bart/outputs/2020-06-05/11-57-25/finetune.py': [Errno 2] No such file or directory
python: can't open file '/home/joe/summarization/models/bart/outputs/2020-06-05/11-57-25/finetune.py': [Errno 2] No such file or directory
python: can't open file '/home/joe/summarization/models/bart/outputs/2020-06-05/11-57-25/finetune.py': [Errno 2] No such file or directory
python: can't open file '/home/joe/summarization/models/bart/outputs/2020-06-05/11-57-25/finetune.py': [Errno 2] No such file or directory
python: can't open file '/home/joe/summarization/models/bart/outputs/2020-06-05/11-57-25/finetune.py': [Errno 2] No such file or directory
python: can't open file '/home/joe/summarization/models/bart/outputs/2020-06-05/11-57-25/finetune.py': [Errno 2] No such file or directory
initializing ddp: LOCAL_RANK: 0/7 WORLD_SIZE:8
[2020-06-05 11:57:57,549][lightning][INFO] - initializing ddp: LOCAL_RANK: 0/7 WORLD_SIZE:8

where finetune.py is in /home/joe/summarization/models/bart/,
and the program just hangs here indefinitely.

My hydra config.yaml is

defaults:
  - trainer: train
  - data: example
model_name_or_path: bart-large
output_dir: /data/models/
cache_dir: ""
config_name: bart-large
tokenizer_name: bart-large
downgrade: True
gradient_accumulation_steps: 1
max_grad_norm: 1
fp16: False
n_tpu_cores: 0
weight_decay: 0
adam_epsilon: 1e-8

Expected behavior

Programs on the other GPUs should load finetune.py from '/home/joe/summarization/models/bart/'.

Environment

  • version: 0.8.0-dev
  • PyTorch Version (e.g., 1.0): 1.4
  • OS (e.g., Linux): ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): conda
  • Python version: 3.6
@joe32140 joe32140 added the help wanted Open to be worked on label Jun 5, 2020
@williamFalcon
Contributor

can you share a colab for this?

@omry
Contributor

omry commented Jun 7, 2020

Hydra is changing your current working directory. This is the standard behavior. I am not sure if this is what you are expecting.

See working directory in the basic tutorial.

@williamFalcon
Contributor

williamFalcon commented Jun 7, 2020

@omry is there a way to not change the directory? I’m afraid this breaks DDP, since it needs to call the script multiple times. It also doesn’t seem like the user is expecting this, which might not be a good user experience.

Unfortunately, the benefits of calling DDP this way FAR outweigh the problems with using .spawn.

Finally, changing the directory for the user is going to break a lot of things like loggers, etc. IMHO, this should not be done.
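
To make the failure mode concrete, here is a minimal standard-library sketch. It does not use Lightning or Hydra, and it is not how either library is implemented. It only assumes that a launcher re-runs the script through a relative path after the working directory has moved to a run folder. The names `run_child`, `launch_dir`, and `run_dir` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_child(script_path, cwd):
    """Re-run `script_path` with the current interpreter from `cwd`; return the exit code."""
    result = subprocess.run(
        [sys.executable, script_path],
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


with tempfile.TemporaryDirectory() as launch_dir:
    # Hypothetical layout: the script sits in launch_dir, the run folder is nested below it.
    script = os.path.join(launch_dir, "finetune.py")
    with open(script, "w") as f:
        f.write("print('child started')\n")
    run_dir = os.path.join(launch_dir, "outputs", "run")
    os.makedirs(run_dir)

    # A relative path is resolved against the new working directory, so the child fails.
    relative_rc = run_child("finetune.py", cwd=run_dir)
    # An absolute path captured before the directory change still works.
    absolute_rc = run_child(os.path.abspath(script), cwd=run_dir)
```

The first child exits with a non-zero code because `finetune.py` does not exist inside the run folder, which matches the "can't open file" messages in the report. The second child starts normally.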

@omry
Contributor

omry commented Jun 7, 2020

@williamFalcon,
Hydra supports running the same application multiple times in parallel and working directory isolation is an important feature.
Users can customize their working directory, including customizing it to ".", which will effectively prevent this.
Using an API in Hydra you can also access the original working directory (and easily get files relative to it).

This default behavior is here to stay and Hydra users are expecting it if they read the documentation (Most of them do).

We can look together at the issue with DDP and see what is the best way to address this (for example, by using the API I mentioned to access the original working directory, or by other means).
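
As a rough illustration of the two options mentioned above: the working directory can be pinned from the command line with the override `hydra.run.dir=.`, and the launch directory can be read with `hydra.utils.get_original_cwd()` inside a `@hydra.main` application. The sketch below is an assumption about how one might use that API, not code from this thread. `get_launch_dir` and `resolve_against` are hypothetical helper names, and Hydra is imported lazily so the path helper can be tried without Hydra installed.

```python
import os


def get_launch_dir():
    """Return the directory the app was launched from.

    Only valid inside a function decorated with @hydra.main.
    """
    from hydra.utils import get_original_cwd  # lazy import: Hydra is needed only here
    return get_original_cwd()


def resolve_against(launch_dir, path):
    """Resolve `path` against `launch_dir` instead of the current working directory."""
    if os.path.isabs(path):
        return os.path.normpath(path)
    return os.path.normpath(os.path.join(launch_dir, path))
```

Inside a Hydra app, `resolve_against(get_launch_dir(), "finetune.py")` would point at the script next to the launch directory rather than inside the run folder. Hydra also ships `hydra.utils.to_absolute_path`, which serves a similar purpose.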

@joe32140
Author

joe32140 commented Jun 7, 2020

@omry
I know Hydra generates its output files in the new working directory, e.g., logs and yaml files.
However, in 0.7.6 the script with the ddp backend doesn't try to load the main script finetune.py from the new working directory, while in 0.8.0-dev it does. I am wondering what causes this change in 0.8.0-dev?

@omry
Contributor

omry commented Jun 7, 2020

@joe32140, this is a new change in PL which is not related to the hparam change.
@williamFalcon can give more details about the reason for the change.

I think there is a slight incompatibility here between Hydra and the new version of PL that should be pretty simple to fix.

@williamFalcon
Contributor

@jcsagar

jcsagar commented Jul 3, 2020

So is it safe to assume the "new" ddp and hydra will be incompatible indefinitely? Hydra has almost completely changed my config work-flow... especially with the multi-runs for parameter searches, which separate into different working directories per run. The new PL ddp is looking amazing though.

@omry
Contributor

omry commented Jul 3, 2020

I will let @williamFalcon give an authoritative answer, but my understanding is that this should now be compatible.
