multi-gpu ddp calls validation and testing loops too many times #1161

Closed
sneiman opened this issue Mar 16, 2020 · 6 comments · Fixed by #1192
Labels: bug, help wanted
Milestone: 0.7.2

Comments

sneiman (Contributor) commented Mar 16, 2020

When using ddp with multiple GPUs, each validation and test loop runs over the entire validation or test dataset on every GPU.

Expected behavior is that the dataset is divided appropriately across the gpus.

I am using current master (cloned Mar 14), Ubuntu 19.10, CUDA 10.1, Python 3.7.5, PyTorch 1.4, in a venv environment.

The problem appears to be in auto_add_sampler() in data_loading.py. It does not create a DistributedSampler for validation or test datasets.
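
To make the reported and expected behavior concrete, here is a minimal pure-Python sketch. It is not Lightning's code: shard_indices is a hypothetical helper that mimics the index partitioning torch.utils.data.DistributedSampler uses when shuffling is off (pad to a multiple of the process count, then stride by rank), and the dataset size and GPU count are made-up numbers.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Indices one process would evaluate under an even, non-shuffled split.

    Hypothetical helper: it imitates the scheme of
    torch.utils.data.DistributedSampler with shuffle=False, where the index
    list is padded by wrapping around and each rank takes every
    num_replicas-th index starting at its own rank.
    """
    per_rank = math.ceil(dataset_len / num_replicas)
    total = per_rank * num_replicas
    # Wrapping with modulo reproduces the padding with repeated leading samples.
    return [i % dataset_len for i in range(rank, total, num_replicas)]


val_len, gpus = 10, 4  # assumed sizes, for illustration only

# Reported behavior: no distributed sampler, so every GPU iterates the full set.
reported = [list(range(val_len)) for _ in range(gpus)]

# Expected behavior: each GPU sees roughly val_len / gpus samples.
expected = [shard_indices(val_len, gpus, rank) for rank in range(gpus)]

for rank in range(gpus):
    print(f"rank {rank}: reported {len(reported[rank])} samples, "
          f"expected shard {expected[rank]}")
```

In plain PyTorch, the same split is usually obtained by passing sampler=DistributedSampler(dataset) to the DataLoader. Whether auto_add_sampler() should do that for the validation and test loaders is the question this issue raises.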

sneiman added the bug and help wanted labels Mar 16, 2020

sneiman (Contributor, Author) commented Mar 16, 2020

With the latest pull (about an hour ago), I no longer see this behavior. Closing.

sneiman closed this as completed Mar 16, 2020

sneiman (Contributor, Author) commented Mar 17, 2020

Sorry, this issue still exists in some configurations. My proposed fix is not the whole picture. Still investigating; I will provide a reproducible example.

sneiman reopened this Mar 17, 2020

sneiman (Contributor, Author) commented Mar 17, 2020

Testing underway. Will make PR tomorrow.

sneiman (Contributor, Author) commented Mar 17, 2020

I don't want to clutter up the PR queue if no one is interested in this. Let me know ...

sneiman changed the title from "multi-gpu ddp calls validation loop too many times" to "multi-gpu ddp calls validation and testing loops too many times" Mar 17, 2020

Borda added this to the 0.7.2 milestone Mar 18, 2020

Borda (Member) commented Mar 18, 2020

That sounds like a good contribution to me. Mind sending a PR?
Any suggestions, @PyTorchLightning/core-contributors?
As a technical note: when you refer to a particular state of master, please use the commit hash, as there can be multiple commits each day.

sneiman (Contributor, Author) commented Mar 18, 2020

Will do on both: the PR and the commit hash reference.
