validation and training loops run the partial dataset #1192

Merged
2 commits merged on Mar 30, 2020

sneiman
Contributor

@sneiman sneiman commented Mar 18, 2020

Fixes #1161: when using ddp/ddp2, the validation and training loops currently run the full respective dataset on each GPU. This wastes time and changes the batch counts for any statistics being collected.

The fix just makes sure that for ddp and ddp2, auto_add_sampler() creates a DistributedSampler for each data set.
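
As a rough illustration of why this changes batch counts, here is a pure-Python sketch of the index partitioning that a non-shuffling `DistributedSampler` performs (pad the index list so it divides evenly, then give each rank a strided slice). This is not the Lightning code: `shard_indices` and `num_batches` are hypothetical helpers, the dataset size, batch size and world size are made-up numbers, and the sketch assumes the dataset has at least as many samples as there are processes.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Indices one process would iterate over (hypothetical helper, no shuffling)."""
    # Every process gets the same number of samples, so round up.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around to the start so the list divides evenly.
    indices += indices[: total_size - dataset_len]
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]


def num_batches(num_samples, batch_size):
    """Batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


# Assumed example values, not taken from the issue.
dataset_len, batch_size, world_size = 1000, 10, 4

# Without a distributed sampler, every process walks the whole dataset.
batches_full = num_batches(dataset_len, batch_size)

# With sharding, each process sees roughly dataset_len / world_size samples.
shard = shard_indices(dataset_len, world_size, rank=0)
batches_sharded = num_batches(len(shard), batch_size)

print(batches_full, batches_sharded)  # 100 25
```

With these assumed numbers, each of the 4 processes runs 25 validation batches instead of 100, which is the difference in time and in per-batch statistics described above.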

This passes all the tests on my machine except the SLURM- and apex-related ones, as I have neither. I don't think this needs any doc changes. I can look into writing a test for this if needed. Let me know.

@Borda Borda changed the title from "Issue 1161" to "validation and training loops run the partial dataset" Mar 18, 2020
@Borda Borda added the docs Documentation related label Mar 18, 2020
@williamFalcon
Contributor

@srush mind taking a look? this came from our chats with the HF code.

@srush
Contributor

srush commented Mar 30, 2020

This seems good to me. We have some val sets that are quite large.

@Borda Borda requested review from jeffling, jeremyjordan and a team March 30, 2020 16:11
Member

@Borda Borda left a comment

LGTM 🚀

@Borda Borda added the ready PRs ready to be merged label Mar 30, 2020
@williamFalcon williamFalcon merged commit 6dfe995 into Lightning-AI:master Mar 30, 2020
alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020

* auto_add_sampler() fix

* auto_add_sampler() fix

Co-authored-by: seth <seth@duckpapa.com>
Successfully merging this pull request may close these issues.

multi-gpu ddp calls validation and testing loops too many times