
Keep the setting of user created DataLoader in replacing DistributedSampler #4650

Closed
wlkz opened this issue Nov 13, 2020 · 4 comments · Fixed by #5139
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Comments

@wlkz

wlkz commented Nov 13, 2020

🚀 Feature

Motivation

As mentioned in #2789, the default behavior of replace_sampler_ddp is to create a new DistributedSampler, with the shuffle setting depending on the kind of dataloader (train vs. val/test). However, this behavior overrides the settings of the user-defined dataloader, such as shuffle or drop_last. A more reasonable solution is to read these settings directly from the user-created dataloader and apply the same settings to the DistributedSampler.

Pitch

For example, we can infer the shuffle setting from dataloader.sampler: if it is an instance of SequentialSampler, then shuffle=False.
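A minimal sketch of this inference, assuming the standard torch samplers: DataLoader(shuffle=True) installs a RandomSampler and shuffle=False a SequentialSampler, so the user's intent can be read off the sampler type (infer_shuffle is a hypothetical helper, not Lightning API):

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

def infer_shuffle(dataloader: DataLoader) -> bool:
    # True only when the user's loader shuffles, i.e. uses a RandomSampler.
    return isinstance(dataloader.sampler, RandomSampler)

ds = TensorDataset(torch.arange(10))
shuffled = DataLoader(ds, shuffle=True)   # sampler is RandomSampler
ordered = DataLoader(ds, shuffle=False)   # sampler is SequentialSampler
```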

Alternatives

Set replace_sampler_ddp=False, and handle it by hand.
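Under this alternative, the sampler is built by hand so that shuffle and drop_last stay exactly as set, and the resulting loader is passed to the Trainer with replace_sampler_ddp=False. A sketch (num_replicas/rank are fixed here for illustration; in real DDP they come from the process group):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

ds = TensorDataset(torch.arange(10))
# Explicit sampler: shuffle/drop_last are whatever the user chooses.
sampler = DistributedSampler(ds, num_replicas=2, rank=0, shuffle=False)
loader = DataLoader(ds, batch_size=2, sampler=sampler, drop_last=True)
```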

Additional context

@wlkz wlkz added feature Is an improvement or enhancement help wanted Open to be worked on labels Nov 13, 2020
@github-actions
Contributor

Hi! thanks for your contribution!, great first issue!

@rohitgr7
Contributor

Set replace_sampler_ddp=False, and handle it by hand.

I think this is the ideal way to handle this. Else we will make things more complicated.

@wlkz
Author

wlkz commented Nov 14, 2020

What concerns me more is the side effect of the default setting replace_sampler_ddp=True. Neither Getting started - Lightning in 2 steps nor Common Use Cases - Multi-GPU training describes the magic of sampler replacement. In most cases it seems correct, but an unexpected setting change sometimes makes things worse. For example, some loss functions are batch-size sensitive, so the flag drop_last=True must be set on the dataloader. However, the replaced DistributedSampler ignores this setting, which produces an unexpected gradient update on the last batch. In this situation the error is hard to debug, as training works well on a single GPU but fails in DDP training. Things won't get better until you dig through the lengthy API reference of the Trainer and find that the replaced sampler messes everything up.
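The drop_last concern above can be illustrated in plain PyTorch: without drop_last=True the final batch is smaller than the rest, which is exactly what a batch-size-sensitive loss trips over when the flag is silently lost.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10))  # 10 samples, batch_size=4

# drop_last defaults to False: the last batch has only 2 samples.
sizes = [len(batch[0]) for batch in DataLoader(ds, batch_size=4)]

# With drop_last=True every batch has exactly 4 samples.
sizes_dropped = [len(batch[0]) for batch in DataLoader(ds, batch_size=4, drop_last=True)]
```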

Set replace_sampler_ddp=False, and handle it by hand.

I think this is the ideal way to handle this. Else we will make things more complicated.

I totally agree with you. If it is the user's responsibility to handle this, I suggest adding a brief description of replace_sampler_ddp to Common Use Cases - Multi-GPU training, telling users that the drop_last and shuffle settings will be reset and that they should handle it by hand in this situation.

@stale

stale bot commented Dec 14, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Dec 14, 2020
@rohitgr7 rohitgr7 mentioned this issue Dec 14, 2020
11 tasks
@rohitgr7 rohitgr7 removed the won't fix This will not be worked on label Dec 16, 2020