Log a warning/ raise an error when lightning replaces an existing Sampler #1995

Closed
devashishshankar opened this issue May 28, 2020 · 3 comments · Fixed by #2020
Labels: feature (Is an improvement or enhancement), good first issue (Good for newcomers), help wanted (Open to be worked on)

Comments

devashishshankar (Contributor) commented May 28, 2020

🚀 Feature

First of all, thanks for this awesome project! Really enjoying using it!

Feature Request: Log a warning or raise a MisconfigurationException when lightning replaces an existing sampler with DistributedSampler.

Even though this behaviour is documented, it is not intuitive. If someone has defined a custom sampler, it will lead to very different training results on a single GPU versus DDP. So it is best to warn users to remove the sampler from their code if they are using replace_sampler_ddp=True.
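
For concreteness, here is a minimal, self-contained sketch of the scenario (using a hypothetical toy dataset): a WeightedRandomSampler that balances classes on a single GPU, which replace_sampler_ddp=True would silently swap for a DistributedSampler under DDP.

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset: 90 samples of class 0, 10 samples of class 1.
labels = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
features = torch.randn(100, 8)
dataset = TensorDataset(features, labels)

# Per-sample weights that balance the classes.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)

# Used as-is on a single GPU; with DDP and replace_sampler_ddp=True (the default),
# Lightning replaces this sampler with a DistributedSampler, which samples uniformly.
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

# Keeping the custom sampling behaviour under DDP currently means opting out, e.g.
# trainer = pl.Trainer(replace_sampler_ddp=False, ...)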

Motivation

I'm pretty new to Lightning and recently moved one of my models to it. Unaware of the details, I chose the DDP training strategy (since it was recommended). For a long time my model was not converging on the validation set, and it took me a lot of time to realise what was happening: Lightning had replaced my DataLoader's sampler with DistributedSampler! In my particular model, balanced sampling is key to convergence (since the data is highly imbalanced). Lightning's behaviour makes sense in hindsight and is well documented, but a simple warning or error when this happens would have saved me a lot of debugging time.

Additional context

The code change is trivial. We can take advantage of the fact that a DataLoader's default sampler is SequentialSampler (when shuffle=False and sampler=None).

data_loading.py:115

# A user-supplied sampler lives on the DataLoader itself, not on the dataset.
if self.replace_sampler_ddp and need_dist_sampler:
    if not isinstance(dataloader.sampler, SequentialSampler):
        raise MisconfigurationException(...)
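
To illustrate the detection idea in isolation (a standalone sketch, not the actual data_loading.py code): a DataLoader exposes its sampler via the .sampler attribute, so a user-supplied sampler is easy to spot. Note that shuffle=True yields a RandomSampler rather than a SequentialSampler, so a real implementation would presumably want to allow that case as well.

import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset, WeightedRandomSampler

dataset = TensorDataset(torch.randn(10, 3))

# Default DataLoader (shuffle=False, sampler=None): a SequentialSampler, safe to replace.
plain = DataLoader(dataset)
print(isinstance(plain.sampler, SequentialSampler))   # True

# User-supplied sampler: not a SequentialSampler, so the check above would fire.
custom = DataLoader(dataset, sampler=WeightedRandomSampler(torch.ones(10), num_samples=10))
print(isinstance(custom.sampler, SequentialSampler))  # False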

Does this make sense? Happy to submit a PR if it does.

On a separate note, can anyone here explain why exactly DistributedSampler is needed when using DDP? I can see that it helps if I have shuffle=False, and it probably helps keep the notion of an epoch sane (since otherwise the effective epoch length becomes num_processes * epoch).

But let's say I am doing random shuffling and am aware that my epoch sizes will be higher: is there any real reason I need to use DistributedSampler with DDP, or am I missing something?
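
Not an authoritative answer, but the usual reasoning is: without a DistributedSampler, every DDP process iterates over the full dataset, so each sample is processed num_processes times per epoch, and if the ranks happen to use the same random seed they draw identical batches, in which case the data parallelism adds nothing. DistributedSampler shards the indices so that the processes collectively cover the dataset exactly once per epoch. A small sketch simulating two ranks:

import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(8))

# Each rank receives a disjoint shard of the indices, so one epoch still covers
# the dataset exactly once across all processes.
for rank in range(2):
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False)
    print(rank, list(sampler))
# 0 [0, 2, 4, 6]
# 1 [1, 3, 5, 7]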

devashishshankar added the feature and help wanted labels on May 28, 2020
github-actions (bot) commented

Hi! Thanks for your contribution, great first issue!

Borda added the good first issue label on May 29, 2020
Borda (Member) commented May 29, 2020

@devashishshankar sounds good to me, mind sending a PR?

devashishshankar (Contributor, Author) commented

@Borda Yes, will do!
