Log a warning / raise an error when Lightning replaces an existing Sampler #1995
Labels
- feature (Is an improvement or enhancement)
- good first issue (Good for newcomers)
- help wanted (Open to be worked on)
🚀 Feature
First of all, thanks for this awesome project! Really enjoying using it!
Feature Request: Log a warning or raise a MisconfigurationException when lightning replaces an existing sampler with DistributedSampler.
Even though this behaviour is documented, it's not intuitive. Also, if someone has defined a custom sampler, it will lead to very different training results on a single GPU vs. DDP. So it's best to warn the user to remove the sampler from their code when using replace_sampler_ddp=True.

Motivation
I'm pretty new to Lightning, and recently moved one of my models to it. Unaware of the details, I chose the DDP training strategy (since it was recommended). For a long time, my model was not converging on the val set, and it took me a lot of time to realise what was happening: Lightning had replaced my DataLoader's sampler with DistributedSampler! In my particular model, balanced sampling is key to convergence (since the data is highly imbalanced). Lightning's behaviour makes sense in hindsight, and is well documented, but a simple warning or error when this happens could have saved me a lot of debugging time.
Additional context
The code change is very trivial. We can take advantage of the fact that a DataLoader's default sampler is SequentialSampler (when shuffle=False and sampler=None). See data_loading.py:115.
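A minimal sketch of the proposed check (the helper name and the warning wording are my own, not Lightning's actual code; the real change would live inside data_loading.py):

```python
import warnings

from torch.utils.data import DataLoader, RandomSampler, SequentialSampler


def warn_on_custom_sampler(dataloader: DataLoader) -> None:
    """Hypothetical helper: warn before a custom sampler is replaced.

    A DataLoader built with sampler=None gets a SequentialSampler (or a
    RandomSampler when shuffle=True), so any other sampler type was set
    explicitly by the user.
    """
    if not isinstance(dataloader.sampler, (SequentialSampler, RandomSampler)):
        warnings.warn(
            f"You set a custom sampler ({type(dataloader.sampler).__name__}) "
            "on your DataLoader, but it will be replaced by a "
            "DistributedSampler because replace_sampler_ddp=True. Pass "
            "replace_sampler_ddp=False to the Trainer to keep your sampler.",
            UserWarning,
        )
```

The same isinstance check could instead raise a MisconfigurationException if a hard error is preferred over a warning.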
Does this make sense? Happy to submit a PR if it does.
On a separate note, can anyone here explain why exactly DistributedSampler is needed when using DDP? I can see that it helps if I have shuffle=False. It probably also helps maintain the sanity of an epoch (since otherwise the effective epoch length becomes num_processes * epoch). But let's say I am doing random shuffling and am cognizant of the fact that my epoch sizes will be larger: is there any real reason I need to use DistributedSampler for DDP? Or am I missing something?