Bugfix: update dataloaders.py to fix Multi-GPU DDP RAM multiple-cache issue #10383
Merged
Commits on Dec 2, 2022
- 5488bd5: This is to address (and hopefully fix) the Multi-GPU DDP RAM multiple-cache bug (ultralytics#3818). This was a very serious and "blocking" issue until I could figure out what was going on. The problem was especially bad when running Multi-GPU jobs with 8 GPUs: RAM usage was 8x higher than expected (!), causing repeated OOM failures. Hopefully this fix will help others.

  DDP causes each RANK to launch its own process (one for each GPU), with its own trainloader and its own RAM image cache. The DistributedSampler used by DDP (https://github.com/pytorch/pytorch/blob/master/torch/utils/data/distributed.py) feeds only a subset of images (1/WORLD_SIZE) to each available GPU on each epoch, but since the images are shuffled between epochs, each GPU process must still cache all images. So I created a subclass of DistributedSampler called SmartDistributedSampler that forces each GPU process to always sample the same subset (using modulo arithmetic with RANK and WORLD_SIZE) while still allowing random shuffling between epochs. I don't believe this disrupts the overall "randomness" of the sampling, and I haven't noticed any performance degradation.

  Signed-off-by: davidsvaughn <davidsvaughn@gmail.com>
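For illustration, here is a minimal sketch of the idea (my reconstruction, not necessarily the exact code merged in this PR): a DistributedSampler subclass in which each rank always draws from the fixed subset of indices satisfying i % WORLD_SIZE == RANK, so its RAM cache only ever holds 1/WORLD_SIZE of the images, while shuffling within that subset still varies per epoch.

```python
import torch
from torch.utils.data.distributed import DistributedSampler


class SmartDistributedSampler(DistributedSampler):
    # Sketch only: each rank keeps the fixed subset of indices where
    # i % num_replicas == rank, so this process never caches images
    # outside its own 1/WORLD_SIZE slice of the dataset.
    def __iter__(self):
        # deterministic shuffle seeded by the epoch, as in the parent class
        g = torch.Generator()
        g.manual_seed(self.seed + self.epoch)

        # fixed per-rank subset via modulo arithmetic
        idx = torch.arange(len(self.dataset))
        idx = idx[idx % self.num_replicas == self.rank]

        if self.shuffle:
            idx = idx[torch.randperm(len(idx), generator=g)]  # reshuffle within the subset

        # pad so every rank yields the same number of samples per epoch
        idx = idx.tolist()
        while len(idx) < self.num_samples:
            idx += idx[: self.num_samples - len(idx)]
        return iter(idx[: self.num_samples])
```

It drops in where DistributedSampler was used before, e.g. `sampler = None if rank == -1 else SmartDistributedSampler(dataset, shuffle=shuffle)`.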
- 5721f2e: [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
- be8594f: Move the extra parameter (rank) to the end of the signature so it won't break pre-existing positional args.
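A hypothetical before/after of the signature concern above (the real create_dataloader in dataloaders.py takes many more parameters; this is simplified):

```python
# Before: inserting the new parameter mid-signature breaks callers that
# pass arguments positionally:
#   def create_dataloader(path, imgsz, batch_size, rank, stride, workers=8): ...

# After: the new parameter is appended with a default, so an existing
# positional call such as create_dataloader(path, 640, 16, 32) still works.
def create_dataloader(path, imgsz, batch_size, stride, workers=8, rank=-1):
    ...
```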
- 3e818a1
Commits on Dec 3, 2022
- 998687d
Commits on Dec 5, 2022
- b0a30eb: Sample from the DDP index array (self.idx) in mixup and mosaic.
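A hedged sketch of what this amounts to (standalone functions with hypothetical names; the real code lives in the YOLOv5 dataset class): the extra mosaic tiles and the mixup partner are drawn from the rank-local DDP index array instead of the full range(n), so nothing outside this rank's subset is ever loaded or cached.

```python
import random

def pick_mosaic_indices(indices, index):
    # indices: this rank's DDP subset of dataset indices
    # (the full range when not running DDP)
    return [index] + random.choices(list(indices), k=3)  # current image + 3 extra tiles

def pick_mixup_index(indices):
    # the mixup partner is also restricted to the rank-local subset
    return random.choice(list(indices))
```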
- 7f308e7
Commits on Dec 7, 2022
- e2991ea: Merge self.indices and self.idx (the DDP indices) into a single attribute (self.indices); also add SmartDistributedSampler to the segmentation dataloader.
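A minimal sketch of the consolidation, with a hypothetical helper name: the dataset builds one rank-local index array up front (the identity when rank == -1, i.e. no DDP), rather than carrying a full self.indices plus a separate self.idx.

```python
import numpy as np

def build_indices(n, rank=-1, world_size=1):
    # One attribute covers both cases: the full range without DDP,
    # otherwise the subset where i % world_size == rank.
    idx = np.arange(n)
    return idx if rank == -1 else idx[rank::world_size]
```

Both the mosaic and mixup samplers can then read from the same self.indices attribute.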
- 9ba2a0f: [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
Commits on Dec 9, 2022
- 4ea372a
- 092b944