Unexpected behavior in DDP mode with dataloader workers #5628
@wico-silva --workers is applied per RANK when you use DDP. This allows you to scale training without having to manually adjust --workers, e.g.:

python -m torch.distributed.run --nproc_per_node 2 train.py --batch-size 16
python -m torch.distributed.run --nproc_per_node 4 train.py --batch-size 32
python -m torch.distributed.run --nproc_per_node 8 train.py --batch-size 64
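To make the arithmetic concrete, here is a minimal sketch (not YOLOv5's actual code) of what each spawned process ends up with under this convention, assuming WORLD_SIZE is read from the environment variable that torch.distributed.run sets:

```python
import os

# torch.distributed.run sets WORLD_SIZE for every process it spawns;
# default to 1 so the sketch also runs outside DDP.
world_size = int(os.environ.get("WORLD_SIZE", 1))

total_batch_size = 16  # --batch-size: interpreted as the GLOBAL batch
workers = 8            # --workers: interpreted PER RANK

per_rank_batch = total_batch_size // world_size  # divided across ranks
per_rank_workers = workers                       # NOT divided: each rank gets all 8

print(f"this rank: batch={per_rank_batch}, dataloader workers={per_rank_workers}")
```

With --nproc_per_node 2, each process reports batch=8 and workers=8, i.e. 16 dataloader workers in total.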
But then why not apply that same logic to the batch size? My main point is that workers and batch size should follow the same convention: either both per rank or both for the world. Currently, workers is per rank and batch size is for the world. At the very least, this difference in logic should be documented clearly in the command-line arguments.
@wico-silva thanks for the feedback! Have you used other tools that follow one convention or the other for these two settings?
@wico-silva we can definitely put some more checks in place to prevent excess workers and to improve the console output for clarity, but I doubt it's a good idea to modify the default behavior with so many users already relying on the existing YOLOv5 DDP conventions.
What I see the most is definitely workers and batch size per rank, because that's just what naturally happens without any extra code. It's also what makes the most sense given that each DDP process runs independently. On the other hand, if we look at the PyTorch ImageNet example, we see that they take workers and batch size for the world and then divide both per rank here: https://github.com/pytorch/examples/blob/master/imagenet/main.py#L145

I guess what's important is being consistent. But I also understand that this would be an annoying change for many, so a compromise would be to document this behavior in the command-line help.
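For reference, the world-to-rank division in the linked ImageNet example can be expressed roughly like this (a paraphrase, not the verbatim source; split_per_rank is a hypothetical helper name):

```python
def split_per_rank(batch_size: int, workers: int, ngpus_per_node: int):
    """Turn world-level --batch-size/--workers into per-rank values,
    in the spirit of pytorch/examples imagenet/main.py."""
    per_rank_batch = batch_size // ngpus_per_node
    # ceil-divide so the ranks together still provide ~`workers` workers
    per_rank_workers = (workers + ngpus_per_node - 1) // ngpus_per_node
    return per_rank_batch, per_rank_workers


print(split_per_rank(batch_size=16, workers=8, ngpus_per_node=2))  # (8, 4)
```

The ceil-division for workers keeps the total worker count close to the requested value even when it doesn't divide evenly.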
@wico-silva got it! I've opened PR #5631, which ticks a few of these boxes. It retains the current behavior, but it updates the command-line help and, most importantly, caps the vCPU usage smartly by world size, so the danger of a DDP user accidentally spawning too many workers is removed. This is probably the best compromise.
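A per-rank cap along these lines can be sketched as follows (an illustration of the idea, not the exact PR diff; num_workers is a hypothetical helper):

```python
import os

# WORLD_SIZE is set by torch.distributed.run; 1 for single-GPU/CPU runs
WORLD_SIZE = int(os.environ.get("WORLD_SIZE", 1))


def num_workers(batch_size: int, requested: int) -> int:
    # Per-rank worker cap: never more than the CPUs this rank can claim,
    # the per-rank batch size, or what the user asked for.
    return min(os.cpu_count() // WORLD_SIZE, batch_size if batch_size > 1 else 0, requested)


print(num_workers(batch_size=8, requested=8))
```

Dividing os.cpu_count() by WORLD_SIZE ensures the ranks together never request more dataloader workers than there are CPUs.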
Thanks for the quick replies and for addressing it so fast. I'll close the issue.
@wico-silva PR #5631 is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐
Search before asking
YOLOv5 Component
Multi-GPU
Bug
In distributed multi-GPU training we specify the total batch size on the command line, which then gets divided per process here: https://github.com/ultralytics/yolov5/blob/master/train.py#L222
I would expect the same to be done for the number of workers but that's not the case.
So if I run:
python -m torch.distributed.run --nproc_per_node 2 train.py --batch-size 16 --workers 8
I expect two processes, each with batch size 8 and 4 dataloader workers.
Instead, I get two processes, each with batch size 8 and 8 dataloader workers, which is dangerous because that may be more workers than my CPU can handle, and training becomes extremely slow.
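For context, the division referenced above boils down to something like the following sketch (simplified from the linked train.py logic; the environment-variable names are the ones torch.distributed.run sets, everything else is illustrative):

```python
import os

RANK = int(os.environ.get("RANK", -1))             # set by torch.distributed.run
WORLD_SIZE = int(os.environ.get("WORLD_SIZE", 1))  # total number of processes

batch_size = 16  # --batch-size (global)
workers = 8      # --workers

if RANK != -1:  # DDP mode
    batch_size //= WORLD_SIZE  # the global batch is split across ranks...
# ...but `workers` is left untouched, so each of the WORLD_SIZE
# processes starts `workers` dataloader workers of its own.
```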
Environment
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?