Implement DDP static_graph=True #6940

Merged
merged 6 commits into from
May 13, 2022

@glenn-jocher (Member) commented Mar 11, 2022

Experimental implementation of new PyTorch 1.11.0 DDP feature. See https://pytorch.org/blog/pytorch-1.11-released/ for details.

[Image: Screenshot 2022-03-11 at 01 10 51]

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced PyTorch compatibility and optimized model training performance in multi-GPU setups.

📊 Key Changes

  • Added check_version utility function to confirm PyTorch version compatibility.
  • Conditionally enabled the static_graph parameter in DDP based on PyTorch version check.

🎯 Purpose & Impact

  • The check_version function lets the training script test the installed PyTorch version before using version-specific features, which avoids errors on older releases.
  • Passing static_graph=True to torch.nn.parallel.DistributedDataParallel (DDP) when available (PyTorch 1.11.0+) tells DDP that the computational graph does not change between iterations, which can make multi-GPU training faster and more efficient.
  • Users on older PyTorch versions keep the previous DDP behavior, while users on 1.11.0+ may see more efficient resource usage in distributed training. 🚀
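The version gate described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the check_version signature here is simplified, and parse_version, ddp_kwargs and wrap_model are hypothetical helpers introduced only for this example. Only the DDP arguments device_ids, output_device and static_graph are taken from the real PyTorch API.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric release parts, e.g. '1.11.0+cu113' -> (1, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_version(current: str, minimum: str) -> bool:
    """Return True if `current` is at least `minimum` (simplified, hypothetical signature)."""
    cur, req = parse_version(current), parse_version(minimum)
    # Pad with zeros so that (1, 11) compares equal to (1, 11, 0).
    width = max(len(cur), len(req))
    cur += (0,) * (width - len(cur))
    req += (0,) * (width - len(req))
    return cur >= req


def ddp_kwargs(torch_version: str, local_rank: int) -> dict:
    """Build DDP keyword arguments, adding static_graph only on PyTorch 1.11.0+."""
    kwargs = {"device_ids": [local_rank], "output_device": local_rank}
    if check_version(torch_version, "1.11.0"):
        kwargs["static_graph"] = True  # argument does not exist before 1.11.0
    return kwargs


def wrap_model(model, local_rank: int):
    """Wrap `model` in DDP; needs torch installed and an initialized process group."""
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    return DDP(model, **ddp_kwargs(torch.__version__, local_rank))
```

Comparing integer tuples rather than raw strings matters here because a plain string comparison would rank "1.9.0" above "1.11.0". Pre-release suffixes are ignored in this sketch, so a build such as "1.11.0a0" is treated as 1.11.0.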

@glenn-jocher glenn-jocher self-assigned this Mar 11, 2022
@glenn-jocher (Member, Author) commented:

Profiling results: 0.206 hours (PR) vs 0.209 hours (master). Should repeat to confirm.

python -m torch.distributed.run --nproc_per_node 2 --master_port 1 train.py --data VOC.yaml --cache disk --batch 128 --weights '' --cfg yolov5m6.yaml --epochs 10 --img 640 --device 6,7
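For scale, the gap between the two timings can be worked out directly. The snippet below is only arithmetic on the two figures reported in this comment; the variable names are illustrative.

```python
pr_hours = 0.206      # reported for this PR
master_hours = 0.209  # reported for master

saved_minutes = (master_hours - pr_hours) * 60
relative = (master_hours - pr_hours) / master_hours

# Roughly 0.18 minutes saved over 10 epochs, about 1.4% of the master run time.
print(f"{saved_minutes:.2f} min saved, {relative:.1%} faster")
```

A difference of this size from a single run may well fall within run-to-run variation, which is consistent with the note that the measurement should be repeated.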

@glenn-jocher glenn-jocher merged commit d95a728 into master May 13, 2022
@glenn-jocher glenn-jocher deleted the update/ddp branch May 13, 2022 10:32
tdhooghe pushed a commit to tdhooghe/yolov5 that referenced this pull request Jun 10, 2022
* Implement DDP `static_graph=True`

Experimental implementation of new PyTorch 1.11.0 DDP feature.

* Add 1.11.0 check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022