
Feature/ddp fixed #401

Merged 7 commits into ultralytics:master on Jul 19, 2020

Conversation

MagicFrogSJTU
Contributor

@MagicFrogSJTU MagicFrogSJTU commented Jul 14, 2020

Fixing DDP mode. #177
Work in progress, but most of the hard parts have already been done!
There are lots of commits. Once everything is settled, I will squash them into two commits!

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Enhanced YOLOv5 testing and training capabilities with text output and DDP support.

πŸ“Š Key Changes

  • Added --save-txt flag in test.py for saving test results in text format.
  • Implemented Distributed Data Parallel (DDP) support in train.py.
  • Included a new torch utility torch_distributed_zero_first for synchronizing distributed datasets.
  • Modified create_dataloader function to support distributed training in utils/datasets.py.
  • Increased robustness of loading images by sorting them in exif_size function.
  • Applied several code optimizations for better memory handling and DDP training efficiency.

🎯 Purpose & Impact

  • πŸ“ The --save-txt option allows users to output test results as text files, enabling easier analysis of model performance.
  • πŸš€ DDP integration provides efficient handling of large-scale training across multiple GPUs, leading to faster and more scalable training processes.
  • 🐍 The torch_distributed_zero_first context manager ensures smooth loading of datasets without clashes in a distributed training setup.
  • 🀝 The updated dataloader supports synchronized data loading across distributed environments, maintaining the accuracy of the training process.
  • βœ”οΈ The listing of images now ensures a consistent order, which can be especially beneficial when reproducing experiments or debugging.
  • πŸ’Ύ Memory and computation optimizations improve the footprint and speed of training, ensuring that resources are used effectively.

@MagicFrogSJTU MagicFrogSJTU changed the title Feature/ddp fixed [WIP] Feature/ddp fixed Jul 14, 2020
@glenn-jocher
Member

glenn-jocher commented Jul 15, 2020

Thanks guys! I looked over the files, it looks like perhaps some of the simpler commits could be grouped into their own smaller PR that would be much faster to merge, definitely for example the dockerfile and readme updates. BTW, the argparser arguments for files are smart, so you don't need to supply the entire path: python test.py --data coco.yaml works fine. The repo searches for files automatically and assigns them absolute paths if necessary.

I wrapped up my current baselining using 1x, 2x and 4x T4 GPUs (in order from legend top to bottom). The epoch train times were 29, 19 and 15 min each. The test times were always around 1 min. When trained to 40 epochs each (well, trained to 300 and then CTRL-C after 40) using the following command these were the curves below. The final epoch 39 mAPs ranged from 0.252 to 0.254 (essentially identical). I'd like to try to repeat the same set of tests with the PR branch if I have some time this week.

python train.py --batch 64 --cfg yolov5s.yaml --data coco.yaml --img 640 --nosave --device 0,1,2,3

[results plot]

EDIT: Is there any difference in the command required with the PR? What's the equivalent command to the one above for the branch? Thanks!

@NanoCode012
Contributor

NanoCode012 commented Jul 15, 2020

EDIT: Is there any difference in the command required with the PR? What's the equivalent command to the one above for the branch? Thanks!

For single GPU, it would be the same with --device 0

For multiple GPUs, we would have to use torch.distributed.launch to launch multiple processes. nproc_per_node is the number of GPUs.

python -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size 64 --data coco.yaml --cfg models/yolov5s.yaml --weights '' --epochs 300

Theoretically, we can expand this code to use multiple nodes with multiple GPUs, but I don't think it's necessary.
From our tests, 2 GPU is the best config for performance and speed.
See #401 (comment) and MagicFrogSJTU#7 (comment) to see our results so far.

@MagicFrogSJTU
Contributor Author

MagicFrogSJTU commented Jul 15, 2020

Thanks guys! I looked over the files, it looks like perhaps some of the simpler commits could be grouped into their own smaller PR that would be much faster to merge, definitely for example the dockerfile and readme updates. BTW, the argparser arguments for files are smart, so you don't need to supply the entire path: python test.py --data coco.yaml works fine. The repo searches for files automatically and assigns them absolute paths if necessary.

The commits will be squashed into several commits once everything is settled!

EDIT: Is there any difference in the command required with the PR? What's the equivalent command to the one above for the branch? Thanks!

python train.py --batch 64 --cfg yolov5s.yaml --data coco.yaml --img 640 --nosave --device 0,1,2,3 will activate DP mode.
To activate DDP mode, use the following command

# 2-GPU DDP
python -m torch.distributed.launch --nproc_per_node 2 train.py --data data/coco.yaml  --batch-size 64 --cfg models/yolov5s.yaml --weights '' --epochs 300 --device 0,1
# 2-GPU DDP with SyncBN
python -m torch.distributed.launch --nproc_per_node 2 train.py --data data/coco.yaml  --batch-size 64 --cfg models/yolov5s.yaml --weights '' --epochs 300 --device 0,1 --sync-bn
# 4-GPU DDP 
# is not supported right now because it generates lower performance, and the reason remains unknown as discussed in #264 

Here are my test results for the early epochs.
All runs have a total batch size of 64 and were trained on V100.

| exp | GPUs | SyncBN | extra config | epoch 1 | epoch 2 | epoch 3 | epoch 4 | epoch 5 | train speed (min/epoch) |
|---|---|---|---|---|---|---|---|---|---|
| default | 1 | \ | \ | 1.13 | 6.43 | 12.2 | 19 | 23.9 | 14 |
| DDP | 2 | Yes | \ | 0.659 | 5.77 | 12.2 | 18.8 | 23.6 | 11 |
| DDP (rerun) | 2 | Yes | \ | 0.558 | 5.93 | 12.7 | 18.4 | \ | |
| DDP | 2 | No | \ | 1.1 | 6.42 | 12.9 | 19.3 | 23.9 | 8-9 |
| DDP | 4 | Yes | \ | 0.517 | 3.82 | 7.34 | \ | \ | 9 |
| DDP | 4 | No | \ | 0.611 | 4.2 | 7.66 | 12.6 | 16.3 | |
| DDP | 4 | No | new random seed | 0.569 | 4.03 | 7.85 | 12.5 | | |

In conclusion, 2-GPU DDP without SyncBN is the better choice for DDP right now, while DP is applicable to arbitrary GPU counts.

@NanoCode012
Contributor

NanoCode012 commented Jul 15, 2020

Don't forget to change the first-epoch values of DDP 4 from 5/6% to 0.5/0.6%, since they currently say the wrong thing.

Edit: One more thing, I cannot replicate your results for 2 GPU with SyncBN. I got results similar to the Default one.
Edit 2: I misread the arguments. I thought SyncBN was on by default.

@glenn-jocher
Member

@MagicFrogSJTU got it, thanks for the table! What we need to do now is update it with the default 2x and 4x GPU results to compare the multi-GPU updates with the current multi-GPU baseline.

If 4 GPUs are not working correctly... it's going to be a bit problematic. I know some groups are using 4x and even 8x GPU trainings currently, so naturally we need a robust solution for everyone.

@NanoCode012
Contributor

NanoCode012 commented Jul 15, 2020

I copied this from the other Issue to keep things closer. These are results from my runs.

Table runs:
Batch size: 64
SyncBatch is disabled for Magic
Trained on V100.

| Branch | GPUs | Type | Epoch 1 | Epoch 2 | Epoch 5 | Epoch 10 | Epoch 25 | Train time for epoch 1 |
|---|---|---|---|---|---|---|---|---|
| Default | 1 | \ | 0.01226 | 0.06774 | 0.2447 | 0.3266 | 0.3957 | - |
| Default | 2 | DP | 0.01105 | 0.06385 | 0.2409 | 0.3297 | 0.3907 | 11:50 |
| Default | 2 | DDP | 0.01243 | 0.06131 | 0.2411 | 0.3313 | 0.3911 | 11:55 |
| Default | 4 | DDP | 0.01167 | 0.06343 | 0.2336 | 0.326 | 0.3906 | - |
| Magic P1 | 1 | \ | 0.0121 | 0.06502 | - | - | - | 19:42 (CPU bottleneck) |
| Magic P1 | 1 | \ | 0.0131 | 0.06419 | 0.2403 | 0.3359 | 0.3951 | - |
| Magic P1 | 2 | DDP | 0.009887 | 0.05979 | 0.2389 | 0.33 | 0.395 | - |
| Magic Ft | 4 | DDP | 0.00519 | 0.0403 | 0.168 | 0.251 | 0.323 | - |

Ft is short for Magic's feature/DDP_fixed branch.
P1 is short for Magic's Patch 1 branch, which is slightly behind the feature branch; performance should be the same.

Default's DDP is internally implemented as DP right now by PyTorch. -Magic

My opinion is to enable DDP for 2 GPU and use DP for anything higher, until the issue can be found.

@glenn-jocher
Member

@MagicFrogSJTU oops, I might have messed up the PR. I meant to remove the readme as I just pushed a few updates and included the quick fix you had here.

Ah perfect, I see the updated table. It's late here, will get back to this tomorrow.

@MagicFrogSJTU
Contributor Author

@MagicFrogSJTU got it, thanks for the table! What we need to do now is update it with the default 2x and 4x GPU results to compare the multi-GPU updates with the current multi-GPU baseline.

If 4 GPUs are not working correctly... it's going to be a bit problematic. I know some groups are using 4x and even 8x GPU trainings currently, so naturally we need a robust solution for everyone.

I used to run programs with 10x and 8x GPUs a lot, but I have never come across a case where 2x GPUs work and 4x GPUs don't.
This is not an easy problem; I have put in tons of time but found nothing.

@MagicFrogSJTU
Contributor Author

@MagicFrogSJTU oops, I might have messed up the PR. I meant to remove the readme as I just pushed a few updates and included the quick fix you had here.

Ah perfect, I see the updated table. It's late here, will get back to this tomorrow.

never mind. Good night!

@NanoCode012
Contributor

NanoCode012 commented Jul 15, 2020

Here are the results from table #401 (comment) plotted on the graph.


@MagicFrogSJTU , can I have your results.txt so I can compile them into one picture?

@MagicFrogSJTU
Contributor Author

MagicFrogSJTU commented Jul 15, 2020

@glenn-jocher @NanoCode012
Damn! I found the reason for the unexpected performance of 4-GPU DDP.
The random seed is set to the same number for all processes, which results in similar pictures for mosaic data augmentation:

For each input image:
  mosaic randomly samples 3 other images and merges all 4 images into one

Because the random seed is the same for every process, the 3 sampled images are the same for every process! This of course reduces training efficiency!
Damn! I am so stupid! Stuck on this for 2 weeks!
@NanoCode012 I have pushed the fixed code. Could you please rerun your DDP test? It doesn't affect DP or normal single-GPU.

@MagicFrogSJTU
Contributor Author

MagicFrogSJTU commented Jul 15, 2020

@glenn-jocher
By the way, I suggest we improve the data generation code. It is now over-complicated and hard to maintain..

@NanoCode012
Contributor

@glenn-jocher @NanoCode012
Damn! I found the reason for the unexpected performance of 4-GPU DDP.
The random seed is set to the same number for all processes, which results in similar pictures for mosaic data augmentation:

For each input image:
  mosaic randomly samples 3 other images and merges all 4 images into one

Because the random seed is the same for every process, the 3 sampled images are the same for every process! This of course reduces training efficiency!

Hi @MagicFrogSJTU, I looked at that before. The documentation I saw said we should set the seed to the same value.

https://yangkky.github.io/2019/07/08/distributed-pytorch-tutorial.html

I think I also saw this in the PyTorch documentation but cannot find it now.
That's why I did not set the seed to different values. Moreover, aren't they given different samples of images? When mosaic is done, aren't the other pictures part of their sample?

However, I will set mine to run.

@MagicFrogSJTU
Contributor Author

MagicFrogSJTU commented Jul 15, 2020

@glenn-jocher @NanoCode012
Damn! I found the reason for the unexpected performance of 4-GPU DDP.
The random seed is set to the same number for all processes, which results in similar pictures for mosaic data augmentation:

For each input image:
  mosaic randomly samples 3 other images and merges all 4 images into one

Because the random seed is the same for every process, the 3 sampled images are the same for every process! This of course reduces training efficiency!

Hi @MagicFrogSJTU, I looked at that before. The documentation I saw said we should set the seed to the same value.

https://yangkky.github.io/2019/07/08/distributed-pytorch-tutorial.html

I think I also saw this in the PyTorch documentation but cannot find it now.
That's why I did not set the seed to different values. Moreover, aren't they given different samples of images? When mosaic is done, aren't the other pictures part of their sample?

However, I will set mine to run.

Setting the random seed to a fixed value is key to reproducing experiments.
In DDP, each process should have a different random seed, but those values are fixed, so the DDP experiment can still be reproduced.

Modern DDP broadcasts the rank-0 weights to the other processes when DDP is set up, so there is no need to give every process the same random seed for that purpose.
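A minimal sketch of the per-process seeding scheme described above (the name and exact form are illustrative, not the actual YOLOv5 code): derive each rank's seed deterministically from one base seed, so every process samples different mosaic partners while runs remain reproducible.

```python
# Per-rank seeding: seeds differ across ranks (different mosaic
# samples per process) but are fixed across runs (reproducible).
import random

def init_seeds(base_seed: int, rank: int) -> int:
    seed = base_seed + rank
    random.seed(seed)
    # a real trainer would seed numpy and torch here as well
    return seed
```

Rank 0 keeps base_seed itself, so single-GPU behavior is unchanged.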

@NanoCode012
Contributor

I see. I guess that's why I missed it..

@MagicFrogSJTU
Contributor Author

MagicFrogSJTU commented Jul 15, 2020

I see. I guess that's why I missed it..

My machine is down for maintenance. I don't know when it will recover...
But I had time to run the first epoch for 4-GPU DDP, and got 1.3% mAP, which is quite similar to batch-size-16 single-GPU (with 4 accumulations). As I said before, 4-GPU DDP is theoretically the same as batch-size-16 single-GPU. This confirms it!
Please tell me the new results when you get them! I am so excited now!

@NanoCode012
Contributor

NanoCode012 commented Jul 15, 2020

@MagicFrogSJTU See table below!

| Type | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 |
|---|---|---|---|---|---|
| DDP 4 | 0.0124 | 0.0635 | 0.119 | 0.189 | 0.233 |

I'm also setting 1 and 2 GPU to run right now to make sure nothing abnormal happened!

I'm also not sure if rebasing is the best thing to do, because we will lose the history of the commits, and some are valuable, like this point on "DDP deterioration". I think there is an option on GitHub to "squash" commits into one big commit.

@MagicFrogSJTU
Contributor Author

@MagicFrogSJTU See table below!

| Type | Epoch 1 | Epoch 2 |
|---|---|---|
| DDP 4 | 0.0124 | 0.0635 |

I'm also setting 1 and 2 GPU to run right now to make sure nothing abnormal happened!
I'm also not sure if rebasing is the best thing to do, because we will lose the history of the commits, and some are valuable, like this point on "DDP deterioration".

Thanks for your experiments!
Squash is okay as long as the number of commits is reduced! I am not familiar with squash though; I will look into it.

@NanoCode012
Contributor

NanoCode012 commented Jul 16, 2020

@MagicFrogSJTU I think the results are quite clear.
[results plot]

f is Magic's feature branch
Number is GPU count.
Batch size 64.
Normal parameters. (without SyncBatch)

Edit: Added 8 GPU

MagicFrogSJTU and others added 2 commits July 16, 2020 16:36
commit d738487
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:33:38 2020 +0700

    Adding world_size

    Reduce calls to torch.distributed. For use in create_dataloader.

commit e742dd9
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 15:38:48 2020 +0800

    Make SyncBN a choice

commit e90d400
Merge: 5bf8beb cd90360
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 15:32:10 2020 +0800

    Merge pull request #6 from NanoCode012/patch-5

    Update train.py

commit cd90360
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 13:39:29 2020 +0700

    Update train.py

    Remove redundant `opt.` prefix.

commit 5bf8beb
Merge: c9558a9 a1c8406
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 14:09:51 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit c9558a9
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 13:51:34 2020 +0800

    Add device allocation for loss compute

commit 4f08c69
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:16:27 2020 +0800

    Revert drop_last

commit 1dabe33
Merge: a1ce9b1 4b8450b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:49 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit a1ce9b1
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:21 2020 +0800

    fix lr warning

commit 4b8450b
Merge: b9a50ae 02c63ef
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Wed Jul 8 21:24:24 2020 +0800

    Merge pull request #4 from NanoCode012/patch-4

    Add drop_last for multi gpu

commit 02c63ef
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Wed Jul 8 10:08:30 2020 +0700

    Add drop_last for multi gpu

commit b9a50ae
Merge: ec2dc6c 121d90b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:48:04 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit ec2dc6c
Merge: d0326e3 82a6182
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:34:31 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit d0326e3
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:31:24 2020 +0800

    Add SyncBN

commit 82a6182
Merge: 96fa40a 050b2a5
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 7 19:21:01 2020 +0800

    Merge pull request #1 from NanoCode012/patch-2

    Convert BatchNorm to SyncBatchNorm

commit 050b2a5
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:38:14 2020 +0700

    Add cleanup for process_group

commit 2aa3301
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:07:40 2020 +0700

    Remove apex.parallel. Use torch.nn.parallel

    For future compatibility

commit 77c8e27
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 01:54:39 2020 +0700

    Convert BatchNorm to SyncBatchNorm

commit 96fa40a
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 21:53:56 2020 +0800

    Fix the datset inconsistency problem

commit 16e7c26
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 11:34:03 2020 +0800

    Add loss multiplication to preserver the single-process performance

commit e838055
Merge: 625bb49 3bdea3f
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Fri Jul 3 20:56:30 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit 625bb49
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 2 22:45:15 2020 +0800

    DDP established
commit 94147314e559a6bdd13cb9de62490d385c27596f
Merge: 65157e2 37acbdc
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 16 14:00:17 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov4 into feature/DDP_fixed

commit 37acbdc
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 20:03:41 2020 -0700

    update test.py --save-txt

commit b8c2da4
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 20:00:48 2020 -0700

    update test.py --save-txt

commit 65157e2
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:44:13 2020 +0800

    Revert the README.md removal

commit 1c802bf
Merge: cd55b44 0f3b8bb
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:43:38 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit cd55b44
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:42:33 2020 +0800

    fix the DDP performance deterioration bug.

commit 0f3b8bb
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 00:28:53 2020 -0700

    Delete README.md

commit f5921ba
Merge: 85ab2f3 bd3fdbb
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 11:20:17 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit bd3fdbb
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Tue Jul 14 18:38:20 2020 -0700

    Update README.md

commit c1a97a7
Merge: 2bf86b8 f796708
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Tue Jul 14 18:36:53 2020 -0700

    Merge branch 'master' into feature/DDP_fixed

commit 2bf86b8
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 22:18:15 2020 +0700

    Fixed world_size not found when called from test

commit 85ab2f3
Merge: 5a19011 c8357ad
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 22:19:58 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit 5a19011
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 22:19:15 2020 +0800

    Add assertion for <=2 gpus DDP

commit c8357ad
Merge: e742dd9 787582f
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 22:10:02 2020 +0800

    Merge pull request #8 from MagicFrogSJTU/NanoCode012-patch-1

    Modify number of dataloaders' workers

commit 787582f
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 20:38:58 2020 +0700

    Fixed issue with single gpu not having world_size

commit 6364892
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 19:16:15 2020 +0700

    Add assert message for clarification

    Clarify why assertion was thrown to users

commit 69364d6
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:36:48 2020 +0700

    Changed number of workers check

@NanoCode012
Contributor

NanoCode012 commented Jul 16, 2020

Unit tests passed for the branch. I added a test for DDP training.

set -e 
rm -rf yolov5 && git clone https://github.com/MagicFrogSJTU/yolov5.git -b feature/DDP_fixed && cd yolov5
pip install -qr requirements.txt onnx
python3 -c "from utils.google_utils import *; gdrive_download('1n_oKgR81BJtqk75b00eAjdv03qVCQn2f', 'coco128.zip')" && mv -n ./coco128 ../
export PYTHONPATH="$PWD" # to run *.py files in subdirectories
for x in yolov5s #yolov5m yolov5l yolov5x # models
do
  python -m torch.distributed.launch --nproc_per_node 2 train.py --weights $x.pt --cfg models/$x.yaml --epochs 3 --img 320 --device 0,1 # DDP train
  for di in 0,1 0 cpu # inference devices
  do
    python train.py --weights $x.pt --cfg models/$x.yaml --epochs 3 --img 320 --device $di  # train
    python detect.py --weights $x.pt --device $di  # detect official
    python detect.py --weights runs/exp0/weights/last.pt --device $di  # detect custom
    python test.py --weights $x.pt --device $di # test official
    python test.py --weights runs/exp0/weights/last.pt --device $di # test custom
  done
  python models/yolo.py --cfg $x.yaml # inspect
  python models/export.py --weights $x.pt --img 640 --batch 1 # export
done

Edit: Add log unittest-log.txt

@MagicFrogSJTU
Contributor Author

@glenn-jocher
As @NanoCode012's experiments show, DDP is now behaving normally with arbitrary GPU counts, and the unit tests passed. I think it may be time to start merging this PR?

@glenn-jocher
Member

@MagicFrogSJTU @NanoCode012 awesome guys, thanks for the updated plots! They look perfect, and unit tests are passing so we are all set. Ok I will look through the updates today!

@glenn-jocher
Member

glenn-jocher commented Jul 16, 2020

Ok this is a bit complicated. I'll stop making changes to the affected files to allow time to review them and merge. Of the 5 files updated, test.py changes are actually already reflected in master, those are updates I pushed yesterday to allow autolabeling of datasets using test.py. So it looks like test.py has no changes compared to master, is that right?

UPDATE: In torch_utils there is a pickle import, but I don't see it used anywhere? Also, the EMA should now only ever be maintained as a single-GPU model, so is the check on its DP/DDP status necessary (I haven't looked at train.py yet)?

UPDATE2: the msd DP/DDP check is implemented the current way because it profiles faster than checking for 'module' attributes. I tested 3 ways when I wrote the code (the type() method, the 'module' method, and the isinstance() method) and used the fastest. So while the code may take up a bit more space, the current op should be the least expensive.
[profiling screenshot]

@NanoCode012
Contributor

NanoCode012 commented Jul 19, 2020

In this case we probably want the function to have a default world_size=1 argument, and simply not supply a world_size through test.py. I'll fix this.

I fixed the world_size bug. Tested it also on 1 GPU train, test, detect. CI covered CPU. Will re-run it fully when my machine is available.

When I put forward this issue, I had run an experiment with 8 V100s in default DP mode with master code and got 2x acceleration.

May I ask what dataset you were training on? Did you set any specific parameters? Was it because you increased the batch size?

The reason we got no acceleration at all here (I assume your python train.py --data coco.yaml --epochs 2 --batch 64 --device 0,1,2,3 is run with this PR's code?) may be that the PR code implements real DP mode (torch.nn.DataParallel). This is the code diff FYI.

I've set DP mode on Magic to test for comparisons.

Edit: Add chart

SyncBN is off. Batch size 64. It would be great if you could duplicate the result for the PR branch; it just seems so unreal. Time is averaged over 3 epochs.


@MagicFrogSJTU, I'm a bit confused when running your branch's DP at different batch sizes (64, 128, 256) for (2, 4, 8) GPUs. They all take about 11-12 minutes to run. I was expecting it to be faster. Accuracy also drops slightly with more GPUs.

@MagicFrogSJTU
Contributor Author

May I ask what dataset you were training on? Did you set any specific parameters? Was it because you increased the batch size?

On COCO. No, my batch size was 64. It was done long ago, about a month, with master code. Maybe the code has changed a lot.
I am currently short of machines to train on; maybe you can check out the code from a month ago and try training?

@MagicFrogSJTU, I'm a bit confused when running your branch's DP at different batch sizes (64, 128, 256) for (2, 4, 8) GPUs. They all take about 11-12 minutes to run. I was expecting it to be faster. Accuracy also drops slightly with more GPUs.

It happens when the batch size is not the key constraint on speed. I assume the data transfer between GPUs and the CPU overhead are now more significant for DP mode. Accuracy drops if you run DP on more GPUs because the batch size per GPU becomes too small. This is why we introduced SyncBN for DDP mode. (By the way, SyncBN is not applicable in DP mode.)
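The per-GPU batch-size effect is easy to see numerically. This toy example (plain Python, no PyTorch) contrasts the shard-local statistics plain BatchNorm computes on each GPU with the full-batch statistics SyncBN would use:

```python
# Plain BN normalizes each replica with its own shard's statistics;
# SyncBN computes them over the whole (global) batch.
from statistics import pvariance

batch = [0.0, 1.0, 10.0, 11.0]      # full batch of 4 samples
shards = [batch[:2], batch[2:]]     # split across 2 "GPUs"

global_var = pvariance(batch)                  # 25.25, what SyncBN sees
local_vars = [pvariance(s) for s in shards]    # [0.25, 0.25], per replica
```

With tiny per-GPU batches, each replica normalizes with noisy shard-local statistics, which is exactly why SyncBN helps DDP at higher GPU counts; DP cannot use it because its replicas live inside a single process rather than one process per GPU.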

@glenn-jocher
Member

@MagicFrogSJTU, I'm a bit confused when running your branch's DP at different batch sizes (64, 128, 256) for (2, 4, 8) GPUs. They all take about 11-12 minutes to run. I was expecting it to be faster. Accuracy also drops slightly with more GPUs.

It happens when the batch size is not the key constraint on speed. I assume the data transfer between GPUs and the CPU overhead are now more significant for DP mode. Accuracy drops if you run DP on more GPUs because the batch size per GPU becomes too small. This is why we introduced SyncBN for DDP mode. (By the way, SyncBN is not applicable in DP mode.)

Yes, these are the exact same results I found myself for current master. I'm assuming the same thing: on a T4 the speed is GPU-TOPS constrained, but on a V100 that constraint is removed and the new constraint is CPU-GPU communication as well as the device-0 tasks that DP performs. OK, all I have left is to finish reviewing train.py; all other files are good.

@glenn-jocher
Member

@MagicFrogSJTU ok, I understand about mp.spawn. It's unfortunate that multi-GPU training now requires a different command, and it's a bit more confusing to implement, but it definitely looks like you guys have succeeded in speeding it up greatly, which is of course the most important result.

I think train.py might be able to use a bit of simplification in the future, as it's more complicated to understand now than before, but I'll go ahead and merge this and then we can make tweaks as needed going forward.

Good job guys!!

@glenn-jocher glenn-jocher merged commit 4102fcc into ultralytics:master Jul 19, 2020
@MagicFrogSJTU
Copy link
Contributor Author

> @MagicFrogSJTU ok I understand about mp.spawn. It's unfortunate that the multi-gpu training process now has a different command, it's a bit more confusing to implement, but it definitely looks like you guys have succeeded in speeding it up greatly, which is the most important result of course.
>
> I think train.py might be able to use a bit of simplification in the future, as it's more complicated to understand now than before, but I'll go ahead and merge this and then we can make tweaks as needed going forward.
>
> Good job guys!!

Great!

@MagicFrogSJTU
Copy link
Contributor Author

MagicFrogSJTU commented Jul 20, 2020

I have come up with several things to fix:

  1. Use mp.spawn.
  2. Replace all `print` calls with `log` calls to allow process-0-only screen output.
  3. Speed up train-time val inference by splitting the workload between processes rather than running it only on process 0.
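Item 2 might look something like this (a minimal sketch, not the actual patch; the logger name and the `setup_logging` helper are made up for illustration):

```python
import logging

logging.basicConfig(format="%(message)s", level=logging.INFO)

def setup_logging(local_rank: int) -> logging.Logger:
    # Only the main process (local_rank -1 or 0) logs at INFO level; other
    # ranks are raised to WARNING so routine messages print once, not N times.
    logger = logging.getLogger("train")
    logger.setLevel(logging.INFO if local_rank in (-1, 0) else logging.WARNING)
    return logger

setup_logging(local_rank=0).info("rank 0: epoch starting")   # emitted
setup_logging(local_rank=3).info("rank 3: epoch starting")   # suppressed
```

Swapping `print` for a rank-aware logger means no call sites need an `if rank == 0:` guard; the filtering lives in one place.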

@NanoCode012 Do you have any more ideas in your mind?

Edit 0:

> I think train.py might be able to use a bit of simplification in the future, as it's more complicated to understand now than before,

  1. Simplify train.py. (Not an easy task; it needs a careful design first.)

@NanoCode012
Copy link
Contributor

The only thing left that comes to mind is to

  1. Add tutorial for DDP commands
  2. Improve readability: use flags like opt.distributed=True to denote the current state, rather than checking local rank.
  3. Use multiple GPUs for test/detect

@MagicFrogSJTU
Copy link
Contributor Author

@NanoCode012

  1. I am not familiar with mp.spawn. I think if you can get this feature done quickly, you don't have to do "Add tutorial for DDP commands". What's your estimate of its difficulty?
  2. Would you mind giving more explanation, please? I am a little confused.

@NanoCode012
Copy link
Contributor

NanoCode012 commented Jul 20, 2020

  1. I've actually already implemented the bones of it in my repo; however, it didn't do a few things like broadcast weights and adjust the loss. That was why it did not work properly. Also, since the master repo is about 10-15 days ahead, a lot of conflicts will happen. I will look it over this week.
  2. Nothing fancy. I just meant that instead of checking local_rank in [-1, 0], we could check not opt.distributed or not opt.parallel to describe the current state. We could have one argument like --distributed to activate DDP mode, and otherwise use DP mode. Just for readability, at the cost of a few more lines of code. This idea is from https://github.com/pytorch/examples/blob/master/imagenet/main.py

Edit: There is one qualm about mp.spawn, though. Each time a dataloader worker is created, it re-runs the entire script (train.py) [everything outside a function], which slightly slows down the code. If there are 8 dataloader workers per GPU, that is a source of slowdown. That is why I liked torch.distributed.launch, which doesn't have this issue.

@MagicFrogSJTU
Copy link
Contributor Author

> 1. I've actually already implemented the bones of it in my repo; however, it didn't do a few things like broadcast weights and adjust the loss. That was why it did not work properly. Also, since the master repo is about 10-15 days ahead, a lot of conflicts will happen. I will look it over this week.
> 2. Nothing fancy. I just meant that instead of checking local_rank in [-1, 0], we could check not opt.distributed or not opt.parallel to describe the current state. We could have one argument like --distributed to activate DDP mode, and otherwise use DP mode. Just for readability, at the cost of a few more lines of code. This idea is from https://github.com/pytorch/examples/blob/master/imagenet/main.py
>
> Edit: There is one qualm about mp.spawn, though. Each time a dataloader worker is created, it re-runs the entire script (train.py) [everything outside a function], which slightly slows down the code. If there are 8 dataloader workers per GPU, that is a source of slowdown. That is why I liked torch.distributed.launch, which doesn't have this issue.

  1. local_rank is a requirement of torch.distributed.launch. When torch.distributed.launch invokes train.py, it passes --local_rank=$rank to transfer the rank information to each process. This means we can't use other keywords if launch is still used. I read https://github.com/pytorch/examples/blob/master/imagenet/main.py; it uses mp.spawn.
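A minimal sketch of that constraint (argument name is fixed by launch; the default and help text here are assumptions, not taken from train.py). torch.distributed.launch spawns `python train.py --local_rank=<rank> ...` once per process, so the script must expose exactly that argument:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="set automatically by torch.distributed.launch; -1 means no DDP")

# Simulate what launch would pass to the process with rank 2.
opt = parser.parse_args(["--local_rank=2"])
print(opt.local_rank)
```

Renaming the flag to anything else would make every launched process crash on an unrecognized argument, which is why the readability change has to happen downstream of parsing.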

@NanoCode012
Copy link
Contributor

NanoCode012 commented Jul 20, 2020

> local_rank is a requirement of torch.distributed.launch. When torch.distributed.launch invokes train.py, it passes --local_rank=$rank to transfer the rank information to each process. This means we can't use other keywords if launch is still used. I read https://github.com/pytorch/examples/blob/master/imagenet/main.py; it uses mp.spawn.

I understand. I meant that we can set opt.parallel = True if local_rank >= 0 else False, then work with opt.parallel instead. This is just my idea for readability, because someone else could be confused about why local_rank is or isn't in [-1, 0]. if opt.parallel: model = DP(model) makes more sense than if local_rank == -1 and torch.cuda.device_count() > 1: model = DP(model).

For mp.spawn, we have to add rank as the first parameter of the train function: def train(rank, arg0, arg1).

Edit: I want to make something clear: opt.parallel could mean DP mode or DDP mode. opt.distributed would mean only DDP mode. Users could set DDP mode by passing --distributed as a flag.
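A serial, stdlib stand-in for that calling convention (`fake_spawn` is made up for illustration; the real torch.multiprocessing.spawn runs `fn` in separate processes, but injects the rank the same way):

```python
def fake_spawn(fn, args=(), nprocs=1):
    # Stand-in for torch.multiprocessing.spawn(fn, args=args, nprocs=nprocs):
    # spawn calls fn(rank, *args) once per process, prepending the rank itself.
    return [fn(rank, *args) for rank in range(nprocs)]

def train(rank, opt):
    # rank must come first, exactly as described above.
    return f"rank {rank}: distributed={opt['distributed']}"

results = fake_spawn(train, args=({"distributed": True},), nprocs=2)
print(results)
```

This is why moving from torch.distributed.launch to mp.spawn forces a signature change: with launch, each process reads its rank from --local_rank; with spawn, the rank arrives as the first positional argument.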

@MagicFrogSJTU
Copy link
Contributor Author

> local_rank is a requirement of torch.distributed.launch. When torch.distributed.launch invokes train.py, it passes --local_rank=$rank to transfer the rank information to each process. This means we can't use other keywords if launch is still used. I read https://github.com/pytorch/examples/blob/master/imagenet/main.py; it uses mp.spawn.
>
> I understand. I meant that we can set opt.parallel = True if local_rank >= 0 else False, then work with opt.parallel instead. This is just my idea for readability, because someone else could be confused about why local_rank is or isn't in [-1, 0]. if opt.parallel: model = DP(model) makes more sense than if local_rank == -1 and torch.cuda.device_count() > 1: model = DP(model).
>
> For mp.spawn, we have to add rank as the first parameter of the train function: def train(rank, arg0, arg1).
>
> Edit: I want to make something clear: opt.parallel could mean DP mode or DDP mode. opt.distributed would mean only DDP mode. Users could set DDP mode by passing --distributed as a flag.

You got a point!

@NanoCode012
Copy link
Contributor

NanoCode012 commented Jul 20, 2020

Hi @glenn-jocher, I have implemented mp.spawn on top of the current code. https://github.com/MagicFrogSJTU/yolov5/tree/feature/mp_spawn

However, I'm still testing the speed/accuracy. I'm just giving you a heads-up before you start making a tutorial on DDP.
The command would be the same as before: python train.py ... Add the --distributed flag for DDP, and omit it for DP.

Also, we should separate/hide the output from different GPUs. @MagicFrogSJTU made an interesting point on logging. He suggests using the logging lib to log the outputs at different severities/levels for different GPUs. We would like your opinion on this, because it would require changing all the print statements. I'm not sure whether to change them only in train.py or everywhere for consistency.

@glenn-jocher
Copy link
Member

@NanoCode012 OK, got it. No, I have not started a tutorial yet; I'm waiting until this settles a bit. But I think before going further you should do a git pull to bring your branch up to speed with the current master (I see 12 ahead, 54 behind on your branch). The main complication in merging the last PR was that the code had drifted in the meantime between the two branches, so if you start from current master it will make future PRs much easier. I'll look into the logging idea.

@glenn-jocher
Copy link
Member

glenn-jocher commented Jul 21, 2020

@MagicFrogSJTU @NanoCode012 @alexstoken hi guys, I have a quick update here. I've been retraining the current models (which I'll call yolov5.1) and also training two new architectures, yolov5.2 and yolov5.3. I don't want to confuse everyone with a bunch of new names, but this is the simplest scheme I could think of, and it leaves the door open to more experiments like yolov5.4 in the future. Each of the 3 comes in the same sizes as before, i.e. yolov5.1s, yolov5.1m, etc.

The baseline yolov5.1 models show slight improvements for the larger sizes, and the other two mainly show improvements for the smaller sizes, so there is no clear winner in my experiments (5.3 is not 'better' across the board than 5.2 or 5.1, for example; they are just different architecture compromises). 5.3 and 5.2 are better for small objects, but they are also slower than 5.1, as they introduce more ops on the P2/4 grid.

These models include breaking changes that will unfortunately make current models incompatible, but I think the changes are beneficial for the long term, simplifying the architecture a bit. I want to release all of this in about a week; I'm waiting on the final 5x models to finish training.

In the meantime I'm holding off on making changes because I'm not sure if you guys are making a lot of modifications to your local branches right now. I think the most important thing you can do is to update your current branches to master to streamline any future PRs, as most of my holdup when merging is due largely to confusion about whether commits are old or new. It's just an unfortunate side effect of many people working on the same code region.

This is mainly my fault too of course, for pushing so many commits straight to master randomly throughout the week. In the future I'll try to consolidate my changes into fewer commits, and also open PRs myself to better group commits and push less often.

@NanoCode012
Copy link
Contributor

NanoCode012 commented Jul 21, 2020

Hello Glenn, I will update the branch to master by today. My test of launch vs spawn is done; train times are averaged across 3 epochs.
[images: launch vs spawn train-time benchmark tables]

Right now, launch seems to be better by a small margin across the board.

@glenn-jocher
Copy link
Member

@NanoCode012 oh wow, this is great work, good job! Yes, it looks like launch provides faster times; interesting. That's unfortunate then; maybe we should stick with the current approach and simply try to clean up train.py a bit to make it more readable. What do you think?

I think your N4 and N8 experiments are showing the same times because the GPU ops are no longer constraining the speed at that point; something else must be the bottleneck there, likely reading images from the hard drive, or moving data from CPU to GPU. For larger models, like yolov5l and up, I think you'll probably get a curve closer to what you'd expect, with N8 showing speed improvements over N4. 300 seconds for a COCO epoch is just insanely fast in any case.

The ultimate training speed would be N8 with train.py --cache, as all of the images would be preloaded into RAM, removing the hard-drive read-speed constraint from the picture. At img-size 640, though, COCO requires about 150 GB of system RAM, so it's not quite feasible on today's hardware. For smaller datasets, this is quite feasible and makes a huge training-speed difference.
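The ~150 GB figure roughly checks out as a back-of-envelope estimate (assuming ~118k COCO train2017 images cached as uint8 HxWx3 arrays at a worst-case 640x640 each; actual cached sizes vary with aspect ratio):

```python
# Rough RAM estimate for caching COCO train2017 at img-size 640.
n_images = 118_287                 # COCO train2017 image count
bytes_per_image = 640 * 640 * 3    # uint8 pixels, 3 channels, worst case
total_gb = n_images * bytes_per_image / 1e9
print(f"~{total_gb:.0f} GB of system RAM")
```

This lands around 145 GB, consistent with the ~150 GB quoted above; a dataset one-tenth the size would need only ~15 GB, which is why --cache pays off on smaller datasets.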

@MagicFrogSJTU MagicFrogSJTU changed the title [WIP] Feature/ddp fixed Feature/ddp fixed Jul 21, 2020
NanoCode012 added a commit to MagicFrogSJTU/yolov5 that referenced this pull request Jul 21, 2020
* update test.py --save-txt

* update test.py --save-txt

* add GH action tests

* requirements

* requirements

* requirements

* fix tests

* add badge

* lower batch-size

* weights

* args

* parallel

* rename eval

* rename eval

* paths

* rename

* lower bs

* timeout

* less xOS

* drop xOS

* git attrib

* paths

* paths

* Apply suggestions from code review

* Update eval.py

* Update eval.py

* update requirements.txt

* Update ci-testing.yml

* Update ci-testing.yml

* rename test

* revert test module to confuse users...

* update hubconf.py

* update common.py add Classify()

* Update ci-testing.yml

* Update ci-testing.yml

* Update ci-testing.yml

* Update ci-testing.yml

* update common.py Classify()

* Update ci-testing.yml

* update test.py

* update train.py ckpt loading

* update train.py class count assertion ultralytics#424

* update train.py class count assertion ultralytics#424

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update requirements.txt

* [WIP] Feature/ddp fixed (ultralytics#401)

* Squashed commit of the following:

commit d738487
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:33:38 2020 +0700

    Adding world_size

    Reduce calls to torch.distributed. For use in create_dataloader.

commit e742dd9
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 15:38:48 2020 +0800

    Make SyncBN a choice

commit e90d400
Merge: 5bf8beb cd90360
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 15:32:10 2020 +0800

    Merge pull request #6 from NanoCode012/patch-5

    Update train.py

commit cd90360
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 13:39:29 2020 +0700

    Update train.py

    Remove redundant `opt.` prefix.

commit 5bf8beb
Merge: c9558a9 a1c8406
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 14:09:51 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit c9558a9
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 13:51:34 2020 +0800

    Add device allocation for loss compute

commit 4f08c69
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:16:27 2020 +0800

    Revert drop_last

commit 1dabe33
Merge: a1ce9b1 4b8450b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:49 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit a1ce9b1
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:21 2020 +0800

    fix lr warning

commit 4b8450b
Merge: b9a50ae 02c63ef
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Wed Jul 8 21:24:24 2020 +0800

    Merge pull request #4 from NanoCode012/patch-4

    Add drop_last for multi gpu

commit 02c63ef
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Wed Jul 8 10:08:30 2020 +0700

    Add drop_last for multi gpu

commit b9a50ae
Merge: ec2dc6c 121d90b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:48:04 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit ec2dc6c
Merge: d0326e3 82a6182
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:34:31 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit d0326e3
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:31:24 2020 +0800

    Add SyncBN

commit 82a6182
Merge: 96fa40a 050b2a5
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 7 19:21:01 2020 +0800

    Merge pull request #1 from NanoCode012/patch-2

    Convert BatchNorm to SyncBatchNorm

commit 050b2a5
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:38:14 2020 +0700

    Add cleanup for process_group

commit 2aa3301
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:07:40 2020 +0700

    Remove apex.parallel. Use torch.nn.parallel

    For future compatibility

commit 77c8e27
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 01:54:39 2020 +0700

    Convert BatchNorm to SyncBatchNorm

commit 96fa40a
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 21:53:56 2020 +0800

    Fix the datset inconsistency problem

commit 16e7c26
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 11:34:03 2020 +0800

    Add loss multiplication to preserver the single-process performance

commit e838055
Merge: 625bb49 3bdea3f
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Fri Jul 3 20:56:30 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit 625bb49
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 2 22:45:15 2020 +0800

    DDP established

* Squashed commit of the following:

commit 94147314e559a6bdd13cb9de62490d385c27596f
Merge: 65157e2 37acbdc
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 16 14:00:17 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov4 into feature/DDP_fixed

commit 37acbdc
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 20:03:41 2020 -0700

    update test.py --save-txt

commit b8c2da4
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 20:00:48 2020 -0700

    update test.py --save-txt

commit 65157e2
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:44:13 2020 +0800

    Revert the README.md removal

commit 1c802bf
Merge: cd55b44 0f3b8bb
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:43:38 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit cd55b44
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 16:42:33 2020 +0800

    fix the DDP performance deterioration bug.

commit 0f3b8bb
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Wed Jul 15 00:28:53 2020 -0700

    Delete README.md

commit f5921ba
Merge: 85ab2f3 bd3fdbb
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Wed Jul 15 11:20:17 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit bd3fdbb
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Tue Jul 14 18:38:20 2020 -0700

    Update README.md

commit c1a97a7
Merge: 2bf86b8 f796708
Author: Glenn Jocher <glenn.jocher@ultralytics.com>
Date:   Tue Jul 14 18:36:53 2020 -0700

    Merge branch 'master' into feature/DDP_fixed

commit 2bf86b8
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 22:18:15 2020 +0700

    Fixed world_size not found when called from test

commit 85ab2f3
Merge: 5a19011 c8357ad
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 22:19:58 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit 5a19011
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 22:19:15 2020 +0800

    Add assertion for <=2 gpus DDP

commit c8357ad
Merge: e742dd9 787582f
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 22:10:02 2020 +0800

    Merge pull request #8 from MagicFrogSJTU/NanoCode012-patch-1

    Modify number of dataloaders' workers

commit 787582f
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 20:38:58 2020 +0700

    Fixed issue with single gpu not having world_size

commit 6364892
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 19:16:15 2020 +0700

    Add assert message for clarification

    Clarify why assertion was thrown to users

commit 69364d6
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:36:48 2020 +0700

    Changed number of workers check

commit d738487
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 17:33:38 2020 +0700

    Adding world_size

    Reduce calls to torch.distributed. For use in create_dataloader.

commit e742dd9
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 15:38:48 2020 +0800

    Make SyncBN a choice

commit e90d400
Merge: 5bf8beb cd90360
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 14 15:32:10 2020 +0800

    Merge pull request #6 from NanoCode012/patch-5

    Update train.py

commit cd90360
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 14 13:39:29 2020 +0700

    Update train.py

    Remove redundant `opt.` prefix.

commit 5bf8beb
Merge: c9558a9 a1c8406
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 14:09:51 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit c9558a9
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 13:51:34 2020 +0800

    Add device allocation for loss compute

commit 4f08c69
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:16:27 2020 +0800

    Revert drop_last

commit 1dabe33
Merge: a1ce9b1 4b8450b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:49 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit a1ce9b1
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:21 2020 +0800

    fix lr warning

commit 4b8450b
Merge: b9a50ae 02c63ef
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Wed Jul 8 21:24:24 2020 +0800

    Merge pull request #4 from NanoCode012/patch-4

    Add drop_last for multi gpu

commit 02c63ef
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Wed Jul 8 10:08:30 2020 +0700

    Add drop_last for multi gpu

commit b9a50ae
Merge: ec2dc6c 121d90b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:48:04 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit ec2dc6c
Merge: d0326e3 82a6182
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:34:31 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit d0326e3
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:31:24 2020 +0800

    Add SyncBN

commit 82a6182
Merge: 96fa40a 050b2a5
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 7 19:21:01 2020 +0800

    Merge pull request #1 from NanoCode012/patch-2

    Convert BatchNorm to SyncBatchNorm

commit 050b2a5
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:38:14 2020 +0700

    Add cleanup for process_group

commit 2aa3301
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:07:40 2020 +0700

    Remove apex.parallel. Use torch.nn.parallel

    For future compatibility

commit 77c8e27
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 01:54:39 2020 +0700

    Convert BatchNorm to SyncBatchNorm

commit 96fa40a
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 21:53:56 2020 +0800

    Fix the datset inconsistency problem

commit 16e7c26
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 11:34:03 2020 +0800

    Add loss multiplication to preserver the single-process performance

commit e838055
Merge: 625bb49 3bdea3f
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Fri Jul 3 20:56:30 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit 625bb49
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 2 22:45:15 2020 +0800

    DDP established

* Fixed destroy_process_group in DP mode

* Update torch_utils.py

* Update utils.py

Revert build_targets() to current master.

* Update datasets.py

* Fixed world_size attribute not found

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update ci-testing.yml (ultralytics#445)

* Update ci-testing.yml

* Update ci-testing.yml

* Update requirements.txt

* Update requirements.txt

* Update google_utils.py

* Update test.py

* Update ci-testing.yml

* pretrained model loading bug fix (ultralytics#450)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update datasets.py (ultralytics#454)

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: yzchen <Chenyzsjtu@gmail.com>
Co-authored-by: pritul dave <41751718+pritul2@users.noreply.github.com>
@MagicFrogSJTU
Copy link
Contributor Author

@NanoCode012
Have you looked into the difference between mp.spawn and launch lately?
They should have the same speed in theory. Where do you think mp.spawn's slower speed might originate?

@NanoCode012
Copy link
Contributor

@MagicFrogSJTU, I haven't kept up with anything new in PyTorch 1.6 DDP, if there is anything.

The reason I think it slows down is during create_dataloader. Each GPU creates N workers, and each worker re-runs the entire train.py. You can test it out by adding a print statement at global scope in train.py using my mp_spawn branch. 2 GPUs would mean 16 workers.

@NanoCode012
Copy link
Contributor

Also, I was just told that launch doesn't work on Windows. If it's possible, I would like to add spawn.

@MagicFrogSJTU
Copy link
Contributor Author

MagicFrogSJTU commented Aug 10, 2020

> @MagicFrogSJTU, I haven't kept up with anything new in PyTorch 1.6 DDP, if there is anything.
>
> The reason I think it slows down is during create_dataloader. Each GPU creates N workers, and each worker re-runs the entire train.py. You can test it out by adding a print statement at global scope in train.py using my mp_spawn branch. 2 GPUs would mean 16 workers.

I read the official documentation; these two are expected to have equal speed.
Can you please give me the URL of your implementation? I want to have a look.

And what is the meaning of "Each worker would call the entire train.py"? Can you give more details?

@NanoCode012
Copy link
Contributor

NanoCode012 commented Aug 10, 2020

Sorry, typing from mobile.

It means each worker of the dataloaders (we pass nw to the DataLoader) re-runs the train.py file: it re-executes all the imports and redefines all the functions. That's why it was necessary to encapsulate all the global code inside functions. There was a note about this in the PyTorch docs, but I cannot find it now.

You can test the above by adding a simple print("global") at global scope (above def train) to count how many calls happen.
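A hypothetical version of that experiment (file layout assumed, not taken from the branch): under the spawn start method, each DataLoader worker imports the training script fresh, so module-level statements execute once per worker.

```python
# Sketch of a train.py laid out to survive spawn-based workers.
print("global scope: executed by the main process AND by every spawned worker")

def train():
    # Heavy setup (model build, dataloaders) belongs inside functions,
    # so re-importing the module in a worker stays cheap.
    pass

if __name__ == "__main__":
    # Only the launching process reaches this block; worker imports do not.
    train()
```

Counting the "global scope" lines printed gives exactly the worker-multiplication effect described above, and explains why globals had to be moved into functions.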

I hope this is clearer.

The branch can be found in your fork called mp_spawn.

Edit: The PyTorch guide has been updated. Maybe there is something there we could use.

BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit c9558a9
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 14 13:51:34 2020 +0800

    Add device allocation for loss compute

commit 4f08c69
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:16:27 2020 +0800

    Revert drop_last

commit 1dabe33
Merge: a1ce9b1 4b8450b
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:49 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit a1ce9b1
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 9 11:15:21 2020 +0800

    fix lr warning

commit 4b8450b
Merge: b9a50ae 02c63ef
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Wed Jul 8 21:24:24 2020 +0800

    Merge pull request #4 from NanoCode012/patch-4

    Add drop_last for multi gpu

commit 02c63ef
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Wed Jul 8 10:08:30 2020 +0700

    Add drop_last for multi gpu

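A note on the drop_last commit above: with a ragged final batch, DDP ranks can end an epoch with different batch counts and then deadlock in collective ops. The sketch below is illustrative arithmetic, not the PR's code; `num_batches` is a hypothetical helper mirroring DataLoader's batch-count behavior.

```python
# Illustrative sketch (not the PR's code): why drop_last matters for DDP.
# A ragged final batch can leave ranks with unequal batch counts, which
# stalls gradient all-reduce; drop_last=True trims the remainder.

def num_batches(num_samples, batch_size, drop_last):
    """Number of batches a DataLoader-style iterator would yield."""
    if drop_last:
        return num_samples // batch_size
    return -(-num_samples // batch_size)  # ceiling division

print(num_batches(34, 8, drop_last=False))  # 5 (last batch has only 2 samples)
print(num_batches(34, 8, drop_last=True))   # 4 full batches on every rank
```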
commit b9a50ae
Merge: ec2dc6c 86e7142
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:48:04 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit ec2dc6c
Merge: d0326e3 82a6182
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:34:31 2020 +0800

    Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed

commit d0326e3
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Tue Jul 7 19:31:24 2020 +0800

    Add SyncBN

commit 82a6182
Merge: 96fa40a 050b2a5
Author: yzchen <Chenyzsjtu@gmail.com>
Date:   Tue Jul 7 19:21:01 2020 +0800

    Merge pull request #1 from NanoCode012/patch-2

    Convert BatchNorm to SyncBatchNorm

commit 050b2a5
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:38:14 2020 +0700

    Add cleanup for process_group

commit 2aa3301
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 12:07:40 2020 +0700

    Remove apex.parallel. Use torch.nn.parallel

    For future compatibility

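For the apex removal above, a minimal single-process sketch of wrapping a model with the built-in `torch.nn.parallel.DistributedDataParallel` instead of `apex.parallel`, including the `destroy_process_group` cleanup added in the neighboring commit. This is a CPU/gloo demo under assumed defaults, not the PR's training code; the file-based init method is chosen here only to avoid needing a free TCP port.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process demo group; a file store avoids binding a TCP port.
init_file = tempfile.NamedTemporaryFile(delete=False)
dist.init_process_group(
    backend="gloo",                        # CPU-friendly backend for the demo
    init_method=f"file://{init_file.name}",
    rank=0,
    world_size=1,
)

model = DDP(torch.nn.Linear(4, 2))         # torch.nn.parallel, no apex import
out = model(torch.zeros(1, 4))

dist.destroy_process_group()               # the cleanup this PR also adds
os.unlink(init_file.name)
```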
commit 77c8e27
Author: NanoCode012 <kevinvong@rocketmail.com>
Date:   Tue Jul 7 01:54:39 2020 +0700

    Convert BatchNorm to SyncBatchNorm

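The conversion in the commit above can be done with PyTorch's own helper; a minimal sketch (the model below is a stand-in, not YOLOv5's):

```python
import torch.nn as nn

# Replace every BatchNorm layer with SyncBatchNorm so normalization
# statistics are reduced across DDP processes instead of per-GPU.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(model[1]).__name__)  # SyncBatchNorm
```

Note SyncBatchNorm layers only run their synchronized forward inside an initialized process group; the conversion itself needs no group.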
commit 96fa40a
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 21:53:56 2020 +0800

    Fix the dataset inconsistency problem

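The dataset-consistency fix above relates to the `torch_distributed_zero_first` utility this PR introduces: rank 0 does one-time work (e.g. caching labels) while the other ranks wait, so all ranks then read the same cache. A sketch of the idea; treat the exact signature as illustrative:

```python
from contextlib import contextmanager

import torch.distributed as dist

@contextmanager
def torch_distributed_zero_first(local_rank):
    """Make all DDP ranks wait while rank 0 finishes one-time setup first."""
    if local_rank not in (-1, 0):
        dist.barrier()  # non-zero ranks block here until rank 0 is done
    yield
    if local_rank == 0:
        dist.barrier()  # rank 0 releases the waiting ranks

# local_rank == -1 means non-distributed: no barriers, runs straight through.
with torch_distributed_zero_first(-1):
    dataset_prepared = True
print(dataset_prepared)
```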
commit 16e7c26
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Mon Jul 6 11:34:03 2020 +0800

    Add loss multiplication to preserve the single-process performance

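On the loss-multiplication commit above: DDP's all-reduce averages gradients over the processes, whereas a single process seeing the whole batch effectively sums those contributions. Multiplying the loss by `world_size` cancels the averaging. Toy arithmetic only, not the PR's code:

```python
# Per-process gradient contributions for one hypothetical step.
world_size = 4
per_rank_grads = [1.0, 2.0, 3.0, 4.0]

single_process = sum(per_rank_grads)            # one GPU sees all samples: 10.0
ddp_average = sum(per_rank_grads) / world_size  # DDP all-reduces a mean: 2.5

# Scaling each rank's loss by world_size before the all-reduce
# restores the single-process gradient magnitude.
scaled = sum(g * world_size for g in per_rank_grads) / world_size

print(single_process, ddp_average, scaled)  # 10.0 2.5 10.0
```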
commit e838055
Merge: 625bb49 31a9f25
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Fri Jul 3 20:56:30 2020 +0800

    Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed

commit 625bb49
Author: yizhi.chen <chenyzsjtu@outlook.com>
Date:   Thu Jul 2 22:45:15 2020 +0800

    DDP established

* Fixed destroy_process_group in DP mode

* Update torch_utils.py

* Update utils.py

Revert build_targets() to current master.

* Update datasets.py

* Fixed world_size attribute not found

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>