Yes, single and multi-GPU CI would be awesome. It's a very rare use-case though, so I think there is only one company offering support for it, which charges hourly. Alternatively, I think GitHub Actions can use self-hosted runners that you can point to a cloud instance. This article just appeared a few days ago: https://github.blog/2020-08-04-github-actions-self-hosted-runners-on-google-cloud/
If this could spin up a 2x K80 GPU VM (the cheapest and slowest GPUs on GCP), we could run additional CI tests on Linux, at least on single and double GPU, and then shut it down immediately afterwards. The costs should be manageable.
But the blog post also notes:
⚠️ Note that these use cases are considered experimental and not officially supported by GitHub at this time. Additionally, it’s recommended not to use self-hosted runners on public repositories for a number of security reasons.
@NanoCode012 oh, about your other question: yes, now we can 'finetune', or start training from pretrained weights, just by supplying the --weights. The --cfg is no longer required.
If you pass both a --cfg and --weights, the --cfg is used to create a model, and then any matching layers are transferred from the --weights. The anchors are on an exclude list of layers not to transfer, but I need to review this for the --resume use case.
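A rough sketch of the transfer rule described above, not the actual train.py code. The helper name intersect_state, the FakeTensor stand-in, and the layer names are all made up for illustration, and treating "matching" as "same key and same shape" is an assumption:

```python
from collections import namedtuple

# Stand-in for a tensor: only the .shape attribute matters in this sketch.
FakeTensor = namedtuple("FakeTensor", ["shape"])


def intersect_state(pretrained, target, exclude=()):
    """Keep pretrained entries that the new model can accept.

    An entry is kept when its key exists in the target, its shape matches,
    and its key contains none of the excluded substrings.
    """
    return {
        k: v
        for k, v in pretrained.items()
        if k in target
        and not any(x in k for x in exclude)
        and v.shape == target[k].shape
    }


# Hypothetical layer names and shapes, for illustration only.
pretrained = {
    "backbone.conv.weight": FakeTensor((32, 3, 3, 3)),
    "head.conv.weight": FakeTensor((255, 128, 1, 1)),
    "head.anchors": FakeTensor((3, 3, 2)),
}
target = {
    "backbone.conv.weight": FakeTensor((32, 3, 3, 3)),
    "head.conv.weight": FakeTensor((18, 128, 1, 1)),  # different class count
    "head.anchors": FakeTensor((3, 3, 2)),
}

transferred = intersect_state(pretrained, target, exclude=("anchors",))
print(sorted(transferred))
```

Here only the backbone layer is transferred: the head has a different shape, and the anchors are skipped because they are on the exclude list.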
Also the hyps are now in their own file in data/hyp.yaml. If pretrained weights are supplied then the finetuning hyps are used. If no pretrained weights are supplied then the from-scratch hyps are used. If you supply your own --hyp those are used instead. The two hyp files are identical for now, but may change in the future.
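The selection order above could be sketched like this. The function name select_hyp and the two default file names are placeholders, not the real paths in the repo:

```python
def select_hyp(hyp=None, weights=None,
               finetune_hyp="hyp_finetune.yaml",  # placeholder name
               scratch_hyp="hyp_scratch.yaml"):   # placeholder name
    """Pick the hyperparameter file following the order described above."""
    if hyp:
        # A user-supplied --hyp always wins.
        return hyp
    # Otherwise pretrained weights imply finetuning hyps,
    # and no weights imply from-scratch hyps.
    return finetune_hyp if weights else scratch_hyp


print(select_hyp(weights="yolov5s.pt"))
print(select_hyp())
print(select_hyp(hyp="my_hyp.yaml", weights="yolov5s.pt"))
```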
🐛 Bug
Due to the latest update in 3c6e2f7, DP and DDP modes error out because they wrap around the model, so the attribute stride cannot be accessed.
To Reproduce (REQUIRED)
Input:
Output in DDP mode (DP mode output is just a bit different):
Expected behavior
Training runs as it does in single-GPU mode.
Environment
Additional context
The solution is to move the lines below, particularly Line 144, above the DP/DDP wrappers:
yolov5/train.py, Lines 143 to 145 in a0ac5ad
to line 126:
yolov5/train.py, Lines 126 to 129 in a0ac5ad
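To illustrate why the order matters, here is a minimal pure-Python sketch. TinyModel and Wrapper are stand-ins I made up, not the real PyTorch classes, and the stride values are illustrative. The real DP/DDP wrappers likewise keep the original model under .module and do not expose its custom attributes directly:

```python
class TinyModel:
    """Stand-in for a detection model that exposes a stride attribute."""

    def __init__(self):
        self.stride = [8, 16, 32]  # illustrative values


class Wrapper:
    """Stand-in for a DP/DDP-style wrapper.

    It keeps the model under .module and does not forward custom attributes.
    """

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return the underlying model whether or not it has been wrapped."""
    return model.module if hasattr(model, "module") else model


model = TinyModel()
max_stride = max(model.stride)  # option 1: read the attribute before wrapping

wrapped = Wrapper(model)
try:
    wrapped.stride  # fails: the wrapper itself has no stride attribute
    failed = False
except AttributeError:
    failed = True

print(failed, max_stride, max(unwrap(wrapped).stride))
```

Reading stride before wrapping is what the proposed fix does. Going through .module after wrapping would be the alternative if the lines had to stay near the dataloaders.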
I added a PR for your convenience. This is the minimal change needed. Tested working on my unit test. If you want to keep image sizes near the dataloaders, it "may be" possible to move them above the DP/DDP wrappers, but I'm not sure.
I think having a CI or unit test in DDP/DP mode should be important, as it's easy to miss bugs like these. Of course, I understand that resources are expensive.
On a side note, this would be how to do pretrain, right? Just pass the weights yolov5s.pt without the yolov5s.yaml file?
My Unit test (includes DDP/DP mode)