-
-
Notifications
You must be signed in to change notification settings - Fork 16.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pre-trained weights not happy, map0.5 plateauing #450
Comments
Hello @edeyejedi, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook , Docker Image, and Google Cloud Quickstart Guide for example environments. If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you. If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
For more information please visit https://www.ultralytics.com. |
I can add more if requiredL |
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@edeyejedi thanks for raising the issue. We were able to reproduce your bug by trying to train on voc from pretrained weights: python train.py --batch 32 --cfg yolov5s.yaml --weights yolov5s.pt --data voc.yaml --img 512 --nosave We've recently pushed a fix, so please try again with the latest code! In terms of training, start from https://github.com/ultralytics/yolov5#training and see the custom training tutorial. https://docs.ultralytics.com/yolov5/tutorials/train_custom_data I don't have time to get into details, but one thing that clearly sticks out is that batch 8 is far too small. We've never given guidance to use small batches like this. |
Thank you Glenn. |
* update test.py --save-txt * update test.py --save-txt * add GH action tests * requirements * requirements * requirements * fix tests * add badge * lower batch-size * weights * args * parallel * rename eval * rename eval * paths * rename * lower bs * timeout * less xOS * drop xOS * git attrib * paths * paths * Apply suggestions from code review * Update eval.py * Update eval.py * update requirements.txt * Update ci-testing.yml * Update ci-testing.yml * rename test * revert test module to confuse users... * update hubconf.py * update common.py add Classify() * Update ci-testing.yml * Update ci-testing.yml * Update ci-testing.yml * Update ci-testing.yml * update common.py Classify() * Update ci-testing.yml * update test.py * update train.py ckpt loading * update train.py class count assertion ultralytics#424 * update train.py class count assertion ultralytics#424 Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com> * Update requirements.txt * [WIP] Feature/ddp fixed (ultralytics#401) * Squashed commit of the following: commit d738487 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 17:33:38 2020 +0700 Adding world_size Reduce calls to torch.distributed. For use in create_dataloader. commit e742dd9 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 15:38:48 2020 +0800 Make SyncBN a choice commit e90d400 Merge: 5bf8beb cd90360 Author: yzchen <Chenyzsjtu@gmail.com> Date: Tue Jul 14 15:32:10 2020 +0800 Merge pull request #6 from NanoCode012/patch-5 Update train.py commit cd90360 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 13:39:29 2020 +0700 Update train.py Remove redundant `opt.` prefix. commit 5bf8beb Merge: c9558a9 a1c8406 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 14:09:51 2020 +0800 Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed commit c9558a9 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 13:51:34 2020 +0800 Add device allocation for loss compute commit 4f08c69 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 9 11:16:27 2020 +0800 Revert drop_last commit 1dabe33 Merge: a1ce9b1 4b8450b Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 9 11:15:49 2020 +0800 Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed commit a1ce9b1 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 9 11:15:21 2020 +0800 fix lr warning commit 4b8450b Merge: b9a50ae 02c63ef Author: yzchen <Chenyzsjtu@gmail.com> Date: Wed Jul 8 21:24:24 2020 +0800 Merge pull request #4 from NanoCode012/patch-4 Add drop_last for multi gpu commit 02c63ef Author: NanoCode012 <kevinvong@rocketmail.com> Date: Wed Jul 8 10:08:30 2020 +0700 Add drop_last for multi gpu commit b9a50ae Merge: ec2dc6c 121d90b Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 7 19:48:04 2020 +0800 Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed commit ec2dc6c Merge: d0326e3 82a6182 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 7 19:34:31 2020 +0800 Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed commit d0326e3 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 7 19:31:24 2020 +0800 Add SyncBN commit 82a6182 Merge: 96fa40a 050b2a5 Author: yzchen <Chenyzsjtu@gmail.com> Date: Tue Jul 7 19:21:01 2020 +0800 Merge pull request #1 from NanoCode012/patch-2 Convert BatchNorm to SyncBatchNorm commit 050b2a5 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 7 12:38:14 2020 +0700 Add cleanup for process_group commit 2aa3301 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 7 12:07:40 2020 +0700 Remove apex.parallel. Use torch.nn.parallel For future compatibility commit 77c8e27 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 7 01:54:39 2020 +0700 Convert BatchNorm to SyncBatchNorm commit 96fa40a Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Mon Jul 6 21:53:56 2020 +0800 Fix the datset inconsistency problem commit 16e7c26 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Mon Jul 6 11:34:03 2020 +0800 Add loss multiplication to preserver the single-process performance commit e838055 Merge: 625bb49 3bdea3f Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Fri Jul 3 20:56:30 2020 +0800 Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed commit 625bb49 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 2 22:45:15 2020 +0800 DDP established * Squashed commit of the following: commit 94147314e559a6bdd13cb9de62490d385c27596f Merge: 65157e2 37acbdc Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 16 14:00:17 2020 +0800 Merge branch 'master' of https://github.com/ultralytics/yolov4 into feature/DDP_fixed commit 37acbdc Author: Glenn Jocher <glenn.jocher@ultralytics.com> Date: Wed Jul 15 20:03:41 2020 -0700 update test.py --save-txt commit b8c2da4 Author: Glenn Jocher <glenn.jocher@ultralytics.com> Date: Wed Jul 15 20:00:48 2020 -0700 update test.py --save-txt commit 65157e2 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Wed Jul 15 16:44:13 2020 +0800 Revert the README.md removal commit 1c802bf Merge: cd55b44 0f3b8bb Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Wed Jul 15 16:43:38 2020 +0800 Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed commit cd55b44 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Wed Jul 15 16:42:33 2020 +0800 fix the DDP performance deterioration bug. commit 0f3b8bb Author: Glenn Jocher <glenn.jocher@ultralytics.com> Date: Wed Jul 15 00:28:53 2020 -0700 Delete README.md commit f5921ba Merge: 85ab2f3 bd3fdbb Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Wed Jul 15 11:20:17 2020 +0800 Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed commit bd3fdbb Author: Glenn Jocher <glenn.jocher@ultralytics.com> Date: Tue Jul 14 18:38:20 2020 -0700 Update README.md commit c1a97a7 Merge: 2bf86b8 f796708 Author: Glenn Jocher <glenn.jocher@ultralytics.com> Date: Tue Jul 14 18:36:53 2020 -0700 Merge branch 'master' into feature/DDP_fixed commit 2bf86b8 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 22:18:15 2020 +0700 Fixed world_size not found when called from test commit 85ab2f3 Merge: 5a19011 c8357ad Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 22:19:58 2020 +0800 Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed commit 5a19011 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 22:19:15 2020 +0800 Add assertion for <=2 gpus DDP commit c8357ad Merge: e742dd9 787582f Author: yzchen <Chenyzsjtu@gmail.com> Date: Tue Jul 14 22:10:02 2020 +0800 Merge pull request #8 from MagicFrogSJTU/NanoCode012-patch-1 Modify number of dataloaders' workers commit 787582f Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 20:38:58 2020 +0700 Fixed issue with single gpu not having world_size commit 6364892 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 19:16:15 2020 +0700 Add assert message for clarification Clarify why assertion was thrown to users commit 69364d6 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 17:36:48 2020 +0700 Changed number of workers check commit d738487 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 17:33:38 2020 +0700 Adding world_size Reduce calls to torch.distributed. For use in create_dataloader. commit e742dd9 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 15:38:48 2020 +0800 Make SyncBN a choice commit e90d400 Merge: 5bf8beb cd90360 Author: yzchen <Chenyzsjtu@gmail.com> Date: Tue Jul 14 15:32:10 2020 +0800 Merge pull request #6 from NanoCode012/patch-5 Update train.py commit cd90360 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 14 13:39:29 2020 +0700 Update train.py Remove redundant `opt.` prefix. commit 5bf8beb Merge: c9558a9 a1c8406 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 14:09:51 2020 +0800 Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed commit c9558a9 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 14 13:51:34 2020 +0800 Add device allocation for loss compute commit 4f08c69 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 9 11:16:27 2020 +0800 Revert drop_last commit 1dabe33 Merge: a1ce9b1 4b8450b Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 9 11:15:49 2020 +0800 Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed commit a1ce9b1 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 9 11:15:21 2020 +0800 fix lr warning commit 4b8450b Merge: b9a50ae 02c63ef Author: yzchen <Chenyzsjtu@gmail.com> Date: Wed Jul 8 21:24:24 2020 +0800 Merge pull request #4 from NanoCode012/patch-4 Add drop_last for multi gpu commit 02c63ef Author: NanoCode012 <kevinvong@rocketmail.com> Date: Wed Jul 8 10:08:30 2020 +0700 Add drop_last for multi gpu commit b9a50ae Merge: ec2dc6c 121d90b Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 7 19:48:04 2020 +0800 Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed commit ec2dc6c Merge: d0326e3 82a6182 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 7 19:34:31 2020 +0800 Merge branch 'feature/DDP_fixed' of https://github.com/MagicFrogSJTU/yolov5 into feature/DDP_fixed commit d0326e3 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Tue Jul 7 19:31:24 2020 +0800 Add SyncBN commit 82a6182 Merge: 96fa40a 050b2a5 Author: yzchen <Chenyzsjtu@gmail.com> Date: Tue Jul 7 19:21:01 2020 +0800 Merge pull request #1 from NanoCode012/patch-2 Convert BatchNorm to SyncBatchNorm commit 050b2a5 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 7 12:38:14 2020 +0700 Add cleanup for process_group commit 2aa3301 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 7 12:07:40 2020 +0700 Remove apex.parallel. Use torch.nn.parallel For future compatibility commit 77c8e27 Author: NanoCode012 <kevinvong@rocketmail.com> Date: Tue Jul 7 01:54:39 2020 +0700 Convert BatchNorm to SyncBatchNorm commit 96fa40a Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Mon Jul 6 21:53:56 2020 +0800 Fix the datset inconsistency problem commit 16e7c26 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Mon Jul 6 11:34:03 2020 +0800 Add loss multiplication to preserver the single-process performance commit e838055 Merge: 625bb49 3bdea3f Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Fri Jul 3 20:56:30 2020 +0800 Merge branch 'master' of https://github.com/ultralytics/yolov5 into feature/DDP_fixed commit 625bb49 Author: yizhi.chen <chenyzsjtu@outlook.com> Date: Thu Jul 2 22:45:15 2020 +0800 DDP established * Fixed destroy_process_group in DP mode * Update torch_utils.py * Update utils.py Revert build_targets() to current master. * Update datasets.py * Fixed world_size attribute not found Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com> * Update ci-testing.yml (ultralytics#445) * Update ci-testing.yml * Update ci-testing.yml * Update requirements.txt * Update requirements.txt * Update google_utils.py * Update test.py * Update ci-testing.yml * pretrained model loading bug fix (ultralytics#450) Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com> * Update datasets.py (ultralytics#454) Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: yzchen <Chenyzsjtu@gmail.com> Co-authored-by: pritul dave <41751718+pritul2@users.noreply.github.com>
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Hi there,
First ever post from a somewhat inexperienced person in terms of ML (but not a total novice).
I am being cheeky and using 1 post to address 2 problems I am having. The first is priority.
Traceback (most recent call last):
File "train.py", line 474, in
train(hyp, tb_writer, opt, device)
File "train.py", line 135, in train
model.load_state_dict(ckpt['model'], strict=False)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 847, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
size mismatch for model.18.weight: copying a param with shape torch.Size([255, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 128, 1, 1]).
size mismatch for model.18.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
size mismatch for model.22.weight: copying a param with shape torch.Size([255, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 256, 1, 1]).
size mismatch for model.22.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
size mismatch for model.26.weight: copying a param with shape torch.Size([255, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 512, 1, 1]).
size mismatch for model.26.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
I have had this working previously, and I can not find what I have changed to create this issue. I don't know if changes to the code or weights have changed as I run everything on Colab and just pull down the code/weights as required. Despite reverting to a setting that (I think) worked previously I have played, toyed and adjusted settings for hours and I cant seem to get this rolling again. a bit of research tells me there can be a conflict between classes, but I don't know. Any help would be appreciated. TIA
Thanks heaps
The text was updated successfully, but these errors were encountered: