
--multi-scale flag for the reported results #472

Closed
nerminsamet opened this issue Aug 26, 2019 · 11 comments


nerminsamet commented Aug 26, 2019

Hello @glenn-jocher,
thank you for the great work!

Here you report that you achieved 55.4 mAP at size 416 with the YOLOv3 configuration (yolov3.cfg). I wonder whether the --multi-scale flag was set to true for this 55.4 mAP.

If --multi-scale is not set, what mAP should we expect to achieve?

Thanks in advance.

@glenn-jocher
Member

@nerminsamet that's a good question! The mAPs reported at https://github.com/ultralytics/yolov3#map use the darknet-trained yolov3-spp.weights file. You can reproduce them with the code at the link.

If you train using this repo you get PyTorch weights in *.pt format. The training results are constantly improving, with the latest results coming within 1% of darknet-trained results. See #310 for a more detailed discussion.

Regarding your exact question about the relationship between multi-scale and final mAP, the effect is debatable. Darknet training uses it by default, but I have not observed any improvement when testing at the same resolution as your training img_size. It may help, but I can't say I've observed anything consistent with that being the case. When testing at other resolutions it should obviously help, so I believe it depends on your final intended use of the trained model.
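For context, multi-scale training generally amounts to randomly picking a new img_size (a multiple of the 32-pixel grid stride) every few batches and resizing the batch before the forward pass. A rough sketch of the idea only, not this repo's exact code; the ±50% range and the every-10-batches cadence are assumptions:

```python
import random
import torch.nn.functional as F

def multi_scale_batch(imgs, base_size=416, stride=32):
    """Resize a batch to a randomly chosen size that is a multiple of `stride`.

    Illustrative sketch only; the repo picks its own range and frequency.
    imgs: (N, 3, H, W) float tensor.
    """
    # choose a size roughly between 0.5x and 1.5x the base size, snapped to the stride
    low = int(base_size * 0.5) // stride * stride
    high = int(base_size * 1.5) // stride * stride
    new_size = random.randrange(low, high + stride, stride)
    if new_size != imgs.shape[-1]:
        imgs = F.interpolate(imgs, size=new_size, mode='bilinear', align_corners=False)
    return imgs

# usage inside a training loop, e.g. once every 10 batches:
# imgs = multi_scale_batch(imgs, base_size=opt.img_size)
```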

@glenn-jocher
Member

@nerminsamet of course, if you have the resources I would simply train both ways and compare. If you do this, let us know your results!

@nerminsamet
Author

@glenn-jocher I am training this repo with the following configuration. Right now it is on the 168th epoch. I share my latest mAP results on the COCO 5k val set at the 167th epoch below. Once my training is done I will also share the final mAP.

Namespace(accumulate=1, batch_size=64, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, nosave=False, notest=False, rect=False, resume=True, transfer=False)

Result of 167th epoch:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.244
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.454
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.236
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.101
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.268
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.229
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.363
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.386
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.192
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.420
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521
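For reference, the summary above is the standard pycocotools COCOeval printout. A minimal sketch of producing it, assuming the detections have already been written to a COCO-format JSON file (the paths are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# ground-truth annotations and detections in COCO JSON format (paths are placeholders)
cocoGt = COCO('annotations/instances_val2014.json')
cocoDt = cocoGt.loadRes('results.json')

cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()  # prints an AP/AR table like the one above
```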

@glenn-jocher
Member

@nerminsamet ah, excellent, you are committed to your training. I'll share a few tricks briefly.

We typically develop based on small-dataset results, like coco_16img.data, which allows rapid prototyping since training only takes a few minutes. This is useful for sanity checks and rough ideas, but results here do not correlate fully with results on the full COCO dataset, so once we develop an idea on a small dataset we test it on 1 or 2 full COCO epochs to tweak it further, and then we run it to 10% of full training for a proper statistical comparison. We do all of this at img-size 320 for speed, and with yolov3-spp since it bumps mAP by 1% at almost no compute expense. These runs are now reaching 45% mAP after 27 epochs at 320 (no multi-scale). You can see some studies we've done at #441 (comment)

The 10% training command we use is:

python3 train.py --weights weights/darknet53.conv.74 --img-size 320 --epochs 27 --batch-size 64 --accumulate 1

We also had a weight_decay bug in place, which we just fixed today; it seems to greatly impact performance: #469

@nerminsamet
Author

hi @glenn-jocher,
thanks for the tricks. I first tested the code on coco_16img.data and everything was OK.
Now that my training is over, I got 52.2 mAP, which is 3.1 behind the original results. I think multi-scale training could be one reason. I will also train with the multi-scale setup and inform you about the new results.

Final results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.522
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.325
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.434
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.275
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.429
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.450
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.251
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.490
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.601

@glenn-jocher
Member

@nerminsamet hmm ok! Maybe it is due to multi-scale. Can you post your training results? If you have a recent version of the repo, a results.png file will appear after training that plots results.txt. If not, the plotting command is from utils.utils import *; plot_results(), which will plot any results*.txt files it finds.

Starting a new training run overwrites any existing results.txt file, so I usually rename results.txt after training to something like results_416.txt so it doesn't get overwritten.
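Putting the two steps above together, a short snippet (plot_results is the repo utility mentioned above; results_416.txt is just the example rename target):

```python
import shutil
from utils.utils import *  # provides plot_results() in this repo

plot_results()  # plots any results*.txt files it finds

# rename so the next training run does not overwrite this file
shutil.move('results.txt', 'results_416.txt')
```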

@nerminsamet
Author

@glenn-jocher here are the results!

[results.png training plot]

@glenn-jocher
Member

@nerminsamet wow, ok this shows severe overtraining after the LR drop at 0.8 * 273 epochs.

  • GIoU seems to be operating correctly for all intents and purposes.
  • Confidence (training loss) settles at 1.5, but validation Confidence ('Objectness' really) settles at 2.4 and shows an initial val loss drop at the LR drop, but then spikes higher.
  • Classification follows the same trend, with the training loss settling lower than the validation loss (1.0 vs 1.4) and subsequent val loss increases after the LR drop.

Classification and Confidence may possibly be prone to overtraining because of their positive weight hyperparameters, which are about 1.5 and 4.0.

I've tried removing these recently (setting them to 1.0 and passing their values into the respective loss gains, making the obj and cls gains 40 and 80 with pw's at 1 and 1), but initial mAPs do not respond as well as with the defaults... so perhaps initial results are worse but the long term may show a net positive, I don't know.
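To make the pos_weight discussion concrete, here is a minimal PyTorch sketch of the two setups described above: a positive weight on the BCE term versus pos_weight = 1.0 with the magnitude folded into the loss gain. The ~4.0 positive weight and the obj gain of 40 are taken from the numbers above; the gain of 10.0 in the first case and the tensors are placeholders:

```python
import torch
import torch.nn as nn

# dummy logits/targets standing in for objectness outputs (shapes are arbitrary)
pred = torch.randn(64, 1)
targ = torch.randint(0, 2, (64, 1)).float()

# setup A: positive weight on the BCE term (~4.0 per the comment), modest loss gain
obj_gain_a, obj_pw_a = 10.0, 4.0
loss_a = obj_gain_a * nn.BCEWithLogitsLoss(pos_weight=torch.tensor([obj_pw_a]))(pred, targ)

# setup B (the experiment described above): pos_weight = 1.0, magnitude moved into the gain
obj_gain_b, obj_pw_b = 40.0, 1.0
loss_b = obj_gain_b * nn.BCEWithLogitsLoss(pos_weight=torch.tensor([obj_pw_b]))(pred, targ)
```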

@glenn-jocher
Member

@nerminsamet I think we can use the results below (320 with multi-scale) as a proxy for what you might expect. It looks like multi-scale reduces the overfitting a bit, but I believe the real culprits are the positive weights in the two BCE losses. Depending on how dirty you want to get your hands, there are a few changes you could try in the loss function, and there is also the hyperparameter evolution path you could explore: #392.

It feels like the training is a few steps away from darknet-level mAP, but unfortunately we just don't have the resources right now to explore all the alternatives we'd like to.

#310 (comment), 320 multi-scale:
[results plot]
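On the hyperparameter evolution path, the general idea is a simple generate-and-test loop: mutate the current best hyperparameters, retrain briefly, and keep the mutation if the fitness (e.g. mAP) improves. A toy sketch only, not the repo's implementation (see #392 for that); the mutation scale and the train_fn callable are placeholders:

```python
import random

def evolve(hyp, train_fn, generations=10, sigma=0.2):
    """Toy hyperparameter evolution loop (schematic only).

    hyp: dict of hyperparameter name -> value
    train_fn: callable that trains briefly and returns a fitness score (e.g. mAP)
    """
    best_hyp, best_fitness = dict(hyp), train_fn(hyp)
    for _ in range(generations):
        # mutate each value multiplicatively around the current best
        candidate = {k: v * (1.0 + random.gauss(0.0, sigma)) for k, v in best_hyp.items()}
        fitness = train_fn(candidate)
        if fitness > best_fitness:
            best_hyp, best_fitness = candidate, fitness
    return best_hyp, best_fitness
```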

@glenn-jocher
Member

@nerminsamet hi there! Did you get different results with --multi-scale?

@glenn-jocher
Member

@nerminsamet I now have a direct comparison of the effect of using --multi-scale. Our results show a +1.6% mAP@0.5 boost: 49.3 to 50.9 at img-size 320. It helps very much to prevent overfitting on the BCELoss terms after the LR drop; before that it does not seem to show much visible effect. Maybe a smart training strategy would be to turn on multi-scale right before the LR drop at epoch 218.

python3 train.py --arc default --weights 'weights/darknet53.conv.74' --img-size 320 --batch-size 64 --accumulate 1 --epochs 273 --multi-scale --prebias

[results.png plot comparing training with and without --multi-scale]
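One way to realize the "turn multi-scale on right before the LR drop" idea would be a simple epoch check in the training loop. A schematic sketch only; the epoch-218 threshold comes from the comment above, while the loop structure, the 320 base size, and the ±96 px range are assumptions:

```python
import random

EPOCHS = 273
MULTI_SCALE_START = 218  # just before the LR drop at 0.8 * 273 epochs

for epoch in range(EPOCHS):
    if epoch >= MULTI_SCALE_START:
        # pick a new img_size (multiple of 32) for this epoch's batches
        img_size = random.randrange(320 - 96, 320 + 96 + 32, 32)
    else:
        img_size = 320
    # ... train one epoch at img_size ...
```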
