
I found Objectness in ComputeLoss which might need improvement #3605

Closed
XHBrain opened this issue Jun 14, 2021 · 12 comments · Fixed by #3610 or #3786
Labels
bug Something isn't working

Comments

@XHBrain
Contributor

XHBrain commented Jun 14, 2021

Hi,
sorry for disturbing you. I found one part in your code that might need improvement.
When ComputeLoss is calculated on scenes with dense targets, several GTs may match the same anchor cell.

# Objectness
tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype) # iou ratio

tobj is written at location [b, a, gj, gi], so a later target at the same location overwrites an earlier one.
The final tobj value therefore comes from whichever GT build_targets emits last (the bottom-right one), which may not be reasonable. Matching each cell to the GT with the largest IOU instead can be done with the following modification.

source:
tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype) # iou ratio
should be replaced with:

score_iou = iou.detach().clamp(0).type(tobj.dtype)
sort_id = torch.argsort(score_iou)
b, a, gj, gi, score_iou = b[sort_id], a[sort_id], gj[sort_id], gi[sort_id], score_iou[sort_id]
tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * score_iou  # iou ratio

I tried this on a private dataset with very dense objects, and performance improved with the above change.
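To make the overwrite explicit, here is a minimal sketch of the collision (plain Python standing in for the tensor scatter so the "last write wins" semantics are explicit; the indices and IoU values are hypothetical):

```python
def scatter(cells, idx, vals):
    """Emulate tobj[idx] = vals: later writes overwrite earlier ones."""
    out = list(cells)
    for i, v in zip(idx, vals):
        out[i] = v
    return out

idx = [1, 3, 1]                # targets 0 and 2 collide on cell 1
score_iou = [0.9, 0.5, 0.2]    # per-target IoU with the anchor

# Naive assignment: cell 1 keeps the LAST target's IoU (0.2), even
# though an earlier target matched it with IoU 0.9.
naive = scatter([0.0] * 4, idx, score_iou)

# Proposed fix: sort ascending by IoU so the highest-IoU target is
# written last and therefore survives the collision.
order = sorted(range(len(score_iou)), key=lambda i: score_iou[i])
fixed = scatter([0.0] * 4,
                [idx[i] for i in order],
                [score_iou[i] for i in order])

print(naive[1], fixed[1])  # 0.2 0.9
```

The torch.argsort call in the proposed patch plays the role of `order` here: sorting ascending guarantees the largest IoU is the final write into each contested cell.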

@XHBrain XHBrain added the bug Something isn't working label Jun 14, 2021
@github-actions
Contributor

github-actions bot commented Jun 14, 2021

👋 Hello @XHBrain, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@XHBrain hi, thanks for the loss function idea! The best way to test your change would be to train two models on a public dataset like VOC from scratch, one with your proposed change, one with the default code:

python train.py --batch 64 --weights '' --cfg yolov5s.yaml --data voc.yaml --epochs 300 --cache --img 512 --hyp hyp.finetune.yaml --project VOC

If you could do this and report back both results that would be great!

@glenn-jocher
Member

@XHBrain also can you submit a PR please with your changes? Then we can checkout the branch and test the change ourselves, and if it works then we can merge into master for everyone to use. Thanks!

@XHBrain
Contributor Author

XHBrain commented Jun 14, 2021

@XHBrain also can you submit a PR please with your changes? Then we can checkout the branch and test the change ourselves, and if it works then we can merge into master for everyone to use. Thanks!

Hi @glenn-jocher, thanks for your response. I committed the change for code review, please check it.
https://github.com/XHBrain/yolov5/tree/max_iou_objectLoss
You could choose a dataset with dense objects to verify this change.

@glenn-jocher glenn-jocher linked a pull request Jun 14, 2021 that will close this issue
@glenn-jocher
Member

@XHBrain thanks! I've created PR #3610 from your branch, I'll check it out and experiment on it a bit.

@glenn-jocher
Member

glenn-jocher commented Jun 18, 2021

@XHBrain I have an IoU-sort experiment ongoing at https://wandb.ai/glenn-jocher/study-mixup-coco

It was underperforming around epoch 50, but around epoch 80 it seems to be slightly outperforming the baseline. This is running on an EC2 spot instance though, which unfortunately forces restarts and resumes too often, so I don't know if the results are reliable.

@XHBrain
Contributor Author

XHBrain commented Jun 20, 2021

Hi @glenn-jocher, many thanks for your verification. From your training results, I see that my change has almost no benefit on that dataset. It is curious that the change worked on my dense dataset. Anyway, I will do more experiments and analysis on the change. Thanks again.

@glenn-jocher
Member

@XHBrain unfortunately the training has been restarted so many times due to spot restarts that the training results are no longer reliable. I'm going to need to secure some persistent hardware to rerun the tests. This is a bit of a problem for us at the moment...

@791136190

I also found this problem, and I think we should also add a de-duplication step for the box regression targets.
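A hedged sketch of what such de-duplication could look like (this is not an actual patch; the cell keys, IoU values, and the helper name dedup_max_iou are all hypothetical): keep, per (b, a, gj, gi) cell, only the target with the largest IoU, and drop the rest from the box/cls loss.

```python
def dedup_max_iou(cells, ious):
    """Return indices keeping, per cell, only the target with max IoU."""
    best = {}
    for i, (cell, iou) in enumerate(zip(cells, ious)):
        if cell not in best or iou > ious[best[cell]]:
            best[cell] = i
    return sorted(best.values())

# Two targets collide on cell (0, 1, 5, 5); only the 0.9-IoU one is kept.
cells = [(0, 1, 5, 5), (0, 1, 5, 5), (0, 2, 7, 3)]
ious = [0.9, 0.2, 0.6]
keep = dedup_max_iou(cells, ious)
print(keep)  # [0, 2]
```

In the real loss this index list would be used to filter b, a, gj, gi, and the matched target boxes before computing the box and cls terms.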

@glenn-jocher
Member

glenn-jocher commented Jun 26, 2021

@791136190 yes, right now one anchor fits all matching targets for cls and box, but the objectness loss is only applied once per anchor. This is a bit odd for the objectness loss, but it seems to work well currently.

EDIT: If you have any other ideas please let us know, or even better try them out so we have some quantitative before and after numbers we can compare.

@isJunCheng

I used the sorting function on a custom dataset and the result actually decreased instead. An anchor matched by multiple samples should keep the higher IoU, not be overwritten by a lower one. The theory is correct, but the experimental result is the opposite, and I am puzzled. Can someone help me with this, please?

@glenn-jocher
Member

@isJunCheng this could depend on the specifics of your dataset, the model architecture, and training hyperparameters. Your observations are insightful and important. It seems like an issue that warrants further investigation and perhaps a deeper understanding of the trade-offs involved. We greatly appreciate your input.
