poor box width regression on text detection #518

Open
travisCxy opened this issue Jun 27, 2024 · 3 comments

Comments

@travisCxy

Hello, thank you for your code. I am training a YOLOv9 model for document image layout detection, and I get a good mAP on my validation set. The problem is that text detections sometimes have poor width regression. Can you help me?
[Attached image: sample YOLOv9 detection result]

@ankandrew

ankandrew commented Jun 29, 2024

Some questions:

  1. Did you try a different input resolution than 640, e.g. a lower one such as 416?
  2. How big (number of samples) is your training data?
  3. Which model are you using? Is it pre-trained on COCO (weights provided by the repo)?

Also, double-check that mixup augmentation is not ruining your training, and verify that the augmented images look the way you expect. Below is a script I use to visualize the augmentation:

https://github.com/ankandrew/yolov9/blob/8fecc650bebf7348a6372f43b668b344de070129/visualize_augmentation.py
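For context, here is a minimal sketch of what mixup does to a detection sample, assuming this fork keeps the YOLOv5-style behavior (blend ratio drawn from Beta(32, 32), labels from both images concatenated). The function name and the [class, x1, y1, x2, y2] label layout are assumptions for illustration, not the repo's API. On document pages this means the model is asked to detect half-transparent text lines laid over other text.

```python
import numpy as np


def mixup(im, labels, im2, labels2, rng=None):
    """Blend two images and merge their box labels (YOLOv5-style sketch).

    Assumption: labels are (N, 5) arrays of [class, x1, y1, x2, y2] in pixels,
    and both images already have the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = rng.beta(32.0, 32.0)  # blend ratio, concentrated around 0.5
    blended = im.astype(np.float32) * r + im2.astype(np.float32) * (1.0 - r)
    # Boxes from BOTH images stay as targets, even where the two pages overlap.
    merged = np.concatenate((labels, labels2), axis=0)
    return blended.astype(np.uint8), merged, r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    page_a = np.full((64, 64, 3), 255, dtype=np.uint8)  # white page
    page_b = np.zeros((64, 64, 3), dtype=np.uint8)  # black page
    lines_a = np.array([[0, 4, 10, 60, 18]], dtype=np.float32)
    lines_b = np.array([[0, 8, 30, 40, 40]], dtype=np.float32)
    img, boxes, r = mixup(page_a, lines_a, page_b, lines_b, rng)
    print(f"r={r:.3f} pixel={img[0, 0, 0]} boxes={len(boxes)}")
```

Because every box from the second page is kept at full weight while its pixels are faded to roughly half contrast, disabling mixup for text-heavy layouts is a reasonable first experiment.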

@travisCxy
Author

Hello @ankandrew,

  1. I am using a bigger input size, 1024, to train my model, because the original document images are all high resolution.
  2. I have 44,000 training samples; I think that is enough to train the model.
  3. I am using yolov9-e, loaded with the COCO pre-trained weights.
    I checked my augmentation, and you are right: I hadn't disabled mixup augmentation. After checking the augmentation with your script, I also disabled mosaic and copy_paste, and I will train again with the current settings.
    By the way, I was reading the loss computation code. The bbox loss focuses mainly on IoU, and I doubt that an IoU loss is helpful for accurate bbox regression. So I changed the loss to an L1 loss, but I got a worse result. Do you have any idea why?
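One plausible reason (a hedged guess, not a diagnosis of this particular run) why a plain L1 swap hurt: an L1 loss on pixel coordinates grows with box size, whereas an IoU-based loss is scale-invariant. The sketch below compares the two for a short word box and a long text line that are both predicted 5% too narrow; all helper names are hypothetical.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def l1_xyxy(a, b):
    """Sum of absolute corner-coordinate errors, in pixels (not normalized)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def width_error_losses(x1, y1, w, h, rel_err):
    """Return (IoU loss, L1 loss) for a prediction whose right edge is short by rel_err * w."""
    gt = (x1, y1, x1 + w, y1 + h)
    pred = (x1, y1, x1 + w * (1.0 - rel_err), y1 + h)
    return 1.0 - iou_xyxy(gt, pred), l1_xyxy(gt, pred)


if __name__ == "__main__":
    for name, w in [("word", 40), ("text line", 800)]:
        iou_loss, l1_loss = width_error_losses(10, 10, w, 20, rel_err=0.05)
        print(f"{name:>9}: IoU loss={iou_loss:.3f}  L1 loss={l1_loss:.1f}px")
```

Both boxes get the same IoU loss (about 0.05), while the L1 loss is 20 times larger for the long line. Unless the coordinates are normalized (for example by image size or stride), large boxes would dominate an L1 objective, which may be part of why it underperformed.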

@ankandrew

ankandrew commented Jul 16, 2024

Hi @travisCxy! Sorry for the late response. I think your analysis in point (3) is accurate. The existing MPDIoU loss (selected by the MDPIoU flag) could be used instead of the CIoU loss that is currently used. MPDIoU adds penalty terms based on the distances between the corresponding corners of the two bounding boxes, which should make it more suitable for text detection, where corner alignment is critical to avoid cropping letters (as in your examples). Let me know if this helps on your dataset.

yolov9/utils/metrics.py, lines 292 to 296 in 5b1ea9a:

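```python
elif MDPIoU:
    d1 = (b2_x1 - b1_x1) ** 2 + (b2_y1 - b1_y1) ** 2  # squared distance between top-left corners
    d2 = (b2_x2 - b1_x2) ** 2 + (b2_y2 - b1_y2) ** 2  # squared distance between bottom-right corners
    mpdiou_hw_pow = feat_h ** 2 + feat_w ** 2  # normalizer built from feat_h and feat_w
    return iou - d1 / mpdiou_hw_pow - d2 / mpdiou_hw_pow  # MPDIoU
```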
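To compare the two losses outside the training loop, here is a self-contained, pure-Python sketch of the MPDIoU term from the snippet above, next to CIoU written the way YOLOv5-style bbox_iou computes it. Assumptions: boxes are (x1, y1, x2, y2) in the same units as the normalizer, and img_w/img_h stand in for feat_w/feat_h; the function names are illustrative, not the repo's API. The training loss would be 1 minus the returned value.

```python
import math


def _iou(b1, b2, eps=1e-7):
    """Plain IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    ih = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = iw * ih
    union = (b1[2] - b1[0]) * (b1[3] - b1[1]) + (b2[2] - b2[0]) * (b2[3] - b2[1]) - inter
    return inter / (union + eps)


def ciou(b1, b2, eps=1e-7):
    """Complete IoU: IoU minus center-distance and aspect-ratio penalties."""
    iou = _iou(b1, b2, eps)
    cw = max(b1[2], b2[2]) - min(b1[0], b2[0])  # enclosing box width
    ch = max(b1[3], b2[3]) - min(b1[1], b2[1])  # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps  # squared diagonal of the enclosing box
    # Squared distance between box centers.
    rho2 = ((b2[0] + b2[2] - b1[0] - b1[2]) ** 2 + (b2[1] + b2[3] - b1[1] - b1[3]) ** 2) / 4
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1] + eps
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1] + eps
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2  # aspect-ratio term
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)


def mpdiou(b1, b2, img_w, img_h, eps=1e-7):
    """MPDIoU: IoU minus normalized squared distances of matching corners."""
    iou = _iou(b1, b2, eps)
    d1 = (b2[0] - b1[0]) ** 2 + (b2[1] - b1[1]) ** 2  # top-left corners
    d2 = (b2[2] - b1[2]) ** 2 + (b2[3] - b1[3]) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm


if __name__ == "__main__":
    gt = (100, 100, 200, 200)
    inner = (125, 125, 175, 175)  # same center and aspect ratio, half size
    outer = (50, 50, 250, 250)  # same center and aspect ratio, double size
    for name, pred in [("inner", inner), ("outer", outer)]:
        print(name, round(ciou(pred, gt), 4), round(mpdiou(pred, gt, 640, 640), 4))
```

Both predictions have IoU 0.25, share the ground-truth center, and keep its aspect ratio, so CIoU's extra terms vanish and it scores them almost identically; MPDIoU still separates them through the corner distances. That corner signal is what should help keep box edges aligned with the first and last letters of a text line.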

You can use my branch to select the bounding box loss function, or cherry-pick my commit ankandrew@9527269 to easily test loss functions other than the default one.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants