poor box width regression on text detection #518

travisCxy · 2024-06-27T05:55:28Z

Hello, thank you for your code. I am training a YOLOv9 model for document image layout detection. I get a good mAP on my validation set, but text detection sometimes produces poor width regression. Can you help me?

ankandrew · 2024-06-29T13:58:33Z

Some questions:

Did you try an input resolution other than 640, e.g. a lower one such as 416?
How big (# samples) is your training data?
Which model are you using, is it pre-trained with COCO (weights provided by repo)?

Also, double-check that mixup augmentation is not ruining your training. Inspect the augmented samples and confirm they look the way you expect. Below is a script I use to visualize the augmentation:

https://github.com/ankandrew/yolov9/blob/8fecc650bebf7348a6372f43b668b344de070129/visualize_augmentation.py

travisCxy · 2024-07-02T02:25:10Z

@ankandrew Hello,

I am training at a larger input size, 1024, because the original document images are all high resolution.
I have 44,000 training samples, which I think is enough to train the model.
I am using yolov9-e and loading the COCO-pretrained weights.
I checked my augmentation and you are right, I had not turned off mixup. I inspected the augmentation using your script, then turned off mosaic and copy_paste. I will train once more with the current settings.
By the way, I have been reading the loss computation code. The bbox loss focuses mainly on IoU, and I doubt that an IoU loss helps with accurate bbox regression. I changed the loss to an L1 loss, but I got a worse result. Do you have any idea why?
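The augmentation change described above can be sketched as a small helper that zeroes the mixing augmentations in a hyperparameter dictionary. This is only an illustration: the function name `disable_mix_augmentations` is hypothetical, and the keys `mosaic`, `mixup`, and `copy_paste` are assumed to match the names used in the repo's hyperparameter YAML files, so check them against the file you actually train with.

```python
def disable_mix_augmentations(hyp):
    """Return a copy of `hyp` with the image-mixing augmentations turned off.

    `hyp` is assumed to be a plain dict loaded from a hyperparameter YAML file.
    The key names below are assumptions; keys that are absent are left alone.
    """
    out = dict(hyp)  # shallow copy so the caller's dict is not modified
    for key in ("mosaic", "mixup", "copy_paste"):
        if key in out:
            out[key] = 0.0  # a probability of 0.0 means "never apply"
    return out


# Example values are made up for illustration, not taken from the repo.
example_hyp = {"mosaic": 1.0, "mixup": 0.15, "copy_paste": 0.3, "fliplr": 0.5}
clean_hyp = disable_mix_augmentations(example_hyp)
```

After writing the modified values back to a YAML file, re-running the visualization script is a quick way to confirm that the samples no longer contain blended or pasted content.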

ankandrew · 2024-07-16T02:56:20Z

Hi @travisCxy! Sorry for the late response. I think your analysis on point (3) seems accurate. The existing MPDIoU loss (spelled MDPIoU in the code) could be used instead of the CIoU loss currently in use. MPDIoU includes a penalty term based on the distance between the corresponding corners of the two bounding boxes, which should make it more suitable for text detection, where corner alignment is critical to avoid cropping letters (as in your examples). Let me know if this helps on your dataset.

yolov9/utils/metrics.py, lines 292 to 296 in 5b1ea9a:

    elif MDPIoU:
        d1 = (b2_x1 - b1_x1) ** 2 + (b2_y1 - b1_y1) ** 2
        d2 = (b2_x2 - b1_x2) ** 2 + (b2_y2 - b1_y2) ** 2
        mpdiou_hw_pow = feat_h ** 2 + feat_w ** 2
        return iou - d1 / mpdiou_hw_pow - d2 / mpdiou_hw_pow  # MPDIoU

You can use my branch to select the bounding box loss function, or cherry-pick my commit ankandrew@9527269 to easily test loss functions other than the default one.
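To see how the corner penalty behaves on a text-line box, here is a minimal pure-Python sketch of the same formula for a single pair of boxes in (x1, y1, x2, y2) format. The function names `iou_xyxy` and `mpdiou_xyxy` and the example coordinates are made up for illustration; the repo's version works on tensors and may pass different values for the normalization size.

```python
def iou_xyxy(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mpdiou_xyxy(a, b, feat_w, feat_h):
    """IoU minus the squared top-left and bottom-right corner distances,
    each normalized by feat_w**2 + feat_h**2 (same form as the snippet above)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    d1 = (bx1 - ax1) ** 2 + (by1 - ay1) ** 2  # top-left corner distance, squared
    d2 = (bx2 - ax2) ** 2 + (by2 - ay2) ** 2  # bottom-right corner distance, squared
    norm = feat_w ** 2 + feat_h ** 2
    return iou_xyxy(a, b) - d1 / norm - d2 / norm


# Hypothetical text line: 400 px wide, 20 px tall; the prediction is 20 px too short.
gt = (100.0, 100.0, 500.0, 120.0)
pred = (100.0, 100.0, 480.0, 120.0)

plain = iou_xyxy(gt, pred)
penalized = mpdiou_xyxy(gt, pred, feat_w=1024, feat_h=1024)
```

Because the penalty is divided by `feat_w**2 + feat_h**2`, its size relative to the IoU term depends on the units the boxes and the normalization size are expressed in, so it is worth checking what the training code actually passes for those two arguments.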