
💡 Idea: Mosaic cropping using segmentation labels #2151

Closed
glenn-jocher opened this issue Feb 6, 2021 · 20 comments · Fixed by #2188
@glenn-jocher
Member

I had an idea today! COCO supplies segmentation annotations for every instance, but we don't use them. I realized it might be useful though to have access to these annotations in the dataloader because they can help re-label cropped objects more accurately. The current mosaic loader will translate/augment images and adjust their labels accordingly, but depending on the shape of the object this may produce suboptimal results (see below).

Re-labelling the augmented images based on their cropped segmentation labels rather than their cropped box labels would likely produce more desirable bounding boxes. The benefit is not possible to quantify without actually implementing the idea, though, which seems to be a very complicated task, and unfortunately the benefit would only be available to datasets with accompanying segmentation labels.

Has anyone tried this, or does anyone have a segmentation-capable version of the YOLOv5 dataloader available?

[Screenshot: Screen Shot 2021-02-06 at 1 16 28 PM]

@glenn-jocher glenn-jocher added the enhancement New feature or request label Feb 6, 2021
@glenn-jocher glenn-jocher self-assigned this Feb 6, 2021
@WongKinYiu

Hello,

We have done this by using pycocotools. All you need is:

from pycocotools.coco import COCO
from pycocotools import mask as maskUtils

Next, you need to get the annotations for each image.

coco_info = COCO("your_train_or_val_json")
for img_id in coco_info.getImgIds():
    # you can choose iscrowd=True, iscrowd=False, or both for each image
    anns_ids = coco_info.getAnnIds(img_id, iscrowd=False)
    # annotation info (`bbox`, `segmentation`, `area`, ...) will be here
    anns = coco_info.loadAnns(anns_ids)
    # image info (`file_name`, `width`, `height`, ...) will be here
    img = coco_info.loadImgs(int(img_id))[0]

Now you can process the annotations of each image:

img_file_name = img["file_name"]
img_height = img["height"]
img_width = img["width"]
for ann in anns:
    # you can use ann["area"] to ignore small or large objects here
    # you may need `coco91_to_80` to convert category ids for YOLOv5-style annotations
    ann_class = ann["category_id"]
    ann_bbox = ann["bbox"]
    ann_segm = ann['segmentation']
    # you can normalize x, y coordinates to [0, 1] using `img_width` and `img_height` here for YOLOv5-style annotations
    # you may also want to save an annotation file corresponding to img_file_name here

There are three cases for segmentation info:

  1. an object is separated into several parts
  2. an object has only one part
  3. the segmentation is already compressed (RLE)
# case 1
if type(ann_segm) is list:
    rles = maskUtils.frPyObjects(ann_segm, img_height, img_width)
    rle = maskUtils.merge(rles)
# case 2
elif type(ann_segm['counts']) is list:
    rle = maskUtils.frPyObjects(ann_segm, img_height, img_width)
# case 3
else:
    rle = ann_segm

# again, use pycocotools to get the binary mask
ann_mask = maskUtils.decode(rle)

Now you have the annotation mask, which is a binary mask at the same resolution as the image.
You could:

  1. apply the same perspective transformation to the annotation mask (including rotation, shear, ...)
  2. get the bounding box on the fly (by calculating min/max x, y of the mask; see the sketch below)
  3. do instance segmentation and semantic segmentation tasks
  4. copy-paste augmentation (https://arxiv.org/abs/2012.07177)
  5. ...
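For item 2, a minimal sketch of recovering a box from a (possibly warped or cropped) binary mask; mask_to_bbox is a hypothetical helper, not part of YOLOv5 or pycocotools:

import numpy as np

def mask_to_bbox(mask):
    # Tight xyxy box around the nonzero pixels of a binary mask;
    # returns None if the mask is empty after cropping/warping.
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()

# e.g. after applying the same warp/crop to ann_mask as to the image:
# bbox = mask_to_bbox(warped_ann_mask)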

I am quite sure this issue is the main reason mosaic9 gets worse results than mosaic4.
After handling this problem, YOLOv4-P5, YOLOv4-P6, and YOLOv4-P7 all improve by about 0.5% AP on COCO.
You could also do image collage augmentation, which automatically generates a grid layout of sampled images with no (or minimal) cropping. For reference: https://github.com/adrienverge/PhotoCollage

@glenn-jocher
Member Author

@WongKinYiu good suggestions! Copy-paste augmentation looks like a good idea too. I've tried this in the past with bounding boxes but the results were poor; I'm sure segmentation will help this substantially.

@WongKinYiu

Analysis of mosaic augmentation:
Mosaic4 - about 2 of 4 images have a crop issue (1/2)

Mosaic9 - about 6 of 9 images have a crop issue (2/3)

@glenn-jocher
Member Author

@WongKinYiu yes this is a good point, mosaic9 will have more crops on average than mosaic4. OK, I'm working on an implementation that can leverage the segmentation masks to handle these crops better; I'll test it on the 4 models at 640 to see if it helps.

I've tried to make this extensible to other datasets so anyone with segmentation data can also benefit.

@glenn-jocher glenn-jocher linked a pull request Feb 12, 2021 that will close this issue
@glenn-jocher glenn-jocher reopened this Feb 12, 2021
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@Edwardmark

@glenn-jocher I just check the cropped part against the original GT, and if the IoU between the real bbox and the cropped box is less than 0.5, then discard it:

if len(labels4):
    labels4_org = labels4.copy()  # boxes before clipping to the 2s x 2s mosaic canvas

    m1 = np.sum(labels4[:, 1:] < 0, axis=1)      # how many coords fall below the canvas
    m2 = np.sum(labels4[:, 1:] > 2 * s, axis=1)  # how many coords fall beyond the canvas
    np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])
    m_overlap = jaccard_numpy(labels4_org[:, 1:], labels4[:, 1:]) >= 0.5  # IoU(original, clipped) >= 0.5; jaccard_numpy is a custom helper
    m_border = np.invert((m1 + m2).astype(bool))  # boxes with no out-of-canvas coords
    labels4 = labels4[m_border * m_overlap, :]

Is that right? Looking forward to your comment.

@glenn-jocher
Member Author

@Edwardmark is this custom code that you've written? We have box candidate criteria that are used to filter labels for use in training, including the percent of area lost during augmentation here:

yolov5/utils/datasets.py

Lines 924 to 926 in d4456e4

# filter candidates
i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
targets = targets[i]

The box_candidates() function itself is here:

yolov5/utils/datasets.py

Lines 932 to 938 in d4456e4

def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):  # box1(4,n), box2(4,n)
    # Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)  # candidates

By default it will reject any box that has lost more than 90% of its area (adjusted for scale augmentation) during the augmentation performed in random_perspective(). Boxes can also lose area in load_mosaic(), though we do not currently filter there (applying box_candidates() in load_mosaic() as well has been proposed in the past by another user).
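For illustration, a minimal usage sketch of box_candidates() with made-up numbers (not repo code), showing the default area_thr=0.1 rejecting a heavily cropped box:

import numpy as np

# Two 100x50 boxes before augmentation; after augmentation the first is lightly
# cropped and the second has lost ~98% of its area (xyxy columns, shape (4, n))
box1 = np.array([[0, 0, 100, 50], [0, 0, 100, 50]], dtype=float).T
box2 = np.array([[0, 0, 90, 45], [0, 0, 20, 5]], dtype=float).T
keep = box_candidates(box1=box1, box2=box2)  # default area_thr=0.1
print(keep)  # [ True False] -> the second box is dropped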

@Edwardmark

Edwardmark commented Mar 26, 2021

@glenn-jocher Yes, it is my custom code. I think we should handle area loss in load_mosaic. Is 90% too much? For example, if we lose 50% of a person, the box may only contain the person's legs, which is not technically a person.

@glenn-jocher
Member Author

@Edwardmark yes maybe that's a good idea! You could try running box_candidates() in the mosaic function as well as in random_perspective(), as it's possible for objects to reduce in quality during both steps. If you'd like to submit a PR based on that modification I can try running some quick trainings (VOC YOLOv5s 50 epochs, baseline scenario from the Google Colab Notebook VOC section) to quantify the difference.
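A rough sketch of what that modification might look like inside load_mosaic(), right after the labels are clipped to the 2s x 2s mosaic canvas (variable names follow the snippets above; the 0.5 threshold is only an example, not repo code):

if len(labels4):
    labels4_unclipped = labels4.copy()                        # xyxy before clipping
    np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])     # clip to mosaic borders
    keep = box_candidates(box1=labels4_unclipped[:, 1:5].T,   # boxes before clipping
                          box2=labels4[:, 1:5].T,             # boxes after clipping
                          area_thr=0.5)                       # e.g. drop boxes losing >50% area
    labels4 = labels4[keep]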

@glenn-jocher
Member Author

Removing TODO as this has now been implemented.

@GMN23362

> I had an idea today! COCO supplies segmentation annotations for every instance, but we don't use them. [...]

Hi, may I ask where we can find the PPT slides shown in this issue? I haven't found them on the Ultralytics website.

@glenn-jocher
Member Author

@GMN23362 slides are internal and not publicly available.

@bit-scientist

Hi, @glenn-jocher, I would like to use copy-paste data augmentation. How should I proceed with training when I have segmentation labels as below?:

class x1, y1, x2, y2, x3, y3, ... xn, yn
class x1, y1, x2, y2, x3, y3, ... xn, yn
class x1, y1, x2, y2, x3, y3, ... xn, yn

Is python path/to/train.py --data coco128.yaml --weights yolov5s.pt --img 640, where coco128.yaml points to the labels.txt file, enough? I mean, does train.py automatically infer bbox coordinates for object detection as well?

@glenn-jocher
Member Author

@bit-scientist yes.

@bit-scientist

bit-scientist commented Sep 20, 2022

@glenn-jocher I'm finding it difficult to get my masks into the x1, y1, x2, y2, x3, y3, ... xn, yn format. What is meant by segment in:

yolov5/utils/general.py

Lines 287 to 293 in ad05e37

def segment2box(segment, width=640, height=640):
    # Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)
    x, y = segment.T  # segment xy
    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)
    x, y, = x[inside], y[inside]
    return np.array([x.min(), y.min(), x.max(), y.max()]) if any(x) else np.zeros((1, 4))  # cls, xyxy

I mean what format is the segment expected to be?

The COCO format for segmentation creates one mask image per image, right? How do I then convert it to generate x1, y1, x2, y2, x3, y3, ... xn, yn?

EDIT: I'm sorry, it does have segmentation points in the json file in the form:
"segmentation": [[1454.1, 647.93, 1529.96, 557.96, 1582.88, 499.75, 1600.52, 392.14, 1669.32, 349.8 ]].
So, in this case it should be:

class 1454.1, 647.93, 1529.96, 557.96, 1582.88, 499.75, 1600.52, 392.14, 1669.32, 349.8.

Does it make sense now?
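For reference, a minimal sketch of writing one such polygon as a YOLOv5-style segment label line, assuming the format is space-separated x, y pairs normalised by image width and height (class id 0 and the 1920x1080 image size are made-up values):

def coco_poly_to_yolo_line(class_id, poly, img_w, img_h):
    # poly is a flat COCO polygon [x1, y1, x2, y2, ...] in pixels (hypothetical helper)
    pts = [f"{v / (img_w if i % 2 == 0 else img_h):.6f}" for i, v in enumerate(poly)]
    return " ".join([str(class_id)] + pts)

line = coco_poly_to_yolo_line(0, [1454.1, 647.93, 1529.96, 557.96, 1582.88, 499.75], 1920, 1080)
# -> "0 0.757344 0.599935 0.796854 0.516630 0.824417 0.462731"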

@glenn-jocher
Member Author

@bit-scientist for a segmentation dataset just run segment/train.py. Usage example:

python segment/train.py --data coco128-seg.yaml

@bit-scientist

@glenn-jocher, I don't need it for a segmentation task 😄, I'd like to augment the data with the --copy-paste functionality.

@glenn-jocher
Member Author

@bit-scientist this should work for segmentation; you just update the hyp here:

copy_paste: 0.0 # segment copy-paste (probability)
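As a rough sketch (file paths may differ between repo versions), copy an existing hyp file, raise copy_paste, and pass it with --hyp; copy-paste only takes effect when segment labels are available in the dataset:

cp data/hyps/hyp.scratch-low.yaml data/hyps/hyp.copy-paste.yaml
# edit hyp.copy-paste.yaml -> copy_paste: 0.5
python train.py --data your_data.yaml --weights yolov5s.pt --img 640 --hyp data/hyps/hyp.copy-paste.yaml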

@tino926
Contributor

tino926 commented Jul 27, 2023

@glenn-jocher

Hi, I am not confident in my understanding of your code. Please correct me if I am wrong.

  1. With YOLOv5, one can use segmentation data to train a detection model.
  2. YOLOv5 automatically checks if the annotation of one object is a mask or bounding box (perhaps by the length?).
  3. In COCO's segmentation annotation, one object may have two separate parts. However, in YOLOv5's segmentation, one segmentation can only have one continuous part.

If my understanding is correct, can you please explain how you convert COCO annotation to YOLOv5's annotation for objects with two separate parts?

@glenn-jocher
Member Author

@tino926 hi,

In YOLOv5, you can indeed use segmentation data to train a detection model. To handle both segmentation masks and bounding boxes, YOLOv5 automatically detects the format of the annotation based on its structure.

If an object in COCO's segmentation annotation consists of two separate parts, YOLOv5's annotation expects one continuous mask for each object. To convert COCO annotation to YOLOv5's format in such cases, you would need to merge the two separate parts into one continuous mask before using it in YOLOv5.
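As a rough sketch, reusing the pycocotools calls from WongKinYiu's comment above, a multi-part polygon annotation can be rasterised into a single merged mask (merge_parts is a hypothetical helper, not repo code):

from pycocotools import mask as maskUtils

def merge_parts(segmentation, img_height, img_width):
    # segmentation: COCO polygon list with one or more parts for the same object
    rles = maskUtils.frPyObjects(segmentation, img_height, img_width)
    rle = maskUtils.merge(rles)
    return maskUtils.decode(rle)  # single (h, w) binary mask covering all parts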

I hope this clarifies how the conversion is handled. If you have any further questions, please let me know.
