
How to CROP detected rectangle to OCR later? #834

Closed

TeddyPerkins opened this issue Aug 24, 2020 · 13 comments
Labels
question Further information is requested

Comments

@TeddyPerkins

❔Question

I am trying to process a complex (form + table) image. I would like to crop out only certain content so that it can be OCR-ed afterwards.
Is there an argument I am missing, or does this require changes to the code itself?

I saw a solution on Stack Overflow, but it was for an old version of YOLO.

Any help would be appreciated, Thanks.

Additional context

YOLO -> Crop (How to ?) -> OCR

@TeddyPerkins TeddyPerkins added the question Further information is requested label Aug 24, 2020
@github-actions
Contributor

github-actions bot commented Aug 24, 2020

Hello @TeddyPerkins, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@NanoCode012
Contributor

NanoCode012 commented Aug 24, 2020

If you pass --save-txt, it will save the bounding-box info to a file.

yolov5/detect.py

Lines 108 to 109 in 5f07782

with open(txt_path + '.txt', 'a') as f:
    f.write(('%g ' * 5 + '\n') % (cls, *xywh))  # label format

You can read that file back and crop to those positions, if I understand you correctly.
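As a rough sketch (not from this thread), reading one line of a --save-txt label file and cropping that box could look like the following. The function name and demo values are made up; lines in the file are "class x_center y_center width height", all normalized to 0-1:

```python
import numpy as np

def crop_from_label_line(img: np.ndarray, line: str) -> np.ndarray:
    # Parse one YOLO label line and crop that box out of the image array.
    h, w = img.shape[:2]
    cls, xc, yc, bw, bh = map(float, line.split()[:5])
    x1 = max(int((xc - bw / 2) * w), 0)  # normalized center/size -> pixel corners
    y1 = max(int((yc - bh / 2) * h), 0)
    x2 = min(int((xc + bw / 2) * w), w)
    y2 = min(int((yc + bh / 2) * h), h)
    return img[y1:y2, x1:x2]

# Example: crop the center half of a 100x100 image
img = np.zeros((100, 100, 3), dtype=np.uint8)
crop = crop_from_label_line(img, '0 0.5 0.5 0.5 0.5')
print(crop.shape)  # (50, 50, 3)
```

Each crop can then be handed straight to an OCR engine.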

@TeddyPerkins
Author

Thank you Mr. Chanvichet, I will try this and let you know.

@glenn-jocher
Member

glenn-jocher commented Aug 25, 2020

@TeddyPerkins you can take img in detect.py and slice and dice it however you want.

There is object cropping and saving code in the training dataloader which you can use for reference:

yolov5/utils/datasets.py

Lines 404 to 421 in a8751e5

# Extract object detection boxes for a second stage classifier
if extract_bounding_boxes:
    p = Path(self.img_files[i])
    img = cv2.imread(str(p))
    h, w = img.shape[:2]
    for j, x in enumerate(l):
        f = '%s%sclassifier%s%g_%g_%s' % (p.parent.parent, os.sep, os.sep, x[0], j, p.name)
        if not os.path.exists(Path(f).parent):
            os.makedirs(Path(f).parent)  # make new output folder
        b = x[1:] * [w, h, w, h]  # box
        b[2:] = b[2:].max()  # rectangle to square
        b[2:] = b[2:] * 1.3 + 30  # pad
        b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(np.int)
        b[[0, 2]] = np.clip(b[[0, 2]], 0, w)  # clip boxes outside of image
        b[[1, 3]] = np.clip(b[[1, 3]], 0, h)
        assert cv2.imwrite(f, img[b[1]:b[3], b[0]:b[2]]), 'Failure extracting classifier boxes'
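For reference, the square-and-pad logic above can be lifted into a standalone helper. This is an illustrative sketch only: xywh2xyxy is re-implemented inline, and the deprecated `np.int` is replaced with plain `int`:

```python
import numpy as np

def xywh2xyxy(x):
    # [x_center, y_center, w, h] -> [x1, y1, x2, y2]
    y = x.copy()
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y

def crop_square(img, box_xywh_norm, gain=1.3, pad=30):
    # box_xywh_norm: normalized [x_center, y_center, w, h] as in YOLO labels
    h, w = img.shape[:2]
    b = np.array(box_xywh_norm, dtype=float) * [w, h, w, h]  # to pixels
    b[2:] = b[2:].max()          # rectangle to square
    b[2:] = b[2:] * gain + pad   # pad
    xyxy = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(int)
    xyxy[[0, 2]] = np.clip(xyxy[[0, 2]], 0, w)  # clip to image bounds
    xyxy[[1, 3]] = np.clip(xyxy[[1, 3]], 0, h)
    return img[xyxy[1]:xyxy[3], xyxy[0]:xyxy[2]]

img = np.zeros((200, 200, 3), dtype=np.uint8)
crop = crop_square(img, [0.5, 0.5, 0.2, 0.1])
print(crop.shape)  # (82, 82, 3): a 40px box squared, then 40 * 1.3 + 30
```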

@TeddyPerkins
Author

Oh, that's so cool. Thank you for pointing this out too.

@glenn-jocher
Member

@TeddyPerkins sure, no problem. The dataloader extraction use case is to take an existing dataset and extract its contents into a classification-style dataset, with class-labelled folders etc. all created automatically.

If you just want to do this for a few images with detect.py, however, you might simply look at the second-stage classifier code. This code is latent (not currently used), but its use case is to pass YOLOv5 detections through a second-stage classifier to reduce false positives. As part of this it naturally crops detected boxes to feed to the classifier:

yolov5/detect.py

Lines 80 to 83 in 8666bc5

# Apply Classifier
if classify:
    pred = apply_classifier(pred, modelc, img, im0s)

@TeddyPerkins
Author

Whoa, this makes pipelining other tasks so much easier.

Thank you so much Mr. Glenn!

@TeddyPerkins
Author

TeddyPerkins commented Aug 25, 2020

@glenn-jocher, but can I crop trapezoid shapes (keystone effect) to rectangles, though?

@glenn-jocher
Member

@TeddyPerkins last time I checked every shape can be enclosed by a rectangle.

@shivprasad94

#2608 (comment)

You can refer to my reply in the link above.

@glenn-jocher
Member

@TeddyPerkins Prediction box cropping is now available in YOLOv5 via PR #2827! PyTorch Hub models can use results.crop() or detect.py can be called with the --save-crop argument. Example usage:

python detect.py --save-crop


@palash04

palash04 commented Jan 9, 2023

For those who want to crop the image from the generated labels txt file, I have compiled the code from detect.py to crop, given the labels.

Following is the code snippet. Call crop_image with the associated arguments.

import numpy as np
from PIL import Image

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = np.copy(x)
    y[..., 0] = w * (x[..., 0] - x[..., 2] / 2) + padw  # top left x
    y[..., 1] = h * (x[..., 1] - x[..., 3] / 2) + padh  # top left y
    y[..., 2] = w * (x[..., 0] + x[..., 2] / 2) + padw  # bottom right x
    y[..., 3] = h * (x[..., 1] + x[..., 3] / 2) + padh  # bottom right y
    return y

def xyxy2xywh(x):
    # Convert nx4 boxes from [x1, y1, x2, y2] to [x, y, w, h] where xy1=top-left, xy2=bottom-right
    y = np.copy(x)
    y[..., 0] = (x[..., 0] + x[..., 2]) / 2  # x center
    y[..., 1] = (x[..., 1] + x[..., 3]) / 2  # y center
    y[..., 2] = x[..., 2] - x[..., 0]  # width
    y[..., 3] = x[..., 3] - x[..., 1]  # height
    return y

def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y

def clip_boxes(boxes, shape):
    boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
    boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2
    return boxes

def crop_image(img, dh, dw, x, y, w, h, gain=1.02, pad=10):
    # img is a numpy type array
    # dh, dw, _ = img.shape
    # class_id, x, y, w, h, conf_score = line
    # where, line is read from labels txt file
    xyxy = xywhn2xyxy(np.array([x,y,w,h]).reshape(1,4), w=dw, h=dh)
    b = xyxy2xywh(xyxy)
    b[:, 2:] = b[:, 2:] * gain + pad
    xyxy = xywh2xyxy(b).astype('int64')
    xyxy = clip_boxes(xyxy, [dh,dw])
    crop = img[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2]), ::(-1)] # keeping -1 as opposed to original code
    
    # save cropped image
    Image.fromarray(crop[..., ::-1]).save('cropped_img.png', quality=95, subsampling=0)  # save RGB
    return crop

@glenn-jocher
Member

@palash04 Thanks for sharing the code snippet! This could be a useful reference for anyone looking to crop images from the generated labels. Hope this helps the community!
