
How to CROP detected rectangle to OCR later? #834

Closed

TeddyPerkins opened this issue Aug 24, 2020 · 13 comments
Labels
question Further information is requested

Comments

@TeddyPerkins

❔Question

I am trying to process a complex (form + table) image. I would like to crop out only certain content so that it can be OCR-ed afterwards.
Is there an argument I am missing, or does this require changes to the code itself?

I saw a solution on Stack Overflow, but it was for an old version of YOLO.

Any help would be appreciated, Thanks.

Additional context

YOLO -> Crop (How to ?) -> OCR

@TeddyPerkins TeddyPerkins added the question Further information is requested label Aug 24, 2020
@github-actions
Contributor

github-actions bot commented Aug 24, 2020

Hello @TeddyPerkins, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@NanoCode012
Contributor

NanoCode012 commented Aug 24, 2020

If you pass --save-txt, it will save the bounding-box info to a file.

yolov5/detect.py

Lines 108 to 109 in 5f07782

with open(txt_path + '.txt', 'a') as f:
    f.write(('%g ' * 5 + '\n') % (cls, *xywh))  # label format

You can read that file back and crop to those positions, if I understand you correctly.
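As a rough sketch (not from this thread), reading one line of a --save-txt label file and cropping that box could look like the following. The function name and demo values are made up; lines in the file are "class x_center y_center width height", all normalized to 0-1:

```python
import numpy as np

def crop_from_label_line(img: np.ndarray, line: str) -> np.ndarray:
    # Parse one YOLO label line and crop that box out of the image array.
    h, w = img.shape[:2]
    cls, xc, yc, bw, bh = map(float, line.split()[:5])
    x1 = max(int((xc - bw / 2) * w), 0)  # normalized center/size -> pixel corners
    y1 = max(int((yc - bh / 2) * h), 0)
    x2 = min(int((xc + bw / 2) * w), w)
    y2 = min(int((yc + bh / 2) * h), h)
    return img[y1:y2, x1:x2]

# Example: crop the center half of a 100x100 image
img = np.zeros((100, 100, 3), dtype=np.uint8)
crop = crop_from_label_line(img, '0 0.5 0.5 0.5 0.5')
print(crop.shape)  # (50, 50, 3)
```

Each crop can then be handed straight to an OCR engine.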

@TeddyPerkins
Author

Thank you Mr. Chanvichet, I will try this and let you know.

@glenn-jocher
Member

glenn-jocher commented Aug 25, 2020

@TeddyPerkins you can take img in detect.py and slice and dice it however you want.

There is object cropping and saving code in the training dataloader which you can use for reference:

yolov5/utils/datasets.py

Lines 404 to 421 in a8751e5

# Extract object detection boxes for a second stage classifier
if extract_bounding_boxes:
    p = Path(self.img_files[i])
    img = cv2.imread(str(p))
    h, w = img.shape[:2]
    for j, x in enumerate(l):
        f = '%s%sclassifier%s%g_%g_%s' % (p.parent.parent, os.sep, os.sep, x[0], j, p.name)
        if not os.path.exists(Path(f).parent):
            os.makedirs(Path(f).parent)  # make new output folder
        b = x[1:] * [w, h, w, h]  # box
        b[2:] = b[2:].max()  # rectangle to square
        b[2:] = b[2:] * 1.3 + 30  # pad
        b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(np.int)
        b[[0, 2]] = np.clip(b[[0, 2]], 0, w)  # clip boxes outside of image
        b[[1, 3]] = np.clip(b[[1, 3]], 0, h)
        assert cv2.imwrite(f, img[b[1]:b[3], b[0]:b[2]]), 'Failure extracting classifier boxes'
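For reference, the square-and-pad logic above can be lifted into a standalone helper. This is an illustrative sketch only: xywh2xyxy is re-implemented inline, and the deprecated `np.int` is replaced with plain `int`:

```python
import numpy as np

def xywh2xyxy(x):
    # [x_center, y_center, w, h] -> [x1, y1, x2, y2]
    y = x.copy()
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y

def crop_square(img, box_xywh_norm, gain=1.3, pad=30):
    # box_xywh_norm: normalized [x_center, y_center, w, h] as in YOLO labels
    h, w = img.shape[:2]
    b = np.array(box_xywh_norm, dtype=float) * [w, h, w, h]  # to pixels
    b[2:] = b[2:].max()          # rectangle to square
    b[2:] = b[2:] * gain + pad   # pad
    xyxy = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(int)
    xyxy[[0, 2]] = np.clip(xyxy[[0, 2]], 0, w)  # clip to image bounds
    xyxy[[1, 3]] = np.clip(xyxy[[1, 3]], 0, h)
    return img[xyxy[1]:xyxy[3], xyxy[0]:xyxy[2]]

img = np.zeros((200, 200, 3), dtype=np.uint8)
crop = crop_square(img, [0.5, 0.5, 0.2, 0.1])
print(crop.shape)  # (82, 82, 3): a 40px box squared, then 40 * 1.3 + 30
```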

@TeddyPerkins
Author

Oh, that's so cool. Thank you for pointing this out too.

@glenn-jocher
Member

@TeddyPerkins sure, no problem. The dataloader extraction use case is to take an existing dataset and extract its contents into a classification-style dataset, with class-labelled folders etc. all created automatically.

If you just want to do this for a few images with detect.py, however, you might simply look at the second-stage classifier code. This code is latent (not currently used), but its use case is to pass YOLOv5 detections through a second-stage classifier to reduce false positives. As part of this it naturally crops detected boxes to feed to the classifier:

yolov5/detect.py

Lines 80 to 83 in 8666bc5

# Apply Classifier
if classify:
    pred = apply_classifier(pred, modelc, img, im0s)

@TeddyPerkins
Author

Whoa, this makes pipelining other tasks so much easier.

Thank you so much Mr. Glenn!

@TeddyPerkins
Author

TeddyPerkins commented Aug 25, 2020

@glenn-jocher, but can I crop trapezoid shapes (keystone effect) to rectangles, though?

@glenn-jocher
Member

@TeddyPerkins last time I checked every shape can be enclosed by a rectangle.

@shivprasad94

#2608 (comment)

You can refer to my reply in the link above.

@glenn-jocher
Member

@TeddyPerkins Prediction box cropping is now available in YOLOv5 via PR #2827! PyTorch Hub models can use results.crop() or detect.py can be called with the --save-crop argument. Example usage:

python detect.py --save-crop


@palash04

palash04 commented Jan 9, 2023

For those who want to crop the image from the generated labels txt file, I have compiled the code from detect.py to crop, given the labels.

Following is the code snippet. Call crop_image with the associated arguments.

import numpy as np
from PIL import Image

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = np.copy(x)
    y[..., 0] = w * (x[..., 0] - x[..., 2] / 2) + padw  # top left x
    y[..., 1] = h * (x[..., 1] - x[..., 3] / 2) + padh  # top left y
    y[..., 2] = w * (x[..., 0] + x[..., 2] / 2) + padw  # bottom right x
    y[..., 3] = h * (x[..., 1] + x[..., 3] / 2) + padh  # bottom right y
    return y

def xyxy2xywh(x):
    # Convert nx4 boxes from [x1, y1, x2, y2] to [x, y, w, h] where xy1=top-left, xy2=bottom-right
    y = np.copy(x)
    y[..., 0] = (x[..., 0] + x[..., 2]) / 2  # x center
    y[..., 1] = (x[..., 1] + x[..., 3]) / 2  # y center
    y[..., 2] = x[..., 2] - x[..., 0]  # width
    y[..., 3] = x[..., 3] - x[..., 1]  # height
    return y

def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y

def clip_boxes(boxes, shape):
    boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
    boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2
    return boxes

def crop_image(img, dh, dw, x, y, w, h, gain=1.02, pad=10):
    # img is a numpy type array
    # dh, dw, _ = img.shape
    # class_id, x, y, w, h, conf_score = line
    # where, line is read from labels txt file
    xyxy = xywhn2xyxy(np.array([x,y,w,h]).reshape(1,4), w=dw, h=dh)
    b = xyxy2xywh(xyxy)
    b[:, 2:] = b[:, 2:] * gain + pad
    xyxy = xywh2xyxy(b).astype('int64')
    xyxy = clip_boxes(xyxy, [dh,dw])
    crop = img[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2]), ::(-1)] # keeping -1 as opposed to original code
    
    # save cropped image
    Image.fromarray(crop[..., ::-1]).save('cropped_img.png', quality=95, subsampling=0)  # save RGB
    return crop

@glenn-jocher
Member

@palash04 Thanks for sharing the code snippet! This could be a useful reference for anyone looking to crop images from the generated labels. Hope this helps the community!
