
Overlay / superimpose detected objects with another image #7177

Closed · drawntogetha opened this issue Mar 28, 2022 · 8 comments
Labels: question (Further information is requested)

@drawntogetha
Search before asking

Question

Hello!

I have trained my own custom detector, and now I would like to put a mask on top of the bounding box around the detected objects during inference. To do so, I have tried to modify detect.py, namely the "Process predictions" part:

# Process predictions
for i, det in enumerate(pred):  # per image
    seen += 1
    if webcam:  # batch_size >= 1
        p, im0, frame = path[i], im0s[i].copy(), dataset.count
        s += f'{i}: '
    else:
        p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)
    p = Path(p)  # to Path
    save_path = str(save_dir / p.name)  # im.jpg
    txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # im.txt
    s += '%gx%g ' % im.shape[2:]  # print string
    gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
    imc = im0.copy() if save_crop else im0  # for save_crop

    # resized = cv2.resize(mask, dimtensor, cv2.INTER_LINEAR)
    # try to overwrite again
    # mask = half() if model.fp16 else im.float()
    # data1 = np.asarray(mask, dtype="uint8")
    addedimgs = cv2.add(mask, imc)

    annotator = Annotator(addedimgs, line_width=line_thickness, example=str(names))  # im0 changed to addedimgs

This yields an error because the input arguments don't match: I have no clue what datatype im0 is supposed to be, while my mask is an RGB image.
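For context: im0 in detect.py is the original frame as an HxWx3 uint8 BGR numpy array, and cv2.add only accepts operands of identical shape and dtype. A minimal sketch of matching the two before adding (the mask path is hypothetical, and the zero frame stands in for detect.py's im0):

import cv2
import numpy as np

mask = cv2.imread('mask.png')  # hypothetical path; imread returns a uint8 BGR array
im0 = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for detect.py's current frame

# cv2.add needs identical shape and dtype, so resize the mask to the frame first
mask = cv2.resize(mask, (im0.shape[1], im0.shape[0]), interpolation=cv2.INTER_LINEAR)
added = cv2.add(mask, im0)  # both uint8 and the same size, so this now succeeds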

I have seen that there is a way to crop and store the bounding boxes in utils/plots.py, so I thought that I could use or modify that function, but my skills limit me from doing it myself.

Please let me know how I can accomplish this task. I have searched the web, and I think the approach (sketched below) is to:

  1. Find the bounding box coordinates
  2. Convert & scale the mask to match the bbox dimensions
  3. Use cv2.add to superimpose one image onto the other

However, I am getting errors, as I don't quite know how to do this.
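For reference, a minimal sketch of those three steps inside detect.py's per-detection loop, assuming mask_img is a BGR uint8 numpy array and the det rows are already scaled to [x1, y1, x2, y2, conf, cls]:

import cv2

# mask_img: the overlay image, loaded once outside the loop, e.g. cv2.imread('mask.png')
for *xyxy, conf, cls in reversed(det):
    x1, y1, x2, y2 = (int(v) for v in xyxy)             # 1. bounding box corners
    roi = im0[y1:y2, x1:x2]                             #    region of the frame to cover
    resized = cv2.resize(mask_img, (x2 - x1, y2 - y1),  # 2. scale mask to the bbox size
                         interpolation=cv2.INTER_LINEAR)
    im0[y1:y2, x1:x2] = cv2.add(resized, roi)           # 3. superimpose and write back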

Additional

Also, I have tried to accomplish the image overlay in the "write results" part of "Process predictions":

for *xyxy, conf, cls in reversed(det):
    if save_txt:  # Write to file (BROKEN AS OF NOW BECAUSE OF NEW ADDED LINES)
        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format

        # ADDED: try to overlay mask over bbox ROI
        b = xywh
        gain = 1.02
        pad = 10
        b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # attempt rectangle to square
        b[:, 2:] = b[:, 2:] * gain + pad  # box wh * gain + pad
        xyxy = xywh2xyxy(b).long()
        clip_coords(xyxy, im.shape)
        crop = im[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2]), ::-1]

        # dsize = xywh()
        resized = cv2.resize(mask, crop, cv2.INTER_LINEAR)
        # try to overwrite again
        xyxy = cv2.add(resized, crop)
        # boxes = xyxy2xywh(xyxy)
@drawntogetha added the question label Mar 28, 2022

github-actions bot commented Mar 28, 2022

👋 Hello @drawntogetha, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

- Google Colab and Kaggle notebooks with free GPU
- Google Cloud Deep Learning VM (see the GCP Quickstart Guide)
- Amazon Deep Learning AMI (see the AWS Quickstart Guide)
- Docker Image (see the Docker Quickstart Guide)

Status

If the CI badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher (Member)

@drawntogetha 👋 Hello! Thanks for asking about cropping results with YOLOv5 🚀. Cropping bounding box detections can be useful for training classification models on box contents for example. This feature was added in PR #2827. You can crop detections using either detect.py or YOLOv5 PyTorch Hub:

detect.py

Crops will be saved under runs/detect/exp/crops, with a directory for each class detected.

python detect.py --save-crop

[Example images: original image and the saved crop]

YOLOv5 PyTorch Hub

Crops will be saved under runs/detect/exp/crops if save=True, and also returned as a dictionary with crops as numpy arrays.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom

# Images
img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
crops = results.crop(save=True)  # or .show(), .save(), .print(), .pandas(), etc.
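Each returned crop should be a dict carrying the crop pixels as a numpy array (under the 'im' key as of recent YOLOv5 versions; verify the exact key names against your checkout):

# crops is a list of dicts, one per detection
for c in crops:
    print(c['label'], c['im'].shape)  # e.g. "person 0.87" and an (H, W, 3) array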

Good luck 🍀 and let us know if you have any other questions!

@drawntogetha (Author)

@glenn-jocher
Thank you for the reply! I'll try this and re-open the issue if needed.

@drawntogetha (Author)

Hi again!

@glenn-jocher I have tried to use the save_one_box function from utils/plots.py so that I can replace the detected objects with my own image. The problem is that I need to crop & replace the bounding box detections in a video/webcam feed. I've looked into inference with --save-crop, but I don't see how to use it for realtime replacement of the object.

I believe that I need to define xyxy corners in my image and modify:

crop = im[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2]), ::(1 if BGR else -1)]

so that I am overwriting the detections with my own image.
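A minimal sketch of that replacement, assuming overlay_img is your own BGR image and xyxy / im0 come from the detection loop in detect.py:

import cv2

overlay_img = cv2.imread('my_overlay.png')  # hypothetical path to your own image

# xyxy: one detection's [x1, y1, x2, y2]; im0: the current frame (BGR uint8)
x1, y1, x2, y2 = (int(v) for v in xyxy)
if x2 > x1 and y2 > y1:  # skip degenerate boxes
    im0[y1:y2, x1:x2] = cv2.resize(overlay_img, (x2 - x1, y2 - y1),
                                   interpolation=cv2.INTER_LINEAR)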


glenn-jocher commented Mar 30, 2022

@drawntogetha 👋 Hello! Thanks for asking about handling inference results. YOLOv5 🚀 PyTorch Hub models allow for simple model loading and inference in a pure python environment without using detect.py.

Simple Inference Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the YOLOv5 'small' model. For details on all available models please see the README. Custom models can also be loaded, including custom trained PyTorch models and their exported variants, i.e. ONNX, TensorRT, TensorFlow, OpenVINO YOLOv5 models.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, etc.
# model = torch.hub.load('ultralytics/yolov5', 'custom', 'path/to/best.pt')  # custom trained model

# Images
im = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, URL, PIL, OpenCV, numpy, list

# Inference
results = model(im)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.

results.xyxy[0]  # im predictions (tensor)
results.pandas().xyxy[0]  # im predictions (pandas)
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

See YOLOv5 PyTorch Hub Tutorial for details.
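For the overlay use case specifically, the box corners can be read straight off that tensor; a minimal sketch:

# each row of results.xyxy[0] is [xmin, ymin, xmax, ymax, confidence, class]
for *box, conf, cls in results.xyxy[0].tolist():
    x1, y1, x2, y2 = (int(v) for v in box)
    # x1..y2 are the pixel corners of one detection, ready for cv2 ROI slicing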

Good luck 🍀 and let us know if you have any other questions!

@drawntogetha (Author)

Still no success :(

I've managed to overwrite --save-crop so that it saves my own image instead of the content of the bboxes. So now, if I run detect.py --save-crop, it saves my own image in place of each cropped detection.

I've tried to change the save_img branch with code from save_one_box, where numpimg is my own image. I am aware this is ugly code reuse, but I am desperate to make it work:

if save_img or save_crop or view_img:  # Add bbox to image
    c = int(cls)  # integer class
    label = None if hide_labels else (names[c] if hide_conf else f'{xyxy[0]},{xyxy[1]}')  # was: f'{names[c]} {conf:.2f}'

    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
    # b = xywh
    # b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # attempt rectangle to square
    # b[:, 2:] = b[:, 2:] * gn + 10  # box wh * gain + pad
    # xyxy = xywh2xyxy(b).long()
    xyxy = torch.tensor(xyxy).view(-1, 4)
    b = xyxy2xywh(xyxy)  # boxes
    # if square:
    #     b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # attempt rectangle to square
    gain = 1.02
    pad = 10
    b[:, 2:] = b[:, 2:] * gain + pad  # box wh * gain + pad
    clip_coords(xyxy, im.shape)
    crop = im[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2])]
    crop = numpimg

    annotator.box_label(crop, label, color=colors(c, True))  # xyxy changed to crop

This yields me an error:

File "C:\Users\Drawntogetha\Desktop\yolorepository\yolov5\detect.py", line 206, in run crop = im[int(xyxy[0]):int(xyxy[1]), int(xyxy[2]):int(xyxy[3])] ValueError: only one element tensors can be converted to Python scalars

I feel like I'm walking blind and the answer is somewhere right in front of me, but I am not seeing it :)

Just to make myself clear: my aim is to overlay a detected object with an image in the video feed.
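The error itself is mechanical: after .view(-1, 4) the tensor is 2-D, so a single subscript like xyxy[0] returns an entire row of four elements, which int() cannot convert. Indexing both row and column yields a scalar:

import torch

xyxy = torch.tensor([[10., 20., 110., 220.]])  # shape (1, 4), as after .view(-1, 4)
# int(xyxy[0]) raises ValueError: xyxy[0] is a 4-element tensor, not a scalar
x1, y1 = int(xyxy[0, 0]), int(xyxy[0, 1])  # two subscripts give scalars
x2, y2 = int(xyxy[0, 2]), int(xyxy[0, 3])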

@drawntogetha (Author)

Or actually, I'm thinking I have to deal with im0 = annotator.result() as the working data.

I've found a stackoverflow question which seems to deal with a similar problem: https://stackoverflow.com/questions/57262520/replacing-a-solid-green-region-with-another-image-with-opencv

In that code the coordinates of the target bounding box are hardcoded, and when I run it with those it places my image at that fixed location (a static image in the video feed). I can define the corners of my own image and make a list of those points.
Afterwards, I want to use cv.getPerspectiveTransform (https://docs.opencv.org/4.x/da/d54/group__imgproc__transform.html#gae66ba39ba2e47dd0750555c7e986ab85) to map my image onto the corners of the bounding box. The problem is: how do I get those corners? I know they are stored in xyxy, but how do I retrieve each one of them?

Please take a look here and tell me whether my logic is sane:

im0 = annotator.result()

if view_img:
    pts_src = np.float32([[0, 0], [325, 0], [325, 472], [0, 472]])
    # pts_dst = ?  (the corners of the yolo bounding box, which I can't work out)

Where pts_dst would be the corners of the yolo bounding boxes.
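A minimal sketch of that mapping, assuming overlay is a BGR image whose corner coordinates match pts_src above and xyxy / im0 come from the detection loop:

import cv2
import numpy as np

x1, y1, x2, y2 = (int(v) for v in xyxy)  # bbox corners from one detection
h, w = im0.shape[:2]

pts_src = np.float32([[0, 0], [325, 0], [325, 472], [0, 472]])  # overlay corners
pts_dst = np.float32([[x1, y1], [x2, y1], [x2, y2], [x1, y2]])  # same corner order

M = cv2.getPerspectiveTransform(pts_src, pts_dst)
warped = cv2.warpPerspective(overlay, M, (w, h))  # overlay mapped onto the box

im0 = np.where(warped > 0, warped, im0)  # keep the frame wherever the warp is black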

@drawntogetha (Author)

I did a workaround using this guy's code: https://www.learnpythonwithrune.org/opencv-python-webcam-how-to-track-and-replace-object/. The result is rather ugly, but it accomplishes what I intended.
