
How to "finish" raw inference output (with respect to anchors), to get the bounding boxes #6136

Closed
hamedmh opened this issue Dec 30, 2021 · 9 comments · Fixed by #6195

Comments


hamedmh commented Dec 30, 2021

Hi @glenn-jocher,
I use a "pure" YOLOv5s model that outputs three tensors, e.g. torch.Size([1, 3, 48, 80, 85]), torch.Size([1, 3, 24, 40, 85]), and torch.Size([1, 3, 12, 20, 85]).

I would like to convert them to bounding boxes.
I need to know which functions or equations can be used to get the bounding boxes.
Thanks!
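For background, those three tensors are the raw per-scale head outputs; YOLOv5's Detect layer turns them into pixel-space boxes by applying a sigmoid, offsetting by the grid cell, and scaling by per-level anchors and strides. Below is a minimal, self-contained sketch of that decode. The hard-coded anchors and strides are the standard YOLOv5s values; in practice they should be read from the model's Detect layer rather than assumed.

```python
import torch

# Standard YOLOv5s anchors (pixels) and strides for the three heads (P3, P4, P5);
# read them from the Detect layer of a real model instead of hard-coding.
YOLOV5S_ANCHORS = [torch.tensor([[10., 13.], [16., 30.], [33., 23.]]),
                   torch.tensor([[30., 61.], [62., 45.], [59., 119.]]),
                   torch.tensor([[116., 90.], [156., 198.], [373., 326.]])]
YOLOV5S_STRIDES = [8, 16, 32]

def decode_head_outputs(outputs, anchors=YOLOV5S_ANCHORS, strides=YOLOV5S_STRIDES):
    """Decode raw head tensors [bs, na, ny, nx, 5 + nc] into a single
    [bs, N, 5 + nc] tensor of (cx, cy, w, h, obj, cls...) in input pixels.
    Mirrors the arithmetic in YOLOv5's Detect.forward() at inference time."""
    decoded = []
    for x, anchor, stride in zip(outputs, anchors, strides):
        bs, na, ny, nx, no = x.shape
        y = x.sigmoid()
        # Grid-cell offsets, broadcast over the (ny, nx) feature map
        gy = torch.arange(ny).float().view(1, 1, ny, 1)
        gx = torch.arange(nx).float().view(1, 1, 1, nx)
        cx = (y[..., 0] * 2.0 - 0.5 + gx) * stride      # box center x, pixels
        cy = (y[..., 1] * 2.0 - 0.5 + gy) * stride      # box center y, pixels
        wh = (y[..., 2:4] * 2.0) ** 2 * anchor.view(1, na, 1, 1, 2)  # box size
        out = torch.cat((torch.stack((cx, cy), -1), wh, y[..., 4:]), -1)
        decoded.append(out.view(bs, -1, no))
    return torch.cat(decoded, 1)  # e.g. [1, 25200, 85] for a 640x640 input
```

The result has the same [1, N, 85] layout that non_max_suppression expects, so it can be passed straight to NMS.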

The original issue:
@Kieran31 see PyTorch Hub tutorial for full inference examples on trained custom models.

Simple Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the lightest and fastest YOLOv5 model. For details on all available models please see the README.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
img = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(img)

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

YOLOv5 Tutorials

Originally posted by @glenn-jocher in #5304 (comment)

@hamedmh hamedmh changed the title @Kieran31 see **PyTorch Hub tutorial** for full inference examples on trained custom models. How to "finish" raw inference output (with respect to anchors), to get the bounding boxes Dec 30, 2021

hamedmh commented Jan 1, 2022

Hi @glenn-jocher, wish you a Happy New Year!

This is an update of my experiment.
I created the model using:

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model = model.model

output = model(img_tensor)

output is: torch.Size([1, 25200, 85])

detections_instance = AutoShape(model).forward(img_tensor)

or

detections_instance = AutoShape(model).forward(model(img_tensor))

The result is an instance of the Detections class, but the result is wrong when using, for example:
img_file_name = '../data/images/zidane.jpg'

Printing it with detections_instance.print() gives:

image 1/1: Detected 1 objects of class personx1 , 1 objects of class carx1 , 18 objects of class traffic lightx18 , 1 objects of class tiex1 , 46 objects of class sports ballx46 , 1 objects of class bottlex1 , 1 objects of class cupx1 , 30 objects of class bowlx30 , 1 objects of class bananax1 , 13 objects of class applex13 , 9 objects of class broccolix9 , 24 objects of class dining tablex24 , 3 objects of class mousex3 , 58 objects of class clockx58 ,

The forward() of class AutoShape is modified to the following:

def forward(self, imgs, pred, size=640, augment=False, profile=False):
    p = next(self.model.parameters())  # a parameter, for device/dtype checks
    autocast = self.amp and (p.device.type != 'cpu')  # Automatic Mixed Precision (AMP) inference
    with amp.autocast(enabled=autocast):
        y = non_max_suppression(pred, conf_thres=self.conf, iou_thres=self.iou, classes=self.classes,
                                agnostic=self.agnostic, multi_label=self.multi_label, max_det=self.max_det)
        return Detections(imgs, y, files=["-"], names=self.names)

imgs is an image tensor (img_tensor) of size torch.Size([1, 3, 640, 640]).
Now I need to know what else should be added to forward() to fix it and get correct detections.

Best regards,
Hamed
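For orientation here: each of the 25200 rows in the [1, 25200, 85] output above is (x, y, w, h, objectness, 80 class scores), with x/y/w/h being the box center and size in input pixels. A small self-contained sketch of the conversion that non_max_suppression performs internally (this is not an Ultralytics API, just the arithmetic):

```python
import torch

def raw_to_boxes(pred, conf_thres=0.25):
    """Turn YOLOv5 prediction rows [x, y, w, h, obj, cls0..clsN] for one image
    (a [N, 5 + nc] tensor) into [M, 6] rows [x1, y1, x2, y2, conf, cls],
    i.e. the same first steps that non_max_suppression performs."""
    boxes = pred[:, :4].clone()
    boxes[:, 0] = pred[:, 0] - pred[:, 2] / 2  # x1 = cx - w/2
    boxes[:, 1] = pred[:, 1] - pred[:, 3] / 2  # y1 = cy - h/2
    boxes[:, 2] = pred[:, 0] + pred[:, 2] / 2  # x2 = cx + w/2
    boxes[:, 3] = pred[:, 1] + pred[:, 3] / 2  # y2 = cy + h/2
    scores = pred[:, 5:] * pred[:, 4:5]        # class score * objectness
    conf, cls = scores.max(dim=1)              # best class per box
    keep = conf > conf_thres                   # drop low-confidence boxes
    return torch.cat((boxes[keep], conf[keep, None], cls[keep, None].float()), dim=1)
```

Actual NMS then additionally suppresses overlapping boxes by IoU, which this sketch omits.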


glenn-jocher commented Jan 2, 2022

@hamedmh 👋 Hello! Thanks for asking about handling inference results. YOLOv5 🚀 PyTorch Hub models allow for simple model loading and inference in a python environment.

Simple Inference Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the lightest and fastest YOLOv5 model. For details on all available models please see the README.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom

# Images
img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

See YOLOv5 PyTorch Hub Tutorial for details.

Good luck 🍀 and let us know if you have any other questions!


hamedmh commented Jan 2, 2022

Hi @glenn-jocher, Thank you for your answer!
I need to feed an image tensor (torch.Size([1, 3, 640, 640])) to the model to perform inference.
I discovered that I could simply write:

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
output = model(img_tensor)

and get a Detections instance as output (now that I have modified forward() to take a single image tensor as input, as explained in my post above).
However, the inference result is still wrong - totally different from the result I get when feeding the model the image file name.
I have simplified the task by not including image size scaling.

Best regards,
Hamed

@glenn-jocher glenn-jocher linked a pull request Jan 5, 2022 that will close this issue
@glenn-jocher
Member

@hamedmh good news 😃! Your original issue may now be fixed ✅ in PR #6195. This PR adds support for YOLOv5 CoreML inference.

!python export.py --weights yolov5s.pt --include coreml  # CoreML export
!python detect.py --weights yolov5s.mlmodel  # CoreML inference (MacOS-only)
!python val.py --weights yolov5s.mlmodel  # CoreML validation (MacOS-only)

model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.mlmodel')  # CoreML PyTorch Hub model

[Screenshot: Screen Shot 2022-01-04 at 5 41 07 PM]

To receive this update:

  • Git – git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – view the updated notebooks on Colab or Kaggle
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!


hamedmh commented Jan 5, 2022

@glenn-jocher Thank you for the update! I'll test it.
Best regards, Hamed


hamedmh commented Jan 31, 2022

Hi @glenn-jocher,

Now I have a more specific question regarding the same issue (post processing of predictions).
In my train.py, I save the model using torch.save; then in my detect.py I do the following:

my_model = torch.load(pt_path + "best.pt")
my_model.model.half() if half else my_model.model.float()
source = imgs_path + img_file_name
dataset = LoadImages(source)
for path, im, im0s, vid_cap, s in dataset:
    im = torch.from_numpy(im).to(device)
    im = im.half() if half else im.float()  # uint8 to fp16/32
    im /= 255  # 0 - 255 to 0.0 - 1.0
    im = im.unsqueeze(0)  # add batch dimension
    output = my_model(im)
    pred = non_max_suppression(output[0])

However, when I print the result:

print(pred)
I get the following:

[tensor([[-3.21432e-01, -1.96759e+00,  3.47019e-01, -1.08958e+00,  4.47312e+01,  0.00000e+00],
         [ 9.95539e-02, -2.22332e+00, -6.70392e-02, -7.91115e-01,  4.30290e+01,  0.00000e+00],
         [-3.16685e-01,  2.95826e-01,  3.56224e-01,  1.16115e+00,  4.05378e+01,  0.00000e+00],
         [ 8.81307e-02,  4.03144e-02, -7.08852e-02,  1.44941e+00,  4.02665e+01,  0.00000e+00],
         [-1.86408e-01, -1.37995e+00, -7.09979e-01, -3.34498e-01,  1.07682e+01,  0.00000e+00],
         [-5.10289e-01, -1.14743e+00, -3.85259e-01, -5.38978e-01,  1.02692e+01,  0.00000e+00],
         [-5.21439e-01,  6.93630e-01, -3.84875e-01,  1.32826e+00,  8.37832e+00,  0.00000e+00],
         [-1.97015e-01,  4.23762e-01, -7.11327e-01,  1.50808e+00,  8.13119e+00,  0.00000e+00],
         [ 2.11624e+00, -1.18973e+00,  1.63665e+00, -1.84288e-01,  4.55044e+00,  0.00000e+00],
         [ 1.78832e+00, -9.54638e-01,  1.95905e+00, -3.72235e-01,  3.56341e+00,  0.00000e+00]], grad_fn=<IndexBackward>)]

My question is: how can I understand and visualize this "raw" result? We have only one class (0).

All the best,
Hamed
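For reference, each row that non_max_suppression returns is [x1, y1, x2, y2, confidence, class]. Below is a tiny sketch for reading such a tensor; note that negative coordinates and confidences far above 1.0 like those above are a hint that NMS was fed something other than decoded inference output (for example, a training-mode forward pass), since decoded confidences are objectness times class score and cannot exceed 1.

```python
import torch

def describe_detections(det, names):
    """Format a non_max_suppression result for one image (an [n, 6] tensor
    with rows [x1, y1, x2, y2, confidence, class]) as readable lines."""
    lines = []
    for x1, y1, x2, y2, conf, cls in det.tolist():
        lines.append(f"{names[int(cls)]}: conf={conf:.2f} "
                     f"box=({x1:.1f}, {y1:.1f}, {x2:.1f}, {y2:.1f})")
    return lines
```

With a sane result, boxes can then be drawn directly from the first four columns.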

@glenn-jocher
Member

@hamedmh detect.py inference with trained weights is simple:

python detect.py --weights path/to/best.pt

@DeepLearnerYe


Hi there. I've got the same question as you, and I can't find a solution. The output is three tensors; how can I transfer them to bounding boxes?

@glenn-jocher
Member

@DeepLearnerYe you can post-process the raw model output using the non_max_suppression function available in YOLOv5. This will convert the model output into bounding boxes. Here's a sample implementation:

import torch

from models.yolo import Model
from utils.general import non_max_suppression

# Load the model
model = Model('path/to/yolov5s.yaml', ch=3, nc=80)  # Replace with actual config and class number
ckpt = torch.load('path/to/checkpoint.pt')  # Replace with actual checkpoint path
model.load_state_dict(ckpt['model'].float().state_dict())  # checkpoints store a Model object
model.eval()  # inference mode: forward() returns (decoded predictions, raw head outputs)

# Perform inference
img = torch.randn(1, 3, 640, 640)  # Replace with actual input image
pred = model(img)[0]  # [1, 25200, 85] decoded predictions

# Post-process the output
pred = non_max_suppression(pred, conf_thres=0.4, iou_thres=0.5)

# Output the bounding boxes: one [n, 6] tensor per image, rows are [x1, y1, x2, y2, conf, cls]
print(pred)

Please replace the placeholder paths and input image with your actual data. Let me know if you encounter any more issues!
