
Using export.py to generate yolov5s.onnx produces negative numbers. #343

Closed
cmdbug opened this issue Jul 10, 2020 · 74 comments
Labels
question Further information is requested

Comments


cmdbug commented Jul 10, 2020

❔Question

Using export.py to generate yolov5s.onnx produces negative numbers.
[screenshots: detection results with misplaced bounding boxes]

This is the code that runs the ONNX model:

import numpy as np
import onnxruntime
from PIL import Image

session = onnxruntime.InferenceSession('./weights/yolov5s.onnx')

batch_size = session.get_inputs()[0].shape[0]
img_size_h = session.get_inputs()[0].shape[2]
img_size_w = session.get_inputs()[0].shape[3]

image_src = Image.open(image_path)
resized = letterbox_image(image_src, (img_size_w, img_size_h))  # letterbox resize to the model input size
img_in = np.transpose(resized, (2, 0, 1)).astype(np.float32)  # HWC -> CHW
img_in = np.expand_dims(img_in, axis=0)
img_in /= 255.0

input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: img_in})  # None -> return all model outputs

The outputs already contain negative numbers. After processing the results, some of them look correct, such as the car in the figure. However, for the bicycle the top/bottom are right but the left/right are wrong, and for the dog the left/right are right but the top/bottom are wrong. What might cause this problem? Thanks.

Additional context

torch:1.5.1
torchvision:0.6.1
onnxruntime:1.3.0

@cmdbug cmdbug added the question Further information is requested label Jul 10, 2020

github-actions bot commented Jul 10, 2020

Hello @WZTENG, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@cmdbug cmdbug closed this as completed Jul 11, 2020
@hxk11111

Have you solved the problem? I have the same question.



dlawrences commented Jul 14, 2020

Hi both,

It very much looks like the script above is returning the raw results of the three output layers:

  • (1, 3, 20, 20, 85)
  • (1, 3, 40, 40, 85)
  • (1, 3, 80, 80, 85)

These results are not final. When running detect.py, they are further processed during inference:

yolov5/models/yolo.py

Lines 29 to 36 in a1c8406

if not self.training:  # inference
    if self.grid[i].shape[2:4] != x[i].shape[2:4]:
        self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

    y = x[i].sigmoid()
    y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
    z.append(y.view(bs, -1, self.no))

You have two options:

  1. Change export.py to include the Detect layer:

model.model[-1].export = True # set Detect() layer export=True

The line above needs to be changed to False.

I have done this for my own experiments. The ONNX export seems to work, however the CoreML one doesn't.

  2. Create logic to replicate the inference steps of the Detect layer

You could replicate the same logic that's referenced above using numpy (i.e. pass the results through sigmoid and do all the handling); a rough sketch follows below.
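For illustration, here is a minimal NumPy sketch of that decode step for a single raw output map, assuming out is one of the ONNX outputs with shape (1, 3, ny, nx, 85), anchors holds the three (w, h) anchor pairs for that scale, and stride is the corresponding downsample factor (8, 16 or 32); the function names are illustrative, not from the repo:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_feature_map(out, anchors, stride):
    # out: raw ONNX output of one scale, shape (bs, 3, ny, nx, 85)
    bs, na, ny, nx, no = out.shape
    y = sigmoid(out)

    # grid of cell offsets, shape (1, 1, ny, nx, 2), and per-anchor sizes, shape (1, 3, 1, 1, 2)
    xv, yv = np.meshgrid(np.arange(nx), np.arange(ny))
    grid = np.stack((xv, yv), axis=-1).reshape(1, 1, ny, nx, 2).astype(np.float32)
    anchor_grid = np.array(anchors, dtype=np.float32).reshape(1, na, 1, 1, 2)

    # same equations as in models/yolo.py Detect.forward
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # xy in input-image pixels
    y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * anchor_grid     # wh in input-image pixels
    return y.reshape(bs, -1, no)                             # (bs, 3*ny*nx, 85)

The three decoded maps can then be concatenated along axis 1 before filtering and NMS.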

In both cases, you do miss the following:

  • filtering results with objectness lower than some threshold
  • NMS
  • conversion from xc, yc, w, h to x1, y1, x2, y2

These are currently done as part of the following function:

yolov5/utils/utils.py

Lines 549 to 554 in a1c8406

def non_max_suppression(prediction, conf_thres=0.1, iou_thres=0.6, merge=False, classes=None, agnostic=False):
    """Performs Non-Maximum Suppression (NMS) on inference results

    Returns:
        detections with shape: nx6 (x1, y1, x2, y2, conf, cls)
    """

Hope this is useful. Good luck!


cmdbug commented Jul 14, 2020

thanks!


cmdbug commented Jul 14, 2020

@dlawrences I set model.model[-1].export = False.
After modifying it to False, there is still a problem parsing the generated .onnx. Could you point me to reference code for parsing the ONNX output? Thank you!

@dlawrences

@WZTENG what is the error you are encountering?


cmdbug commented Jul 14, 2020

[screenshot of the incorrect result]
model.model[-1].export = False
After modifying it to False, the result is still problematic, and it looks even worse than before.

@dlawrences

I understand why you're saying it looks worse than before, but that is because you are now missing the NMS step, as mentioned above. Could you please answer/check the following points?

  • I see you are training on the COCO dataset: have you generated your own custom anchors or are you using those provided by @glenn-jocher?
  • Please share the training results of your yolov5s model; I am fairly interested in how many epochs you have trained for
  • Please share the hyperparams you have used during training
  • Please run the same image through detect.py and show the result

Also, I would like to have a look at the .onnx file, just to make sure there's nothing wrong with it. Would you please attach it here?

Thanks


cmdbug commented Jul 15, 2020

I converted the official yolov5s.pt file; it was not trained by myself.
model.model[-1].export = False ---> yolov5s.onnx
Running detect.py on the same image gives the correct result.
[screenshot: detect.py result]
before:
anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]  # 5s
[screenshot: ONNX result with the original anchor order]
after:
anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]  # 5s
[screenshots: ONNX results with the reversed anchor order]
After reversing the anchor order, I found that some images are detected correctly, but other images still have problems.


cmdbug commented Jul 15, 2020

It finally works. Some of the internal processing is probably different from what I had assumed; if I have time, I will carefully trace through it. The approach that works is: set model.model[-1].export = False, take outputs[0], and run it through the official NMS; the result is then displayed correctly. Previously, I was taking the three raw outputs and doing the processing myself. Thank you for providing useful information.
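A rough sketch of that working path, assuming the model was exported with model.model[-1].export = False, img_in has been preprocessed to a (1, 3, 640, 640) float32 array as in the first post, and non_max_suppression is imported from the repo (utils/utils.py at the time of this thread):

import onnxruntime
import torch
from utils.utils import non_max_suppression  # official YOLOv5 NMS (v1/v2-era path)

session = onnxruntime.InferenceSession('./weights/yolov5s.onnx')
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: img_in})   # img_in: (1, 3, 640, 640) float32

pred = torch.from_numpy(outputs[0])                 # merged predictions, shape (1, N, 85)
detections = non_max_suppression(pred, conf_thres=0.4, iou_thres=0.5)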

@dlawrences

Hi @WZTENG

Great news, happy you managed to do it! I think it would be really useful for others if you created a mini-documentation of your findings. Would you be willing to put this info together?

CC @glenn-jocher

Tip: It should, however, be possible to process the output of the three feature maps independently of the Detect layer by replicating all those operations in NumPy or any other framework. I have managed to do it myself using CoreML ops.

Cheers


dlawrences commented Jul 15, 2020

Additional info: I am not sure what you mean by "and use the output [0]", but if you are only consuming the results of the highest-resolution feature map (80x80), then you are missing out on some results.

Please consider that the Detect layer, as far as I remember, produces outputs scaled to the dimensions of the model input (i.e. the original image may be 1920x1080, but you have trained using 640x640 inputs, so this is the dimension space). In detect.py, there is some logic that handles this. Namely:

  • results from the three feature maps are scaled back to the initial image shape (im0); this also takes into account any letterboxing and handles the right padding

    yolov5/detect.py

    Lines 83 to 85 in a040500

    if det is not None and len(det):
        # Rescale boxes from img_size to im0 size
        det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

  • results are converted to x, y, w, h and normalized:

    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist() # normalized xywh

These operations are not done in the Detect layer, but as part of the post-processing (even after NMS).
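If you are not reusing scale_coords from the repo, a minimal sketch of undoing the letterbox transform could look like this (assuming boxes holds (x1, y1, x2, y2) in pixels of the model input, model_hw is e.g. (640, 640), and orig_hw is the original image's (height, width); the helper name is illustrative):

import numpy as np

def unletterbox_boxes(boxes, model_hw, orig_hw):
    # boxes: (N, 4) array of x1, y1, x2, y2 in model-input pixels
    mh, mw = model_hw
    oh, ow = orig_hw
    r = min(mh / oh, mw / ow)            # resize ratio used when letterboxing
    pad_w = (mw - ow * r) / 2            # padding added on the left/right
    pad_h = (mh - oh * r) / 2            # padding added on the top/bottom
    boxes = boxes.astype(np.float32)
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad_w) / r
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad_h) / r
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, ow)
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, oh)
    return boxes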


cmdbug commented Jul 15, 2020

It works now, but I am still not sure what caused the original problem. If I figure it out, I will post it.


cmdbug commented Jul 16, 2020

The behavior is a bit different from what I had seen before, but the output is fine now. Below is the method I implemented; it is a bit long, but at least I now understand how to handle it. I hope it helps anyone who has run into this problem.
This is the parsing method I wrote myself. Note that the value of model.model[-1].export (True or False) makes a big difference in what gets exported.

import numpy as np
import onnxruntime
import torch
from PIL import Image

# letterbox_image, non_max_suppression and w_non_max_suppression are defined
# elsewhere in demo_onnx.py (attached later in this thread).


def detect_onnx(official=True, image_path=None):
    num_classes = 80
    anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]  # 5s

    session = onnxruntime.InferenceSession('./weights/yolov5s.onnx')
    # print("The model expects input shape: ", session.get_inputs()[0].shape)
    batch_size = session.get_inputs()[0].shape[0]
    img_size_h = session.get_inputs()[0].shape[2]
    img_size_w = session.get_inputs()[0].shape[3]

    # input
    image_src = Image.open(image_path)
    resized = letterbox_image(image_src, (img_size_w, img_size_h))

    img_in = np.transpose(resized, (2, 0, 1)).astype(np.float32)  # HWC -> CHW
    img_in = np.expand_dims(img_in, axis=0)
    img_in /= 255.0
    # print("Shape of the image input shape: ", img_in.shape)

    # inference
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: img_in})

    batch_detections = []
    if official and len(outputs) == 4:   # model.model[-1].export = boolean ---> True:3 False:4
        # model.model[-1].export = False ---> outputs[0] (1, xxxx, 85)
        # official
        batch_detections = torch.from_numpy(np.array(outputs[0]))
        batch_detections = non_max_suppression(batch_detections, conf_thres=0.4, iou_thres=0.5, agnostic=False)
    else:
        # model.model[-1].export = False ---> outputs[1]/outputs[2]/outputs[3]
        # model.model[-1].export = True  ---> outputs
        # (1, 3, 20, 20, 85)
        # (1, 3, 40, 40, 85)
        # (1, 3, 80, 80, 85)
        # myself (from yolo.py Detect)
        boxs = []
        a = torch.tensor(anchors).float().view(3, -1, 2)
        anchor_grid = a.clone().view(3, 1, -1, 1, 1, 2)
        if len(outputs) == 4:
            outputs = [outputs[1], outputs[2], outputs[3]]
        for index, out in enumerate(outputs):
            out = torch.from_numpy(out)
            # out shape: (batch, num_anchors, feature_h, feature_w, 85)
            feature_h = out.shape[2]
            feature_w = out.shape[3]

            # Stride: downsampling factor of this feature map relative to the model input
            stride_w = int(img_size_w / feature_w)
            stride_h = int(img_size_h / feature_h)

            conf = out[..., 4]
            pred_cls = out[..., 5:]

            grid_x, grid_y = np.meshgrid(np.arange(feature_w), np.arange(feature_h))
            grid_x = torch.from_numpy(grid_x).float()  # convert to tensors so they can be added to torch results
            grid_y = torch.from_numpy(grid_y).float()

            # cx, cy, w, h
            pred_boxes = torch.FloatTensor(out[..., :4].shape)
            pred_boxes[..., 0] = (torch.sigmoid(out[..., 0]) * 2.0 - 0.5 + grid_x) * stride_w  # cx
            pred_boxes[..., 1] = (torch.sigmoid(out[..., 1]) * 2.0 - 0.5 + grid_y) * stride_h  # cy
            pred_boxes[..., 2:4] = (torch.sigmoid(out[..., 2:4]) * 2) ** 2 * anchor_grid[index]  # wh

            conf = torch.sigmoid(conf)
            pred_cls = torch.sigmoid(pred_cls)

            output = torch.cat((pred_boxes.view(batch_size, -1, 4),
                                conf.view(batch_size, -1, 1),
                                pred_cls.view(batch_size, -1, num_classes)),
                               -1)
            boxs.append(output)

        outputx = torch.cat(boxs, 1)
        # NMS
        batch_detections = w_non_max_suppression(outputx, num_classes, conf_thres=0.4, nms_thres=0.3)

    return batch_detections

If needed, you can rewrite all of this with numpy instead of torch, which makes it easier to port to other frameworks.
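For reference, a hypothetical call to the function above (the image path is just an example, and non_max_suppression / w_non_max_suppression must be available as in demo_onnx.py):

if __name__ == '__main__':
    detections = detect_onnx(official=True, image_path='./images/bus.jpg')
    print(detections)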

@cmdbug cmdbug reopened this Jul 16, 2020
@cmdbug cmdbug closed this as completed Jul 16, 2020
@mukul1em

@WZTENG can you share the whole code for ONNX inference?
Also, what is w_non_max_suppression()?

@imyoungyang

@WZTENG
thank you very much for your work and tests.
Could you share the code for the function w_non_max_suppression?

Thank you again.


cmdbug commented Jul 21, 2020

@mukul1em @imyoungyang Here: download demo_onnx.zip and unzip it.
demo_onnx.zip


cmdbug commented Jul 22, 2020

yolov3/v4

[screenshot]

yolov5

[screenshot]

@china56321

Can demo_onnx.zip be used to test whether my converted best.onnx is OK?


cmdbug commented Jul 29, 2020

Yes, but I only wrote the code for the 5s model; for other models you need to add the corresponding anchors. Note the order of the anchors.

@china56321

I used your code to test yolov5s.onnx, but an error occurred:

File "demo_onnx.py", line 306, in <module>
    detections = detect_onnx(official=False, image_path=image_path)
File "demo_onnx.py", line 234, in detect_onnx
    pred_boxes[..., 0] = (torch.sigmoid(out[..., 0]) * 2.0 - 0.5 + grid_x) * stride_w  # cx
TypeError: add(): argument 'other' (position 1) must be Tensor, not numpy.ndarray


cmdbug commented Jul 29, 2020

It works normally for me. Did you modify the code? You can also convert explicitly between the two types:
numpy -> tensor: torch.from_numpy(array)
tensor -> numpy: tensor.numpy()
The code in this zip targets yolov5 v1.x, not yolov5 v2.
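For example, a minimal sketch of applying that conversion to the meshgrid step in demo_onnx.py (the 80x80 grid size is just an example):

import numpy as np
import torch

grid_x, grid_y = np.meshgrid(np.arange(80), np.arange(80))
grid_x = torch.from_numpy(grid_x).float()   # numpy -> tensor
grid_y = torch.from_numpy(grid_y).float()
# grid_x / grid_y can now be added to torch tensors without the add() TypeError
back = grid_x.numpy()                       # tensor -> numpy, if needed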


china56321 commented Jul 29, 2020

No, I modified nothing. Could you upload your yolov5s.onnx (or another onnx) so I can check whether my converted onnx is correct?


wjjlisa commented Feb 2, 2021

Does it only support square sizes (e.g. 640x640)? What should I do if I want to change the ONNX input size (e.g. 640x320 or another size)?
320 is always ok!
I mean 640x320, not 640x640.
You can try, I think it's ok.
Yes, I have tried it; the boxes are wrong, all of them are shifted. Can you solve it?

I met the same problem. The input image after letterbox has size (640, 480), but the ONNX model expects an input of (640, 640). Have you solved the problem?

@gohguodong

I unzipped and ran demo_onnx.py.

I encountered the error below:

OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\User\anaconda3\envs\yolov5\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.

May I know what the problem could be?

@gohguodong

Hi, I tried running demo_onnx.py. I didn't change anything except for the number of classes and the anchors.

Below is the error I got. Any idea what the issue is?

Traceback (most recent call last):
  File "demo_onnx.py", line 318, in <module>
    detections = detect_onnx(official=False, image_path=image_path)
  File "demo_onnx.py", line 253, in detect_onnx
    output = torch.cat((pred_boxes.view(batch_size, -1, 4),
RuntimeError: Sizes of tensors must match except in dimension 2. Got 19200 and 16000 in dimension 1 (The offending index is 2)

@glenn-jocher

@gohguodong sorry, the code you ran is not part of the official repo, so we can't provide support for it. We are working on providing better inference examples for deployment environments in the future though!


lleye commented May 3, 2021

@glenn-jocher any updates on this?

@glenn-jocher

@lleye there are constant updates to export.

@pocketpixels

@glenn-jocher Is someone currently working on improving the CoreML export? It would be great if it could generate a complete object detection model (including the Detect layers feeding into CoreML's NMS layer) that then could be used easily with Apple's Vision framework. I would be happy to help with this (to the best of my abilities).


berkozsoy96 commented Jul 14, 2021

I met the same problem. The input image after letterbox has size (640, 480), but the ONNX model expects an input of (640, 640). Have you solved the problem?

You have to shift the boxes by int(pad * (1 / scale)) pixels.
In the plot_one_box function you should write:

pad = int(pad * (1 / scale))   # letterbox padding, rescaled back to original-image pixels
h, w = image.shape[:2]
if h > w:
    # tall image: padding was added left/right, so shift the x coordinates
    c1, c2 = (int(x[0]) - pad, int(x[1])), (int(x[2]) - pad, int(x[3]))
else:
    # wide image: padding was added top/bottom, so shift the y coordinates
    c1, c2 = (int(x[0]), int(x[1]) - pad), (int(x[2]), int(x[3]) - pad)

image is your original image

@tcollins590

@pocketpixels Were you ever able to use an exported CoreML model? I'm struggling to figure out how to export correctly.

@pocketpixels

@tylercollins590 Yes, I was. See my reply here.

@tcollins590

@pocketpixels thank you. Does this work on V6 of Yolov5 or a previous version?

@pocketpixels

@tylercollins590 Not sure. I haven't tested with the latest version. You can find my fork with this script here. Note that I made my changes in the branch better_coreml_export.

@tcollins590

@pocketpixels got it exported and working in my iOS app. I'm seeing very low FPS (5-7fps) with Yolov5s. Did you experience similar?


pocketpixels commented Oct 18, 2021

Performance will depend on the device and the image resolution you use.

@tcollins590

@pocketpixels I am using an iPhone 12 Pro. iDetection from Ultralytics runs YOLOv5s at about 35 FPS. I am using 640x640 resolution in my exported model. Any ideas where I should focus?

@pocketpixels

I am not using it for continuous detection myself. And wouldn’t really recommend that either as iOS devices will burn through battery and will throttle down pretty quickly if you max out the GPU and/or Neural Engine and CPU.
The iDetection app uses a lower resolution.

@pocketpixels

iDetection uses 320x192, so about 7x fewer pixels. The performance you are seeing therefore seems reasonable.

@mshamash

@tylercollins590 did you get the script by @pocketpixels to work on v6 (or 6.1) of YOLOv5? I managed to work around some numpy syntax changes (line 59), but I am running into issues with array shapes, which are a bit beyond me...

Fusing layers... 
Model summary: 213 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs
terminate called after throwing an instance of 'std::runtime_error'
  what():  shape '[1, 3, 1, 1, 2]' is invalid for input of size 38400
Aborted (core dumped)

@tcollins590

@mshamash unfortunately I have not been able to get anything to run on any version past v4

@mshamash

@tylercollins590 Interesting, as I did get that script to work on my v5 YOLOv5s models in late-2021 (final CoreML models had NMS)

@mshamash

@tylercollins590 - After much effort and trial and error, I got the .mlmodel files from the YOLOv5 (v6.1) export.py script to work within my iOS app on still images. If this is something you are also doing with your models, I'd be happy to share the code.
I'll be making a new repo with an entire sample project soon, as (surprisingly) there is no good, simple, public example of using YOLOv5 with CoreML. At least none that I could find or use...


mshamash commented Apr 4, 2022

@tylercollins590 - I submitted pull request #7263, which has an updated export.py script so that the exported CoreML model has an NMS layer. Give it a try with your models!
