ONNX exported model is outputing Bogus - normalize image to 0..1 values #34

Open
hlacikd opened this issue Jun 17, 2023 · 3 comments

hlacikd commented Jun 17, 2023

I tried this with both torch 2.0 and torch 1.3, and with several versions of onnx; the behavior is the same in every case.

I am using the edgeyolo_tiny_lrelu model, trained on my custom dataset for 100 epochs with 1 class.

The trained model (best.pth) works with detect.py and gives correct results.

However, when I export it using the following command line:

python export.py --onnx-only --weights /workspaces/rocm-ml/edgeyolo-output/train/edgeyolo_lp/best.pth

I have to comment out the import (# import tensorrt as trt) in export.py, and I get the following warnings:

/workspaces/rocm-ml/python-venv/edgeyolo-cpu/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Reparameterizing models...
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:963: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if augment:
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:995: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:1010: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
2023-06-17 20:17:26.644 | INFO     | __main__:main:124 - 
start to simplify ONNX...
2023-06-17 20:17:27.221 | INFO     | __main__:main:131 - ONNX export success, saved as output/export/best/640x640_batch1.onnx
2023-06-17 20:17:27.221 | INFO     | __main__:main:178 - All files are saved in output/export/best.

The ONNX model is created, but it is not usable: it does not output anything meaningful.

I am running it with the following commands in a Python notebook:

import numpy as np
import onnxruntime as rt
import cv2
import torch, torchvision

# init rt
sess = rt.InferenceSession("/workspaces/rocm-ml/tmp/edgeyolo/output/export/best/640x640_batch1.onnx")
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# resize to 640x640
original_image: np.ndarray = cv2.imread("/workspaces/rocm-ml/datasets/ds_yolo/valid/images/drive_img_0015.jpg")
[height, width, _] = original_image.shape
length = max((height, width))
image = np.zeros((length, length, 3), np.uint8)
image[0:height, 0:width] = original_image 
scale = length / 640
image = cv2.resize(image, (640, 640))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# prepare and transpose
image = image / 255.0
image = image.transpose(2, 0, 1)  # HWC -> CHW
# batch
image = np.expand_dims(image, axis=0).astype(np.float32)

# run prediction
pred_onx = sess.run([output_name], {input_name: image})[0]

# pred_onx.shape is correct -> (1, 8400, 6)

Then I continue by reusing your code:

# convert it to torch tensor
prediction = torch.tensor(pred_onx)

# boxes and pred
box_corner = prediction.new(prediction.shape)
box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
prediction[:, :, :4] = box_corner[:, :, :4]

# conf
conf_thre = 0.1
num_classes = 1

# detections
output = [None for _ in range(len(prediction))]
for i, image_pred in enumerate(prediction):
    # If none are remaining => process next image
    if not image_pred.size(0):
        continue
    # Get score and class with highest confidence
    class_conf, class_pred = torch.max(image_pred[:, 5 : 5 + num_classes], 1, keepdim=True)

    conf_mask = (image_pred[:, 4] * class_conf.squeeze() >= conf_thre).squeeze()
    # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
    detections = torch.cat((image_pred[:, :5], class_conf, class_pred.float(), image_pred[:, 5 + num_classes :]), 1)
    detections = detections[conf_mask]
    if not detections.size(0):
        continue

This is where it ends: the detections array has shape (0, 7) instead of (1, 7), which is what detect.py returns at this point.
So there are no detections.
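One way to check whether the raw ONNX output is already degenerate, rather than the filtering code, is to look at the combined scores before thresholding. This is a minimal NumPy sketch, assuming the (batch, anchors, 5 + num_classes) layout used above with objectness at index 4; `summarize_scores` is a hypothetical helper, not part of edgeyolo.

```python
import numpy as np

def summarize_scores(pred, conf_thre=0.1, num_classes=1):
    """Return (max combined score, number of rows passing the threshold)
    for the first image in the batch."""
    image_pred = np.asarray(pred)[0]                       # (anchors, 5 + num_classes)
    obj_conf = image_pred[:, 4]                            # objectness column
    class_conf = image_pred[:, 5:5 + num_classes].max(axis=1)
    score = obj_conf * class_conf                          # same product as conf_mask above
    return float(score.max()), int((score >= conf_thre).sum())

# Synthetic example (not real model output): 3 anchors, one confident.
demo = np.zeros((1, 3, 6), dtype=np.float32)
demo[0, 1, 4:6] = [0.9, 0.8]
best, kept = summarize_scores(demo)
print(best, kept)
```

Calling `summarize_scores(pred_onx)` on the real output would show whether every score sits far below `conf_thre`, which points at the input rather than at the post-processing.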

The code is taken from your postprocess function.

I can provide the notebook, best.pth, and the ONNX file if you are interested.

@hlacikd hlacikd changed the title TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, ONNX exported model is outputing Bogus (TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values) Jun 18, 2023

hlacikd commented Jun 18, 2023

I investigated this further and found that it comes down to the image not being normalized to 0..1 floats (i.e. img = img / 255.0), which is controlled by the "legacy" flag in the detection code in your repo.

However, I did not find this flag in train.py or in the training settings cfg file. Could you please help?

I can work around this in ONNX by passing the image as an array with values in the 0..255 range, but when I use RKNN Toolkit2 Lite for inference on a Rockchip platform I cannot work around it, since the conversion to a normalized image (img = img / 255.0) is done internally.
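The ONNX-side workaround can be sketched as a preprocessing helper with the normalization made optional. This is an illustration only: `to_model_input` and its `normalize` parameter are hypothetical names, and the dummy array stands in for the resized 640x640 RGB image from the notebook above.

```python
import numpy as np

def to_model_input(image_rgb, normalize=False):
    """Convert an HWC uint8 RGB image to a (1, C, H, W) float32 batch.

    normalize=False keeps pixel values in 0..255 (the range that worked
    with the exported ONNX model in this thread); normalize=True divides
    by 255.0 to give 0..1 values.
    """
    x = image_rgb.astype(np.float32)
    if normalize:
        x /= 255.0
    x = x.transpose(2, 0, 1)                 # HWC -> CHW
    x = np.expand_dims(x, axis=0)            # add batch dimension
    return np.ascontiguousarray(x)           # C-contiguous copy for the runtime

# Synthetic stand-in for the resized image.
dummy = np.full((640, 640, 3), 255, dtype=np.uint8)
raw = to_model_input(dummy)                  # values in 0..255
unit = to_model_input(dummy, normalize=True) # values in 0..1
print(raw.shape, raw.max(), unit.max())
```

The `raw` variant is what would be passed to `sess.run([output_name], {input_name: raw})` in place of the normalized image.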

My question is: how do I train EdgeYOLO with the "legacy" flag, i.e. with img = img / 255.0?

@hlacikd hlacikd changed the title ONNX exported model is outputing Bogus (TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values) ONNX exported model is outputing Bogus - normalize image to 0..1 values Jun 21, 2023

hlacikd commented Jun 21, 2023

I found a parameter "pixel_range", set to 255 in the params, which I suppose can be set via the dataset config. However, I found no use of it except in the TensorRT path, which is not my case (I need ONNX).

@haiquan-yu

The first problem, not being able to generate the ONNX file, comes from a bug in the author's export.py. On line 133 of that file, if the args.rknn parameter is not set, the ONNX file is given no name, so it is never created. In short, change the line to: onnx_file = file_name + "_for_rknn.onnx" if args.rknn else file_name + ".onnx".
I hope this answer is helpful to you.
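The suggested one-line change can be checked in isolation. The sketch below wraps the same expression in a hypothetical helper, `onnx_output_name`; the rest of export.py is not shown in this thread, so `file_name` and the `rknn` flag are taken only from the comment above.

```python
def onnx_output_name(file_name, rknn=False):
    """Build the ONNX output filename; a name is produced in both cases."""
    # Same conditional expression as the suggested fix, with args.rknn
    # replaced by a plain boolean argument for illustration.
    return file_name + "_for_rknn.onnx" if rknn else file_name + ".onnx"

print(onnx_output_name("640x640_batch1"))             # 640x640_batch1.onnx
print(onnx_output_name("640x640_batch1", rknn=True))  # 640x640_batch1_for_rknn.onnx
```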
