
YOLOv5 in LibTorch produces different results #312

Closed
zherlock030 opened this issue Jul 6, 2020 · 23 comments

@zherlock030

🐛 Bug


I followed https://gist.github.com/jakepoz/eb36163814a8f1b6ceb31e8addbba270 to derive the TorchScript model.

In both my C++ code and my Python code I tested the same picture. I verified that the input tensors were identical after pre-processing, but the model outputs differ.

To Reproduce (REQUIRED)

The picture shape is (channels = 3, height = 360, width = 640).

Python input:

import cv2
import numpy as np
import torch

img_path = 'test.png'
img = cv2.imread(img_path)                # BGR, HWC
img = letterbox(img, new_shape=(640, 640))[0]
img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW
img = np.ascontiguousarray(img)
img = torch.from_numpy(img).to(device).float()
img /= 255.0                              # 0 - 255 to 0.0 - 1.0
if img.ndimension() == 3:
    img = img.unsqueeze(0)                # shape (1, 3, 384, 640)
pred = model(img, augment=False)
print(pred[0].shape)

Python Output:

torch.Size([1, 15120, 85])

C++ input

string img_path = "test.png";
Mat img = imread(img_path);
img = letterbox(img);                        // resize + pad
cvtColor(img, img, COLOR_BGR2RGB);           // BGR -> RGB
img.convertTo(img, CV_32FC3, 1.0f / 255.0f); // scale to [0, 1]
// note: from_blob does not copy; img must stay alive while tensor_img is used
auto tensor_img = torch::from_blob(img.data, {img.rows, img.cols, img.channels()});
tensor_img = tensor_img.permute({2, 0, 1});  // HWC -> CHW
tensor_img = tensor_img.unsqueeze(0);
cout << "tensor size is " << tensor_img.sizes() << endl; // (1, 3, 384, 640)

std::vector<torch::jit::IValue> inputs;
inputs.push_back(tensor_img);
torch::jit::IValue output = model.forward(inputs);

auto op = output.toList().get(0).toTensor();

cout << "op sizes: " << op.sizes() << endl;

C++ output

output tensor shape  [1, 3, 48, 80, 85],
and 3*48*80 = 11520 != 15120

Expected behavior


I would expect the model output in C++ to be the same as in Python.

Environment

  • OS: Ubuntu
  • Device: CPU
@zherlock030 zherlock030 added the bug Something isn't working label Jul 6, 2020
@github-actions
Contributor

github-actions bot commented Jul 6, 2020

Hello @zherlock030, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@zherlock030
Author

@jakepoz

@glenn-jocher glenn-jocher removed the bug Something isn't working label Jul 6, 2020
@glenn-jocher
Member

glenn-jocher commented Jul 6, 2020

We have updated export.py to support torchscript export now, among others. The tutorial is here: https://docs.ultralytics.com/yolov5/tutorials/model_export

Note that these are simple examples to get you started. Actual export and deployment (to an edge device, for example) is a very complicated journey. We have not open sourced the entire process, but we do offer paid support in this area. If you have a business need let us know and we'd be happy to help you!

@jakepoz
Contributor

jakepoz commented Jul 7, 2020

@zherlock030, this is because the final Detect layer in yolov5 undoes the action of yolo's "anchor" system during regular operation, but this step is not exported by the export script:

pjreddie/darknet#568

Unfortunately, I have not yet figured out the details here. It seems as if some of the variables like self.anchors and self.anchor_grid are stored as registered parameters, but self.strides is not, and I have difficulty exporting the model with the anchor code turned on.
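[Editor's note: what the eval-mode Detect layer does to one raw feature map can be sketched in NumPy. This is a hedged illustration of the anchor decode described above, not yolov5's actual code; the `decode_detect_output` name, the toy shapes, and the single anchor are made up for the example.]

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_detect_output(x, anchors, stride):
    # x: raw feature map of shape (num_anchors, ny, nx, no)
    # anchors: (num_anchors, 2) anchor sizes in pixels for this level
    na, ny, nx, no = x.shape
    gy, gx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    grid = np.stack((gx, gy), axis=-1)[None]              # (1, ny, nx, 2) cell offsets
    y = sigmoid(x)
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride            # box centers in pixels
    y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * anchors[:, None, None, :] # box sizes in pixels
    return y.reshape(na * ny * nx, no)                    # flatten like eval-mode output

# toy example: a single 2x2 grid with one anchor and stride 8
raw = np.zeros((1, 2, 2, 6), dtype=np.float32)
decoded = decode_detect_output(raw, np.array([[10.0, 13.0]], dtype=np.float32), stride=8)
print(decoded.shape)  # (4, 6)
```

With zero logits, sigmoid gives 0.5, so each decoded center lands in the middle of its cell ((grid + 0.5) * stride) and each width/height collapses to the raw anchor size.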

@winself

winself commented Jul 23, 2020

@zherlock030 @jakepoz did you solve the problem? I am hitting the same issue. Looking forward to your reply, thank you.

@zherlock030
Author

zherlock030 commented Jul 24, 2020

> @zherlock030 @jakepoz did you solve the problem? I am hitting the same issue. Looking forward to your reply, thank you.

@winself
I think I have made it work. I'm not sure if I should open source it since @glenn-jocher has his concerns.
I could share my code with you.

@zherlock030
Author

zherlock030 commented Jul 24, 2020

> @zherlock030, this is because the final Detect layer in yolov5 undoes the action of yolo's "anchor" system during regular operation, but this step is not exported by the export script:
>
> pjreddie/darknet#568
>
> Unfortunately, I have not yet figured out the details here. It seems as if some of the variables like self.anchors and self.anchor_grid are stored as registered parameters, but self.strides is not, and I have difficulty exporting the model with the anchor code turned on.

@jakepoz
Thanks for your reply. I just treat self.strides as constants, and for now it produces the same reasonable results as in Python.

@zherlock030
Author

> We have updated export.py to support torchscript export now, among others. The tutorial is here: #251
>
> Note that these are simple examples to get you started. Actual export and deployment (to an edge device, for example) is a very complicated journey. We have not open sourced the entire process, but we do offer paid support in this area. If you have a business need let us know and we'd be happy to help you!

Thanks for your reply. I think I have made it work; yolov5s is so fast.

@glenn-jocher
Member

@zherlock030 hi, no worries about open sourcing your work! The only requirement is that you retain the current GPL3 license on modifications.

We eventually want to open source 100% of everything, including the export pipelines and the iDetection iOS app source code. We are trying to adjust our business model to make this happen either later this year or next year.

@winself

winself commented Jul 27, 2020

@zherlock030 Thanks for your reply! `self.training |= self.export` causes this result: when export is True, training is True, so the TorchScript model produces the training-mode output, and we need to write some code to post-process the result. Is this right?

@zherlock030
Author

> @zherlock030 Thanks for your reply! `self.training |= self.export` causes this result: when export is True, training is True, so the TorchScript model produces the training-mode output, and we need to write some code to post-process the result. Is this right?

Yes, we need to write code for image preprocessing, the Detect layer, and NMS.
You can see my implementation at https://github.com/zherlock030/YOLOv5_Torchscript.
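[Editor's note: as a rough idea of the NMS step mentioned above, here is a minimal greedy NMS sketch in NumPy. It is an illustration with made-up helper names, not the code from the linked repo.]

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU of one box (x1, y1, x2, y2) against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    remaining boxes that overlap it by more than iou_thres."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_thres]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with IoU ≈ 0.68 > 0.45 and is suppressed; the far-away third box survives.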

@phamdat09

Hello,
I am also interested in running YOLOv5 in C++. @zherlock030, when you run yolov5, how many GB of GPU memory do you use? Is it lower than when running in Python?
Thanks

@easycome2009

@zherlock030, hi! Similar to you, I also wrote the nms.cpp code. I used the official export.py to export the torchscript file; the output op is a tensor of [1, gridx, gridy, 9], however the 9-vector is totally wrong. Is the exported torchscript file not right? Do I need any modification? I also noticed this warning during torch.jit.trace: `TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future.`

@yasenh

yasenh commented Jul 30, 2020

@easycome2009 what I did is set `model.model[-1].export = False` in export.py line#28, and I get similar results from Python and C++.

@zherlock030
Author

@phamdat09 hi, actually I'm using a CPU.

@zherlock030
Author

@easycome2009 yes, when you run export.py you need to modify the Detect layer so that it just outputs the input list `x`, and then implement the Detect layer in your C++ code.

@zherlock030
Author

@yasenh yes, I tried that too, but that way we can't feed the network pictures of different shapes.

@yasenh

yasenh commented Aug 5, 2020

@zherlock030, here is my implementation just FYI: https://github.com/yasenh/libtorch-yolov5
The image will be padded to a fixed size, e.g. (640, 640).

@zherlock030
Author

@yasenh yeah, I know what you mean, but actually with the letterbox function an image of any shape can be fed to yolo.

@yasenh

yasenh commented Aug 5, 2020

> @yasenh yeah, I know what you mean, but actually with the letterbox function an image of any shape can be fed to yolo.

I think you can still do that, but the benefit of padding images to the same size is that we can process them as a batch. Otherwise you need to process images of different sizes one by one.
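[Editor's note: for reference, the shape arithmetic behind letterbox-style padding can be sketched as follows. `letterbox_shape` is a made-up helper that only computes sizes; yolov5's real letterbox also resizes pixels and may split odd padding slightly differently.]

```python
def letterbox_shape(h, w, new_size=640, stride=32):
    """Scale (h, w) so the longer side fits new_size, then pad each axis
    up to the next multiple of stride, splitting the pad evenly.
    Returns (padded_h, padded_w, (top, bottom, left, right))."""
    r = min(new_size / h, new_size / w)          # keep aspect ratio
    uh, uw = int(round(h * r)), int(round(w * r))
    dh, dw = (-uh) % stride, (-uw) % stride      # padding needed per axis
    top, left = dh // 2, dw // 2
    return uh + dh, uw + dw, (top, dh - top, left, dw - left)

# the 360x640 image from this thread pads to the 384x640 input seen above
print(letterbox_shape(360, 640))  # (384, 640, (12, 12, 0, 0))
```

Because padding only goes up to the next stride multiple rather than a full 640x640 square, the network sees fewer wasted pixels per image, at the cost of shapes varying between images (which is exactly why fixed-size padding is preferred for batching).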

@github-actions
Contributor

github-actions bot commented Sep 5, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@FantasyJXF

> C++ output
>
> output tensor shape  [1, 3, 48, 80, 85],
> and 3*48*80 = 11520 != 15120

That's because the C++ (export-mode) output is a list [(1, 3, height/8, width/8, 85), (1, 3, height/16, width/16, 85), (1, 3, height/32, width/32, 85)], while the Python (eval-mode) output is a tuple ([1, num_anchors, 85], [(1, 3, height/8, width/8, 85), (1, 3, height/16, width/16, 85), (1, 3, height/32, width/32, 85)]).

In your case: 3*(48*80 + 24*40 + 12*20) == 15120
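[Editor's note: to make the arithmetic concrete, here is a small NumPy sketch. The shapes are assumed from this thread (a 384x640 input, 3 anchors per level, 85 channels); it shows that flattening and concatenating the three per-stride maps reproduces the eval-mode box count.]

```python
import numpy as np

# the three raw Detect feature maps for a 384x640 input at strides 8, 16, 32
maps = [np.zeros((1, 3, 384 // s, 640 // s, 85), dtype=np.float32) for s in (8, 16, 32)]
print([m.shape[1] * m.shape[2] * m.shape[3] for m in maps])  # [11520, 2880, 720]

# flatten each map to (1, num_boxes, 85) and concatenate along the box axis
flat = np.concatenate([m.reshape(1, -1, 85) for m in maps], axis=1)
print(flat.shape)  # (1, 15120, 85)
```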

@MHGL

MHGL commented Jul 23, 2021

Our team rewrote the yolov5l model; it can now be converted to TorchScript, ONNX, CoreML, NCNN, TNN, MNN, and TensorRT. Please refer to yolov5.

9 participants