ONNX Inference Speed extremely slow compared to .pt Model #4808
@shrijan00 ONNX models run on CPU.
@shrijan00 I don't know, but if you find a good solution make sure to submit a PR to help others run ONNX on GPU!
@glenn-jocher Why does the exported ONNX model not support GPU?
Have you solved the problem?
In detect.py, change this line so that the code does not install onnxruntime, which is the CPU-only package. Make sure you have installed CUDA and cuDNN to use onnxruntime-gpu.
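For reference, a minimal sketch of explicit GPU inference with ONNX Runtime, assuming onnxruntime-gpu and matching CUDA/cuDNN versions are installed; the model path and input shape here are placeholders:

```python
import numpy as np
import onnxruntime as ort

# Request CUDA explicitly (ORT >= 1.10 requires the providers argument anyway),
# with CPU as a fallback if CUDA is unavailable
session = ort.InferenceSession(
    "yolov5s.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Dummy input matching the exported 640x640 YOLOv5 shape
img = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: img})
print(session.get_providers())  # shows which providers are actually active
</antml>```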
@callbarian thanks for the pointer! I didn't know about the -gpu package. We should make the requirements check conditional on the hardware then, with GPU-enabled systems installing -gpu automatically. I'll submit a PR for this fix. TODO: Install onnxruntime-gpu automatically if the user has a CUDA-enabled system.
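A rough sketch of what such a hardware-conditional check could look like (the actual PR may implement this differently); it assumes torch is already importable:

```python
import subprocess
import sys

import torch

# Choose the GPU build of ONNX Runtime only when CUDA is actually available
package = "onnxruntime-gpu" if torch.cuda.is_available() else "onnxruntime"
subprocess.check_call([sys.executable, "-m", "pip", "install", package])
</antml>```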
@callbarian I opened a PR #5087 for this, but testing this PR does not show improved ONNX inference speeds even after installing onnxruntime-gpu:
!python export.py --weights yolov5s.pt --include onnx --dynamic --simplify
!python detect.py --weights yolov5s.onnx
detect: weights=['yolov5s.onnx'], source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False
YOLOv5 🚀 v5.0-498-g16f413b torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
/usr/local/lib/python3.7/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:353: UserWarning: Deprecation warning. This ORT build has ['CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. The next release (ORT 1.10) will require explicitly setting the providers parameter (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.For example, onnxruntime.InferenceSession(..., providers=["CUDAExecutionProvider"], ...)
"based on the build flags) when instantiating InferenceSession."
image 1/2 /content/yolov5/data/images/bus.jpg: 640x640 4 class0s, 1 class5, Done. (0.846s)
image 2/2 /content/yolov5/data/images/zidane.jpg: 640x640 2 class0s, 2 class27s, Done. (0.262s)
Speed: 2.0ms pre-process, 554.0ms inference, 1.8ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp3
@callbarian can you comment on PR #5087 that adds this fix?
@glenn-jocher From your testing results it looks extremely slow at 554.0 ms inference! So the problem has not been fixed.
@callbarian have you solved the problem?
@MrRace ONNX export inference is working correctly with comparable speeds to PyTorch, e.g. see #6963.
@glenn-jocher Thanks for your reply, I will try following your guide.
@MrRace the aforementioned PR is already merged; all of this is in master. If you already have YOLOv5 you don't need to do anything except install dependencies and run the benchmarks:
pip install -qr requirements.txt coremltools onnx onnxruntime-gpu openvino-dev  # install
pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # TensorRT
python utils/benchmarks.py --weights yolov5s.pt --img 640 --device 0
@glenn-jocher I used master, and ONNX is still obviously slower than PyTorch in my results. Could you share the Docker environment?
@MrRace your results look fine. What makes you think that ONNX should be faster than PyTorch?
@MrRace my previous results are labelled 'Colab V100 High-RAM Results', but in any case a Docker image is also readily available in the README's Environments section: https://github.com/ultralytics/yolov5#environments
@glenn-jocher Thanks for your prompt reply!
I've never seen an ONNX speedup on GPU for YOLOv5. If you manage any speed improvements though, feel free to submit a PR. Please see our ✅ Contributing Guide to get started.
@MrRace latest GPU results on our server are below. I also exported ONNX at --half.
@glenn-jocher I also tried the half version with --half. The FP16 TensorRT version performs the same as the float32 TensorRT version above. I checked the code and found that in export.py the TensorRT engine is built in FP16 even when I do not set --half.
@MrRace yes that's correct! TRT is pinned to FP16 as we saw no observable benefit to FP32 TRT exports.
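For context, pinning a TensorRT build to FP16 comes down to a single builder-config flag; a minimal sketch with the standard TensorRT Python API, network construction omitted:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Enable FP16 kernels when the hardware supports them,
# which is effectively what export.py does regardless of --half
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)
</antml>```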
@glenn-jocher I use the same model and test data for both. Have you ever encountered this problem @glenn-jocher?
@MrRace 👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀. We've created a few short guidelines below to help users provide what we need in order to start investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

- ✅ Minimal – Use as little code as possible that still produces the same problem
- ✅ Complete – Provide all parts someone else needs to reproduce your problem in the question itself
- ✅ Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

- ✅ Current – Verify that your code is up-to-date with the current GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been resolved in previous commits
- ✅ Unmodified – Your problem must be reproducible without any modifications to our codebase

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem. Thank you! 😃
Running the model on Colab with a P100 GPU, my results show that in the benchmarks ONNX, while slower, is still comparable. However, when running the detection script, inference with OpenCV DNN is dramatically slower.
@glenn-jocher Is the speed of OpenCV DNN inference the expected normal speed? Is there any way to improve it?
@armap94 --dnn inference is likely using CPU. I'm not very familiar with DNN, but if you'd like to submit a PR for DNN inference that would be useful. The relevant code areas are here:
Loading: lines 355 to 358 in 2373d54
Inference: lines 507 to 510 in 2373d54
But when the model is exported on one device, can it still run inference on another?
@armap94 inference device is independent of export device.
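A quick way to confirm which device ONNX Runtime will actually use at inference time, regardless of where the model was exported (standard onnxruntime calls):

```python
import onnxruntime as ort

print(ort.get_device())               # 'GPU' when the CUDA build is installed
print(ort.get_available_providers())  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']
</antml>```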
Hi,
![image](https://user-images.githubusercontent.com/4169231/133449052-ee139311-8025-4727-b5b6-6545f6adfbec.png)
I tried running inference on an image of resolution 1024×1536 using the ONNX and .pt models. As you can see, there is a huge time difference between the two cases in the image. Any reason for this?