Gain of `EfficientNMS_TRT` embed on TensorRT #6430

zhiqwang · 2022-01-26T05:28:35Z

Search before asking

I have searched the YOLOv5 issues and found no similar feature requests.

Description

Hi YOLOv5 Community,

I'd like to share the gains we measured on TensorRT with a new pipeline for YOLOv5: we embed the whole post-processing step (namely the EfficientNMS_TRT plugin) into the graph with onnx-graphsurgeon. The ablation results are below. The first run is without EfficientNMS_TRT, and the second has EfficientNMS_TRT embedded. As you can see, the mean latency actually drops (from 4.00543 ms to 3.1062 ms). We guess this is because much less data has to be copied from the device back to the host once EfficientNMS_TRT has run on the GPU. (The mean D2H latency is reduced from 0.868048 ms to 0.0102295 ms, running on an NVIDIA GeForce GTX 1080 Ti, using TensorRT 8.2 with yolov5n6 and images scaled to 512x640.) See https://zhiqwang.com/yolov5-rt-stack/notebooks/onnx-graphsurgeon-inference-tensorrt.html for more details.
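To make the D2H argument concrete, here is a back-of-the-envelope sketch of how many output bytes leave the GPU per image in each case. Every constant in it is an assumption rather than something stated in this issue: 80 COCO classes, 3 anchors per grid cell, the P6 strides 8/16/32/64, float32 raw predictions, and `max_output_boxes = 100` for the plugin. The function names are hypothetical.

```python
# Rough estimate of output bytes copied device-to-host per image.
# All defaults below are assumptions for illustration, not values from the issue.

def raw_output_bytes(height, width, strides=(8, 16, 32, 64),
                     anchors_per_cell=3, num_classes=80, bytes_per_value=4):
    """Size of the raw prediction tensor when NMS runs on the host."""
    # One candidate box per anchor per grid cell at every detection level.
    candidates = sum(anchors_per_cell * (height // s) * (width // s)
                     for s in strides)
    # Each candidate carries 4 box coordinates, 1 objectness score, class scores.
    values_per_candidate = 4 + 1 + num_classes
    return candidates * values_per_candidate * bytes_per_value


def nms_output_bytes(max_output_boxes=100, batch=1):
    """Size of the four EfficientNMS_TRT outputs, all 4-byte element types."""
    num_detections = batch * 1 * 4                         # int32, [B, 1]
    detection_boxes = batch * max_output_boxes * 4 * 4     # float32, [B, K, 4]
    detection_scores = batch * max_output_boxes * 4        # float32, [B, K]
    detection_classes = batch * max_output_boxes * 4       # int32, [B, K]
    return (num_detections + detection_boxes
            + detection_scores + detection_classes)


if __name__ == "__main__":
    before = raw_output_bytes(512, 640)
    after = nms_output_bytes()
    print(f"without NMS in graph: {before} bytes")
    print(f"with NMS in graph:    {after} bytes")
    print(f"reduction factor:     {before / after:.0f}x")
```

Under these assumptions the raw tensor is a few megabytes while the NMS outputs are a few kilobytes, which fits the guess that the smaller copy explains the lower D2H latency.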

Use case

Deploy YOLOv5 quickly on TensorRT. With this approach you do not need to write any C++ code for post-processing.

Known cons

We have to update TensorRT to 8.2 to use the EfficientNMS_TRT plugin. There also seems to be a bug in this plugin's float16 support: NVIDIA/TensorRT#1758 (comment).

Related Issue

Add nms and agnostic nms to export.py #5938 (comment)

Additional

onnx-graphsurgeon is easy to install. You can use the prebuilt wheels:

python3 -m pip install onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com

The detailed results:

[I] === Performance summary w/o EfficientNMS_TRT plugin ===
[I] Throughput: 383.298 qps
[I] Latency: min = 3.66479 ms, max = 5.41199 ms, mean = 4.00543 ms, median = 3.99316 ms, percentile(99%) = 4.23831 ms
[I] End-to-End Host Latency: min = 3.76599 ms, max = 6.45874 ms, mean = 5.08597 ms, median = 5.07544 ms, percentile(99%) = 5.50839 ms
[I] Enqueue Time: min = 0.743408 ms, max = 5.27966 ms, mean = 0.940805 ms, median = 0.924805 ms, percentile(99%) = 1.37329 ms
[I] H2D Latency: min = 0.502045 ms, max = 0.62674 ms, mean = 0.538255 ms, median = 0.537354 ms, percentile(99%) = 0.582153 ms
[I] GPU Compute Time: min = 2.23233 ms, max = 3.92395 ms, mean = 2.59913 ms, median = 2.58661 ms, percentile(99%) = 2.8201 ms
[I] D2H Latency: min = 0.851807 ms, max = 0.900421 ms, mean = 0.868048 ms, median = 0.867676 ms, percentile(99%) = 0.889191 ms
[I] Total Host Walltime: 3.0081 s
[I] Total GPU Compute Time: 2.99679 s
[I] Explanations of the performance metrics are printed in the verbose logs.
[I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # trtexec --onnx=yolov5n6-no-nms.onnx --workspace=8096

[I] === Performance summary w/ EfficientNMS_TRT plugin ===
[I] Throughput: 389.234 qps
[I] Latency: min = 2.81482 ms, max = 9.77234 ms, mean = 3.1062 ms, median = 3.07642 ms, percentile(99%) = 3.33548 ms
[I] End-to-End Host Latency: min = 2.82202 ms, max = 11.6749 ms, mean = 4.939 ms, median = 4.95587 ms, percentile(99%) = 5.45207 ms
[I] Enqueue Time: min = 0.999878 ms, max = 11.3833 ms, mean = 1.28942 ms, median = 1.18579 ms, percentile(99%) = 4.53088 ms
[I] H2D Latency: min = 0.488159 ms, max = 0.633881 ms, mean = 0.546754 ms, median = 0.546631 ms, percentile(99%) = 0.570557 ms
[I] GPU Compute Time: min = 2.30298 ms, max = 9.21094 ms, mean = 2.54921 ms, median = 2.51904 ms, percentile(99%) = 2.78528 ms
[I] D2H Latency: min = 0.00610352 ms, max = 0.302734 ms, mean = 0.0102295 ms, median = 0.00976562 ms, percentile(99%) = 0.0151367 ms
[I] Total Host Walltime: 3.00591 s
[I] Total GPU Compute Time: 2.98258 s
[I] Explanations of the performance metrics are printed in the verbose logs.
[I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # trtexec --onnx=yolov5n6-efficient-nms.onnx --workspace=8096
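The two summaries above are easier to compare side by side. The sketch below parses the `mean = ... ms` and `qps` fields out of a few lines copied verbatim from those summaries and reports the relative change. The helper names (`parse_summary`, `relative_change`) are hypothetical and not part of trtexec or any library. The regular expressions only assume the line layout shown in this issue.

```python
import re

# Lines copied from the two trtexec summaries in this issue.
WITHOUT_NMS = """\
[I] Throughput: 383.298 qps
[I] Latency: min = 3.66479 ms, max = 5.41199 ms, mean = 4.00543 ms, median = 3.99316 ms, percentile(99%) = 4.23831 ms
[I] GPU Compute Time: min = 2.23233 ms, max = 3.92395 ms, mean = 2.59913 ms, median = 2.58661 ms, percentile(99%) = 2.8201 ms
[I] D2H Latency: min = 0.851807 ms, max = 0.900421 ms, mean = 0.868048 ms, median = 0.867676 ms, percentile(99%) = 0.889191 ms
"""

WITH_NMS = """\
[I] Throughput: 389.234 qps
[I] Latency: min = 2.81482 ms, max = 9.77234 ms, mean = 3.1062 ms, median = 3.07642 ms, percentile(99%) = 3.33548 ms
[I] GPU Compute Time: min = 2.30298 ms, max = 9.21094 ms, mean = 2.54921 ms, median = 2.51904 ms, percentile(99%) = 2.78528 ms
[I] D2H Latency: min = 0.00610352 ms, max = 0.302734 ms, mean = 0.0102295 ms, median = 0.00976562 ms, percentile(99%) = 0.0151367 ms
"""

LINE = re.compile(r"^\[I\]\s+(?P<name>[^:]+):\s+(?P<rest>.+)$")
MEAN = re.compile(r"mean = ([0-9.]+) ms")
QPS = re.compile(r"([0-9.]+) qps")


def parse_summary(text):
    """Map each metric name to its mean (ms) or, for throughput, its qps."""
    metrics = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        mean = MEAN.search(match["rest"])
        qps = QPS.search(match["rest"])
        if mean:
            metrics[match["name"]] = float(mean.group(1))
        elif qps:
            metrics[match["name"]] = float(qps.group(1))
    return metrics


def relative_change(before, after):
    """Signed fractional change from before to after (negative means lower)."""
    return (after - before) / before


if __name__ == "__main__":
    base, fused = parse_summary(WITHOUT_NMS), parse_summary(WITH_NMS)
    for name in base:
        change = relative_change(base[name], fused[name])
        print(f"{name:18s} {base[name]:10.5f} -> {fused[name]:10.5f} ({change:+.1%})")
```

On these numbers the mean latency drops by roughly 22% and the mean D2H latency by a factor of about 85, while throughput and mean GPU compute time change by only a couple of percent.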

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!


github-actions · 2022-03-01T00:18:05Z

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Wiki – https://github.com/ultralytics/yolov5/wiki
Tutorials – https://docs.ultralytics.com/yolov5
Docs – https://docs.ultralytics.com

Access additional Ultralytics ⚡ resources:

Ultralytics HUB – https://ultralytics.com/hub
Vision API – https://ultralytics.com/yolov5
About Us – https://ultralytics.com/about
Join Our Team – https://ultralytics.com/work
Contact Us – https://ultralytics.com/contact

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

zhiqwang added the enhancement New feature or request label Jan 26, 2022

github-actions bot added the Stale label Mar 1, 2022

github-actions bot closed this as completed Mar 7, 2022

zhiqwang mentioned this issue Apr 11, 2022

Exporting to ONNX, including the NMS #7373

Closed

1 task

triple-Mu mentioned this issue May 9, 2022

Add nms for tensorrt8.0+ / onnxruntime / openvino(the same way as onnxruntime) #7736

Closed

zhiqwang mentioned this issue May 20, 2022

C++ inference pipeline for TensorRT #7892

Closed

zhiqwang mentioned this issue Aug 1, 2022

TFLite, ONNX, CoreML, TensorRT Export #251

Open
