
Add trtexec TensorRT export #6984

Closed · wants to merge 8 commits

Conversation

@triple-Mu (Contributor) commented Mar 15, 2022

I tried adding trtexec TensorRT export and got very interesting results, as follows.
1. Using the original export method
The mAP results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.374
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.570
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.423
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.489
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.311
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.566
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.377
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.718
The FPS results:
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 5000/5000 [00:37<00:00, 134.16it/s]
all 5000 36335 0.661 0.524 0.615 0.439
Speed: 0.2ms pre-process, 1.5ms inference, 0.5ms NMS per image at shape (1, 3, 640, 640)

2. Using the trtexec export method
The mAP results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.374
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.571
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.423
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.489
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.311
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.566
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.378
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.718
The FPS results:
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 5000/5000 [00:46<00:00, 108.09it/s]
all 5000 36335 0.659 0.525 0.616 0.44
Speed: 0.2ms pre-process, 3.5ms inference, 0.5ms NMS per image at shape (1, 3, 640, 640)

3. Summary
The trtexec export method may give slightly better AP@0.5 and mAR (small/medium), but it increases inference time from 1.5 ms to 3.5 ms.
All result images and logs are attached (original export, trtexec export, and the corresponding mAP/FPS logs).

So using trtexec may help us get slightly more accurate results.
Thanks!

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

WARNING ⚠️ this PR is very large, summary may not cover all changes.

🌟 Summary

This PR introduced enhancements to TensorRT export functionality in YOLOv5.

📊 Key Changes

  • Added support for onnx-graphsurgeon to optimize ONNX models
  • Added a 'dynamic_axes' parameter to allow dynamic input sizes during ONNX export (see the sketch after this list)
  • Reduced model export size and memory consumption
  • Improved model inference times and GPU utilization
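
For context, dynamic ONNX export in YOLOv5 is normally driven from the command line; a minimal sketch assuming the repo's standard --dynamic flag (the PR's exact flag wiring may differ):

python export.py --weights yolov5s.pt --include onnx --dynamic  # ONNX export with dynamic batch/height/width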

🎯 Purpose & Impact

  • Enhanced Performance: Users can expect faster model inference with less memory overhead, making deployment on diverse platforms more efficient.
  • Dynamic Input Handling: The ability to handle dynamic input sizes provides flexibility for various use cases and input data.
  • Optimized Model Size: Reduced model export size facilitates easier deployment, especially in edge computing scenarios where resources are limited.

@github-actions bot (Contributor) left a comment:

👋 Hello @triple-Mu, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with upstream/master. If your PR is behind upstream/master an automatic GitHub Actions merge may be attempted by writing /rebase in a new comment, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
# git checkout feature  # <--- replace 'feature' with local branch name
git merge upstream/master
git push -u origin -f
  • ✅ Verify all Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@glenn-jocher (Member)

@triple-Mu thanks for the PR! I was not familiar with trtexec. What's the main difference from the default TensorRT export?

BTW note that the default TRT export will always be in FP16 mode regardless of --half. We use this by default as we did not observe any mAP drops but did observe significant speedup in --half mode. Full benchmarking results are in #6963

Colab++ V100 High-RAM Results

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=0, half=False
Checking setup...
YOLOv5 🚀 v6.1-48-g0c1025f torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 46.1/166.8 GB disk)

Benchmarks complete (433.63s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch      0.462296             9.159939
1             TorchScript      0.462296             6.607546
2                    ONNX      0.462296            12.698026
3                OpenVINO           NaN                  NaN
4                TensorRT      0.462280             1.725197
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel      0.462296            20.273019
7     TensorFlow GraphDef      0.462296            20.212173
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

glenn-jocher changed the title from "Add trtexec tensorrt export" to "Add trtexec TensorRT export" on Mar 15, 2022
@triple-Mu (Contributor, Author) commented Mar 15, 2022

(quoting @glenn-jocher's comment above)

@glenn-jocher Thank you for your reply. trtexec applies machine-specific GPU optimizations when building the engine, so the export may take longer. At the same time, we can view detailed information during the export process, such as inference time on random inputs and the time consumed by each layer of the network, which is very convenient.
Besides, trtexec is installed with TensorRT by default, which avoids installing the Python tensorrt wheel, and it is more convenient to use on NVIDIA Jetson Nano/TX2/AGX.
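
For readers unfamiliar with the tool, a typical invocation looks like the following; this is a generic sketch using standard trtexec flags and example filenames, not necessarily the exact command this PR runs:

# Build an FP16 engine from an exported ONNX model and print per-layer timing details.
# Note: --workspace (MiB) is replaced by --memPoolSize in newer TensorRT releases.
/usr/src/tensorrt/bin/trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --fp16 --workspace=4096 --verbose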

@glenn-jocher (Member) commented Mar 15, 2022

@triple-Mu got it, thanks! trtexec export actually seems faster in your results, i.e. 180 seconds instead of 380 seconds. I tried to run the PR but got this error:

/bin/sh: 1: /usr/src/tensorrt/bin/trtexec: not found

Existing pip install does not appear to install trtexec:

pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com

@triple-Mu (Contributor, Author) commented Mar 15, 2022

@glenn-jocher
What is your TensorRT installation path?
If it was installed from the deb package, trtexec will be at this path by default; otherwise you need to change '/usr/src/tensorrt/bin/trtexec' to your TensorRT directory's bin folder, e.g. TensorRT-8.2.3.1/bin/trtexec.
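
If you are unsure where TensorRT ended up, a generic shell check like this (illustrative, not part of the PR) should locate the binary:

# Check PATH first, then the default deb install location, then search the filesystem.
which trtexec || ls /usr/src/tensorrt/bin/trtexec 2>/dev/null || find / -name trtexec -type f 2>/dev/null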

@glenn-jocher (Member) commented Mar 15, 2022

@triple-Mu this is the full code I'm using to clone the PR, install requirements and run export. I'm running this in Colab:
https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en

!git clone https://github.com/triple-Mu/yolov5 -b tripleMu # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install
!python export.py --weights yolov5s.pt --include engine --device 0 --trtexec

@triple-Mu (Contributor, Author)

@glenn-jocher
I tried FP16 export and got a weird result: the AP and AR differ slightly, but the inference time is the same as with the original method. (FP16 results attached.)

@glenn-jocher (Member) commented Mar 15, 2022

@triple-Mu I'm using the same Colab code as in my earlier comment to clone the PR, install the requirements and run the export.

But there's no trtexec file that I can find:

(screenshot attached)

@zhiqwang (Contributor)

Just FYI @glenn-jocher, trtexec is a command-line wrapper tool (examples below):

  • It's useful for benchmarking networks on random or user-provided input data.
  • It's useful for generating serialized engines from models.
  • It's useful for generating a serialized timing cache from the builder.

And it seems that trtexec is difficult to obtain if we install TensorRT via pip.
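
For illustration, generic invocations for two of these uses (filenames are examples, and --timingCacheFile requires a recent TensorRT 8.x):

# Benchmark an existing serialized engine on random input data.
trtexec --loadEngine=yolov5s.engine --iterations=100
# Build an engine while writing a timing cache to speed up future builds.
trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --timingCacheFile=timing.cache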

@triple-Mu (Contributor, Author)

@glenn-jocher
I don't know how you installed TensorRT; do you know the path where it is stored? This command-line tool should be in the bin folder under the TensorRT path.
Refer to https://github.com/NVIDIA/TensorRT/tree/main/samples/trtexec

@zhiqwang (Contributor)

Actually, trtexec plays the same role as the Python wrapper. I think it would be better to document the usage of trtexec and add instructions informing users that they can also use trtexec to generate serialized engines, rather than repackaging trtexec back into Python.

@glenn-jocher (Member)

@triple-Mu quick question: is this PR compatible with your new PR #7736, or does the new PR replace this one?

@triple-Mu (Contributor, Author)

(quoting @glenn-jocher's question above)

The new PR has nothing to do with the old one. Whatever you prefer is fine; it does not matter to me.

triple-Mu closed this on May 19, 2022
glenn-jocher removed the TODO label on May 19, 2022
triple-Mu deleted the tripleMu branch on May 20, 2022