YOLOv5 TensorRT

This guide explains how to run YOLOv5 with the TensorRT backend.


(Only supported on NVIDIA GPUs; tested on Linux devices; partial dynamic-shape support.)

You can use the TensorRT-powered detector by specifying the backend parameter.

from cvu.detector import Detector
detector = Detector(classes="coco", backend="tensorrt")
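
For reference, here is a minimal inference sketch, assuming the standard CVU flow in which the detector is called directly on a BGR numpy frame and the returned predictions expose a draw method:

import cv2
from cvu.detector import Detector

detector = Detector(classes="coco", backend="tensorrt")

frame = cv2.imread("image.jpg")   # BGR image as a numpy array
predictions = detector(frame)     # first call builds/loads the engine
predictions.draw(frame)           # draw boxes on the frame in place
cv2.imwrite("output.jpg", frame)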

Internally, the Detector will build a TensorRT CUDA engine from the pretrained YOLOv5s ONNX weights.

If you want to run the detector with your custom weights, simply do the following:

Make sure you use the --dynamic flag while exporting your custom weights.

python export.py --weights $PATH_TO_PYTORCH_WEIGHTS --dynamic --include onnx

Now simply set the parameter weight="path_to_custom_weights.onnx" in the Detector initialization, and you're ready for inference.
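
For example, a minimal sketch (the weights path and class list below are placeholders; classes should match what your custom model was trained on):

from cvu.detector import Detector

detector = Detector(
    classes=["person", "car"],            # hypothetical custom classes
    backend="tensorrt",
    weight="path_to_custom_weights.onnx"  # exported with --dynamic
)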

INT8 Quantization

Besides FP16, TensorRT supports the use of 8-bit integers to represent quantized floating-point values. To enable INT8 precision while building the TensorRT engine, pass dtype="int8" (the default is "fp16").

As with test/validation datasets, INT8 calibration requires a set of input images as a calibration dataset. Make sure the calibration files are representative of the overall inference data.

For the INT8 calibration of YOLOv5 pretrained on COCO, please use this COCO calibration dataset.

from cvu.detector import Detector

detector = Detector(
    classes="coco",
    backend="tensorrt",
    dtype="int8",
    calib_images_dir="./coco_calib/"
)
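
If you are calibrating for a custom dataset instead, one way to assemble a representative calibration folder is to randomly sample images from your training data. A minimal sketch (all paths and the sample size are placeholders):

import random
import shutil
from pathlib import Path

src = Path("datasets/custom/images")  # hypothetical source image directory
dst = Path("./custom_calib/")
dst.mkdir(exist_ok=True)

# copy a random, representative subset of images for calibration
for img in random.sample(sorted(src.glob("*.jpg")), k=200):
    shutil.copy(img, dst / img.name)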

Notes
  • Unlike other backends, the TensorRT backend is not fully dynamic (for optimization reasons). You can initialize the Detector and run inference on an image of any shape, and the engine's input shape will be set from that first input. To run inference on a differently shaped image, you'll have to create a new detector (or resize inputs to a fixed shape; see the sketch after these notes).

  • Building the TensorRT engine and running the first inference can take some time to complete (especially if all the dependencies also have to be installed on the first run).

  • A new engine is built for each unseen input shape, but once built, the engine file is cached and reused for future inference.
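
If your input shapes vary (e.g. mixed image sources), one workaround is to resize every frame to a single fixed shape before inference so the cached engine is reused. A minimal sketch (the 640x640 shape is an arbitrary choice, and predictions will be in resized-image coordinates):

import cv2
from cvu.detector import Detector

detector = Detector(classes="coco", backend="tensorrt")

def infer_fixed_shape(frame, size=(640, 640)):
    # resize so every call matches the shape the engine was built with
    resized = cv2.resize(frame, size)
    return detector(resized)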
