In this section, we deploy lightweight, pruned, or quantized YOLOv5 models through different inference frameworks on different devices.
To deploy with TensorRT, install it and export the PyTorch weights to a serialized engine:

```bash
pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com
python export_onnx_trt.py --weights yolov5s.pt --device 0
```

This produces `yolov5s.engine`. To test the exported engine, pass it back to the same script:

```bash
python export_onnx_trt.py --weights yolov5s.engine --device 0
```
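If you want to consume the engine from your own C++ code, the sketch below shows deserialization with the TensorRT C++ API. This is an illustration, not part of this repo: it assumes TensorRT 8, an engine file named as in the export step above, and omits error handling (link against `nvinfer`):

```cpp
// Deserialize the exported engine with the TensorRT C++ API (TensorRT 8 assumed).
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// TensorRT requires a logger implementation.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
    }
} gLogger;

int main() {
    // Read the serialized engine produced by export_onnx_trt.py.
    std::ifstream f("yolov5s.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());

    // Deserialize and create an execution context for running inference.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    std::cout << "engine bindings: " << engine->getNbBindings() << std::endl;

    // TensorRT 8 objects are released with delete.
    delete context;
    delete engine;
    delete runtime;
    return 0;
}
```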
To deploy with NCNN, see Tencent/ncnn for build instructions.
Export the model to ONNX first:

```bash
python export_onnx_trt.py --weights yolov5s.pt --device 0 --train --simplify
```

This produces `yolov5s.onnx`. Then convert `*.onnx` to `*.param` and `*.bin` with NCNN's `onnx2ncnn` tool: navigate to `ncnn/build/tools/onnx` and run

```bash
./onnx2ncnn yolov5s.onnx yolov5s.param yolov5s.bin
```
You can also use NCNN's `ncnnoptimize` tool to reduce model size.
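The usual invocation (verify the argument order against your `ncnnoptimize` build) takes the input and output `param`/`bin` pairs plus a storage flag; `0` keeps fp32 weights, while `65536` stores them as fp16, roughly halving file size:

```bash
./ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 65536
```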
We provide `yolov5_ncnn.cpp` for detection and latency measurement. Build it against NCNN and run:

```bash
./yolov5 test.jpg yolov5s
```
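For orientation, here is a minimal sketch of the load-infer-time pattern such a program follows. It is not the repository's `yolov5_ncnn.cpp`, and the blob names `images` and `output` are assumptions that must match the names in your exported `.param` file:

```cpp
// Minimal NCNN load-and-time sketch (not the repo's yolov5_ncnn.cpp).
#include "net.h"   // NCNN
#include <chrono>
#include <iostream>

int main() {
    ncnn::Net net;
    net.opt.use_vulkan_compute = true;  // Vulkan backend; requires an NCNN build with Vulkan
    net.load_param("yolov5s.param");
    net.load_model("yolov5s.bin");

    // Dummy 640x640 3-channel input, matching the benchmark input size.
    ncnn::Mat in(640, 640, 3);
    in.fill(0.5f);

    auto t0 = std::chrono::steady_clock::now();
    ncnn::Extractor ex = net.create_extractor();
    ex.input("images", in);    // assumed input blob name
    ncnn::Mat out;
    ex.extract("output", out); // assumed output blob name
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "latency: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms" << std::endl;
    return 0;
}
```

The following tables summarize the measured file sizes and latencies.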
| Backend | Model | File Size | Latency (ms/img) |
| --- | --- | --- | --- |
| TensorRT | YOLOv5s | 17 MB | 2.3 |
| TensorRT | YOLOv5s-EagleEye@0.6 | 11 MB | 2.0 |
| TensorRT | YOLOv5l-MobileNetv3Small | 44 MB | 2.9 |
| TensorRT | YOLOv5l-EfficientNetLite0 | 47 MB | 3.0 |
| ncnn (Vulkan) | YOLOv5s | 14 MB | 235 |
| ncnn (Vulkan) | YOLOv5s-EagleEye@0.6 | 7.5 MB | 215 |
Input size is 640x640.
| Backend | Model | File Size | Latency (ms/img) |
| --- | --- | --- | --- |
| ncnn (Vulkan) | YOLOv5s | 14 MB | 520 |
| ncnn (Vulkan) | YOLOv5s-EagleEye@0.6 | 7.5 MB | 610 |
Input size is 640x640.