Optimum Workspace Value Depending on GPU Memory During TensorRT Export #14038
-
Greetings! I may be completely under-utilizing the hardware/software stack for now, but even so, 4 GB of memory seems quite small to maneuver in. My second question: without allocating swap space on the attached NVMe SSD, it is impossible to export even yolov8n.pt to TensorRT on the Orin itself. When I do enable swap, could that affect the accuracy of the resulting engine file, given that some operations would be carried out outside the GPU?
Replies: 1 comment
-
Greetings! Thank you for reaching out and for your detailed questions. It's great to hear about your progress with deploying YOLOv8 on your Jetson Orin Nano 4GB. Let's address your queries one by one:

Confirming Export on the Target Device

You are correct: exporting the model to TensorRT should ideally be done on the device where it will run inference. This ensures that the calibration and optimization processes are tailored to the specific hardware, which can significantly improve performance.

Optimum Workspace Size

Regarding the workspace size during TensorRT export, you're right that there's a balance to strike. The workspace should be large enough for TensorRT to explore its various optimization tactics, but not so large that it causes out-of-memory (OoM) errors. For a Jetson Orin Nano 4GB, a good starting point is a workspace of 2 GiB; this value has been found to work well in many scenarios without causing OoM errors. Here's a Python example:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="engine", workspace=2)  # Export with a 2 GiB workspace
```

Guidelines for Workspace Size

While we don't have a comprehensive table of optimum workspace values for each YOLOv8 size versus each Orin type/GPU size, the general guideline follows from the balance described above: start small (around 2 GiB on memory-constrained devices) and increase only if the export succeeds and you want TensorRT to explore more optimization tactics.

Swap Memory and Accuracy

Using swap space on an attached NVMe SSD can indeed help when memory is insufficient. However, while swap can prevent OoM errors during export, it also slows the process down, since spilled operations run against storage that is much slower than RAM. This should not affect the accuracy of the resulting TensorRT engine file; it only impacts export time and efficiency.

Additional Resources

For more detailed information on TensorRT export, you can refer to our TensorRT Export Guide, which provides comprehensive instructions and best practices for exporting YOLOv8 models to TensorRT. If you encounter any issues or have further questions, please don't hesitate to ask. We're here to help! Best regards and happy deploying! 🚀
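As a rough illustration of the "balance to strike" between workspace size and available memory, here is a small sketch of a heuristic for deriving a workspace value from total GPU memory. Note that the `suggest_workspace` helper and its half-of-total, 2 GiB-capped rule are illustrative assumptions built around the 2 GiB starting point above, not an official Ultralytics recommendation.

```python
# Hypothetical helper for picking a TensorRT workspace size (GiB) from the
# device's total GPU memory. The rule of thumb used here -- half of total
# memory, clamped to [0.5, 2.0] GiB -- is an illustrative assumption, not
# an official recommendation.

def suggest_workspace(total_mem_gib: float, cap_gib: float = 2.0) -> float:
    """Return a conservative TensorRT workspace size in GiB."""
    return min(cap_gib, max(0.5, total_mem_gib / 2))


# On-device usage (requires a CUDA build of torch; shown for illustration only):
#   import torch
#   from ultralytics import YOLO
#   total = torch.cuda.get_device_properties(0).total_memory / 1024**3
#   YOLO("yolov8n.pt").export(format="engine", workspace=suggest_workspace(total))

print(suggest_workspace(4.0))  # Orin Nano 4GB -> 2.0
```

On a 4 GB Orin Nano this yields the 2 GiB starting point suggested above; on boards with less memory it backs off toward 0.5 GiB.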