Optimum Workspace Value Depending on GPU Memory During TensorRT Export #14038
-
Greetings! I may be completely under-utilizing the hardware/software stack for now, but even so, 4 GB of memory seems quite small to maneuver in. My second question: without allocating swap space on the attached NVMe SSD, it is impossible to export even yolov8n.pt to TensorRT on the Orin itself. When I do enable swap, could that affect the accuracy of the resulting engine file, given that some operations would be carried out outside the GPU?
Replies: 1 comment
-
Greetings! Thank you for reaching out and for your detailed questions. It's great to hear about your progress with deploying YOLOv8 on your Jetson Orin Nano 4GB. Let's address your queries one by one:

Confirming Export on the Target Device

You are correct: exporting the model to TensorRT should ideally be done on the device where it will run inference. This ensures that the calibration and optimization processes are tailored to the specific hardware, which can significantly improve performance.

Optimum Workspace Size

Regarding the workspace size during TensorRT export, you're right that there's a balance to strike. The workspace should be large enough for TensorRT to explore its various optimization tactics, but not so large that it causes out-of-memory (OoM) errors. For a Jetson Orin Nano 4GB, a good starting point is a workspace of 2 GiB; this value has been found to work well in many scenarios without causing OoM errors. Here's a Python example:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="engine", workspace=2)  # Export with a 2 GiB workspace
```

Guidelines for Workspace Size

While we don't have a comprehensive table of optimum workspace values for each YOLOv8 size versus each Orin type/GPU size, the general guideline follows from the balance described above: start small (around 2 GiB on memory-constrained devices) and increase only if the export succeeds and you want TensorRT to explore more optimization tactics.

Swap Memory and Accuracy

Using swap space on an attached NVMe SSD can indeed help when memory is insufficient. However, while swap can prevent OoM errors during export, it also slows the process down, since spilled operations run against storage that is much slower than RAM. This should not affect the accuracy of the resulting TensorRT engine file; it only impacts export time and efficiency.

Additional Resources

For more detailed information on TensorRT export, you can refer to our TensorRT Export Guide, which provides comprehensive instructions and best practices for exporting YOLOv8 models to TensorRT. If you encounter any issues or have further questions, please don't hesitate to ask. We're here to help! Best regards and happy deploying! 🚀
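As a rough illustration of the "balance to strike" between workspace size and available memory, here is a small sketch of a heuristic for deriving a workspace value from total GPU memory. Note that the `suggest_workspace` helper and its half-of-total, 2 GiB-capped rule are illustrative assumptions built around the 2 GiB starting point above, not an official Ultralytics recommendation.

```python
# Hypothetical helper for picking a TensorRT workspace size (GiB) from the
# device's total GPU memory. The rule of thumb used here -- half of total
# memory, clamped to [0.5, 2.0] GiB -- is an illustrative assumption, not
# an official recommendation.

def suggest_workspace(total_mem_gib: float, cap_gib: float = 2.0) -> float:
    """Return a conservative TensorRT workspace size in GiB."""
    return min(cap_gib, max(0.5, total_mem_gib / 2))


# On-device usage (requires a CUDA build of torch; shown for illustration only):
#   import torch
#   from ultralytics import YOLO
#   total = torch.cuda.get_device_properties(0).total_memory / 1024**3
#   YOLO("yolov8n.pt").export(format="engine", workspace=suggest_workspace(total))

print(suggest_workspace(4.0))  # Orin Nano 4GB -> 2.0
```

On a 4 GB Orin Nano this yields the 2 GiB starting point suggested above; on boards with less memory it backs off toward 0.5 GiB.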