
Dynamic batch size support for TensorRT #8526

Merged: 21 commits merged into ultralytics:master from democat3457:patch-1 on Jul 29, 2022

Conversation

democat3457
Contributor

@democat3457 commented Jul 8, 2022

TensorRT supports dynamic input shapes by setting an optimization profile and setting the binding input size on-the-fly. This PR exposes that option when exporting as TensorRT and resizes the binding input size during detection.

Tested with exporting both dynamic and non-dynamic TensorRT models and running them through DetectMultiBackend.
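For context, a minimal sketch of the TensorRT mechanism described above, assuming the TensorRT 8.x Python API and an ONNX file (hypothetically named yolov5s.onnx) exported with a dynamic batch axis; this only illustrates the APIs involved, not the exact code in this PR:

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
parser.parse_from_file("yolov5s.onnx")  # ONNX model exported with a dynamic batch dimension

# Build time: declare min/opt/max shapes for the dynamic input via an optimization profile
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
inp = network.get_input(0)
max_batch = 32  # the --batch-size passed at export acts as the maximum batch size
profile.set_shape(inp.name, (1, 3, 640, 640), (max_batch // 2, 3, 640, 640), (max_batch, 3, 640, 640))
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)

# Inference time: set the actual input shape on the execution context before running
context = engine.create_execution_context()
context.set_binding_shape(0, (12, 3, 640, 640))  # e.g. a batch of 12 images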

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Improved TensorRT model export support with dynamic axes and better input handling.

πŸ“Š Key Changes

  • πŸ’‘ Introduced support for dynamic axes in TensorRT exports, allowing variable input sizes.
  • πŸ›  Refactored export_engine function to accept a dynamic flag.
  • πŸ“ Adjusted shape setting for bindings in TensorRT to support dynamic input shapes.
  • βœ‚οΈ Removed verbose parameter in export_engine, now using internal prefix variable.
  • πŸ”§ Adjusted warning message to account for dynamic models' batch-size requirements.
  • ✨ Enhanced TensorRT execution context to correctly handle dynamic shapes during inference.

🎯 Purpose & Impact

  • πŸ“ˆ The export process for TensorRT models can now better handle various input sizes, making it more flexible.
  • πŸ”„ Users can benefit from dynamically shaped inputs without needing to re-export models for different sizes.
  • πŸ›’ This feature is valuable for applications that require processing images or data in batches of different sizes.
  • βš™οΈ Downstream, the dynamic shape support can lead to more efficient resource usage as only one model needs to be maintained for varying input sizes.

@glenn-jocher
Member

@democat3457 thanks for the PR! This is very interesting. I'll try to test independently today. Two questions:

  1. What is the impact on export time for normal TRT vs dynamic TRT models?
  2. What is the impact on inference speed for normal TRT vs dynamic TRT models?

@democat3457
Contributor Author

@glenn-jocher
NOTE: times may be skewed a little because my GPU is currently busy with training.
Running with a --batch-size 32 argument

  1. 66.05 - 86.21 seconds for a normal export, 76.58 - 85.47 seconds for a dynamic export
  2. To run inference on normal TRT models, I pad the rest of the batch with np.zeros to fill the batch up to the batch size; dynamic does not need this (see the padding sketch after this list).
    - There are 9 images with a non-padded batch size of 12 and 7 images with a non-padded batch size of 28
    a. 19 ms inference on normal TRT model (batch size of 32)
    b. 9 ms inference on dynamic TRT model (batch size of 12)
    c. 18 ms inference on dynamic TRT model (batch size of 28)
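
For reference, a minimal sketch of the zero-padding mentioned in point 2 above, assuming NCHW float32 batches (an illustrative helper, not code from this PR):

import numpy as np

def pad_batch(batch, max_batch=32):
    # Pad an (n, 3, h, w) batch with zero images up to max_batch; only needed for fixed-shape engines
    n = batch.shape[0]
    if n >= max_batch:
        return batch[:max_batch]
    pad = np.zeros((max_batch - n, *batch.shape[1:]), dtype=batch.dtype)
    return np.concatenate([batch, pad], axis=0)

padded = pad_batch(np.zeros((12, 3, 640, 640), dtype=np.float32))  # a batch of 12 becomes a batch of 32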

@democat3457
Contributor Author

This is very interesting. I'll try to test independently today.

@glenn-jocher have you been able to test this?

@glenn-jocher linked an issue Jul 15, 2022 that may be closed by this pull request
@glenn-jocher
Member

@democat3457 I tested this PR in colab but got an error. Could you take a look please?

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export

[Screenshot of the error output]

@democat3457
Contributor Author

I tested this PR in colab but got an error. Could you take a look please?

The issue was some integer division rounding down to 0; it should be fixed now.

@glenn-jocher
Member

@democat3457 thanks! I'll retest

@democat3457
Contributor Author

@glenn-jocher have you been able to retest?

@glenn-jocher
Member

@democat3457 thanks for the reminder, testing now!

@glenn-jocher
Member

@democat3457 PR fails on batch-size 2 inference:
[Screenshot of the batch-size 2 inference error]

To reproduce:

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export


# PyTorch Hub
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

@democat3457
Contributor Author

PR fails on batch-size 2 inference:

@glenn-jocher this is because you exported with a (default) max batch size of 1, but tried to use a batch size of 2 when running inference. TensorRT requires a maximum batch size to properly handle dynamic batches, so the --batch-size argument is required to tell TensorRT what the max batch size is.

@democat3457
Contributor Author

A warning is now displayed if batch-size <= 1 when the dynamic flag is enabled to tell the user to specify a maximum batch size.
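
Something along these lines, as a hypothetical sketch of that check (names and message are illustrative, not necessarily the PR's exact code):

import warnings

def check_dynamic_batch(dynamic, batch_size):
    # Dynamic TensorRT export needs a real maximum batch size to build the optimization profile
    if dynamic and batch_size <= 1:
        warnings.warn("--dynamic export should set a maximum --batch-size > 1, e.g. --batch-size 16")

check_dynamic_batch(dynamic=True, batch_size=1)  # emits the warning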

@glenn-jocher
Member

Ah got it! I'll try again following your updates above.

@glenn-jocher
Member

glenn-jocher commented Jul 27, 2022

@democat3457 I retested with --batch-size 16 during export and two images batched during inference but I get a new error now in Colab:
https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic --batch-size 16 # export


# PyTorch Hub
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

Error on PyTorch Hub inference:

YOLOv5 πŸš€ 2022-7-27 Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Loading yolov5s.engine for TensorRT inference...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/root/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py in _create(name, pretrained, channels, classes, autoshape, verbose, device)
     45         if pretrained and channels == 3 and classes == 80:
---> 46             model = DetectMultiBackend(path, device=device, fuse=autoshape)  # download/load FP32 model
     47             # model = models.experimental.attempt_load(path, map_location=device)  # download/load FP32 model

5 frames
ValueError: negative dimensions are not allowed

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
/root/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py in _create(name, pretrained, channels, classes, autoshape, verbose, device)
     65         help_url = 'https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading'
     66         s = f'{e}. Cache may be out of date, try `force_reload=True` or see {help_url} for help.'
---> 67         raise Exception(s) from e
     68 
     69 

Exception: negative dimensions are not allowed. Cache may be out of date, try `force_reload=True` or see https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading for help.

@democat3457
Contributor Author

@glenn-jocher the issue is with the loaded repo when you use torch.hub.load. You load the default ultralytics/yolov5, which doesn't have the updated DetectMultiBackend.

I fixed that line to

model = torch.hub.load('/content/yolov5', 'custom', 'yolov5s.engine', source='local')

After reloading the runtime and re-running the script, I get this:

YOLOv5 πŸš€ v6.1-346-g352d45a Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla T4, 15110MiB)

Loading yolov5s.engine for TensorRT inference...
Adding AutoShape... 

image 1/16: 720x1280 2 class0s, 2 class27s
image 2/16: 1080x810 4 class0s, 1 class5
Speed: 25.0ms pre-process, 44.3ms inference, 1.4ms NMS per image at shape (2, 3, 640, 640)

@glenn-jocher
Member

@democat3457 oh of course, beginner mistake I made. Thanks for reviewing.

@glenn-jocher
Member

This works now:

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export


# PyTorch Hub
import torch

# Model
model = torch.hub.load('democat3457/yolov5:patch-1', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

@glenn-jocher merged commit 587a3a3 into ultralytics:master Jul 29, 2022
@glenn-jocher
Member

@democat3457 PR is merged. Thank you for your contributions to YOLOv5 πŸš€ and Vision AI ⭐

@glenn-jocher removed the TODO label Jul 29, 2022
@democat3457 deleted the patch-1 branch July 29, 2022 16:18
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022
* Dynamic batch size support for TensorRT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update export.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix optimization profile when batch size is 1

* Warn users if they use batch-size=1 with dynamic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* More descriptive assertion error

* Fix syntax

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pre-commit formatting sucked

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update export.py

Co-authored-by: Colin Wong <noreply@brains4drones.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@YoungjaeDev

@democat3457
Is it possible to explain the details of this part?

  1. "9 images with a non-padded batch size of 12 and 7 images with a non-padded batch size of 28": I'm confused about what this means.
  2. Are the times quoted below measured on the dynamic model? If so, where can I see the measurements for the non-dynamic model?

To run inference on normal TRT models, I pad the rest of the batch with np.zeros to fill the batch up to the batch size (dynamic does not need this)

  • There are 9 images with a non-padded batch size of 12 and 7 images with a non-padded batch size of 28
    a. 19 ms inference on normal TRT model (batch size of 32)
    b. 9 ms inference on dynamic TRT model (batch size of 12)
    c. 18 ms inference on dynamic TRT model (batch size of 28)

@democat3457
Contributor Author

@youngjae-avikus sure

  1. Both the normal and dynamic models were created with a batch size of 32 (with the dynamic model technically having a max batch size of 32). When I say "image", I technically just mean one batch, so in my testing I ran 9 batches with a batch size of 12 and 7 batches with a batch size of 28. Also, the "normal" model is the non-dynamic one.
    The dynamic model can run any batch at a batch size less than or equal to the max batch size, so it can run all of the tests without modifying the model input. The normal model, however, requires a fixed batch size of 32 as its input, so the batches with batch sizes 12 and 28 must be padded with extra images (in this case empty images created with np.zeros) to fill them up to the batch size of 32.
  2. Because the input batch size is constant for the normal model, there is only one measurement for it: about 19 ms inference on average at the fixed batch size of 32. For the dynamic model there are two measurements, one for each batch size: about 9 ms inference on average at a batch size of 12 and about 18 ms on average at a batch size of 28. A rough back-of-envelope comparison follows below.
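
As a rough back-of-envelope from the averages above (illustrative arithmetic on the quoted numbers, not a new benchmark):

normal_total = 16 * 19            # 16 zero-padded batches at ~19 ms each -> ~304 ms
dynamic_total = 9 * 9 + 7 * 18    # 9 batches of 12 at ~9 ms + 7 batches of 28 at ~18 ms -> ~207 ms
print(normal_total, dynamic_total)  # 304 207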

@YoungjaeDev

@democat3457
So, in conclusion:
a. 19 ms inference on normal TRT model (batch size of 32) -> constant 32 images
b. 9 ms inference on dynamic TRT model (batch size of 12) -> 9 images
c. 18 ms inference on dynamic TRT model (batch size of 28) -> 7 images

Then what was the reason for conducting the experiment this way, and what do the results suggest?

@democat3457
Contributor Author

A correction to your point a: the normal model was run with the same 16 (9+7) total batches, not a constant 32 images, but every one of those batches was padded with zeros up to the full batch size of 32.

This experimental procedure shows that dynamic models do seem to allow processing time to scale in proportion to the actual batch size without sacrificing any base performance.

@YoungjaeDev

YoungjaeDev commented Dec 26, 2022

@democat3457
Oh, so in summary, test (a) is a combination of 16 real images and 16 padding images.
In the end, batch size and processing time are proportional regardless of the number of images, whether it's a dynamic or a constant model!

@democat3457
Contributor Author

Not regardless of dynamic or normal model, no.

Processing time is proportional to batch size only when using the dynamic model, not when using the normal model.

@YoungjaeDev

@democat3457

  1. Ah, I meant that batch size 32 is slower than batch size 16 in the case of the normal model. Of course, the size of the input that it has to deal with (and hence the processing time) has grown.
  2. Dynamic model: batch size and processing time are proportional regardless of the number of images, right?

@democat3457
Contributor Author

democat3457 commented Dec 27, 2022

  1. I never tested a batch size of 16 for any model, though.

  2. Correct.

@glenn-jocher
Member

@democat3457 if you would like to further discuss testing methodologies or performance characteristics, please feel free to do so.

@democat3457
Contributor Author

Are there any additional benchmarks in particular that you want me to discuss? (I was under the impression everything in this thread was fairly sufficient.)

@glenn-jocher
Member

@democat3457 it looks like all the necessary benchmarks have been covered in this thread. If you have any other questions or need further assistance with anything else, please feel free to ask!

Successfully merging this pull request may close these issues.

how can I load dynamic Tensorrt model?