
Dynamic batch size support for TensorRT #8526

Merged: 21 commits merged into ultralytics:master from democat3457:patch-1 on Jul 29, 2022

Conversation

democat3457
Contributor

@democat3457 commented Jul 8, 2022

TensorRT supports dynamic input shapes by setting an optimization profile and setting the binding input size on-the-fly. This PR exposes that option when exporting as TensorRT and resizes the binding input size during detection.

Tested with exporting both dynamic and non-dynamic TensorRT models and running them through DetectMultiBackend.
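For context, a minimal sketch of the TensorRT mechanism described above, assuming the TensorRT 8.x Python API and an ONNX file (hypothetically named yolov5s.onnx) exported with a dynamic batch axis; this only illustrates the APIs involved, not the exact code in this PR:

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
parser.parse_from_file("yolov5s.onnx")  # ONNX model exported with a dynamic batch dimension

# Build time: declare min/opt/max shapes for the dynamic input via an optimization profile
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
inp = network.get_input(0)
max_batch = 32  # the --batch-size passed at export acts as the maximum batch size
profile.set_shape(inp.name, (1, 3, 640, 640), (max_batch // 2, 3, 640, 640), (max_batch, 3, 640, 640))
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)

# Inference time: set the actual input shape on the execution context before running
context = engine.create_execution_context()
context.set_binding_shape(0, (12, 3, 640, 640))  # e.g. a batch of 12 images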

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Improved TensorRT model export support with dynamic axes and better input handling.

πŸ“Š Key Changes

  • πŸ’‘ Introduced support for dynamic axes in TensorRT exports, allowing variable input sizes.
  • πŸ›  Refactored export_engine function to accept a dynamic flag.
  • πŸ“ Adjusted shape setting for bindings in TensorRT to support dynamic input shapes.
  • βœ‚οΈ Removed verbose parameter in export_engine, now using internal prefix variable.
  • πŸ”§ Adjusted warning message to account for dynamic models' batch-size requirements.
  • ✨ Enhanced TensorRT execution context to correctly handle dynamic shapes during inference.

🎯 Purpose & Impact

  • πŸ“ˆ The export process for TensorRT models can now better handle various input sizes, making it more flexible.
  • πŸ”„ Users can benefit from dynamically shaped inputs without needing to re-export models for different sizes.
  • πŸ›’ This feature is valuable for applications that require processing images or data in batches of different sizes.
  • βš™οΈ Downstream, the dynamic shape support can lead to more efficient resource usage as only one model needs to be maintained for varying input sizes.

@glenn-jocher
Member

@democat3457 thanks for the PR! This is very interesting. I'll try to test independently today. Two questions:

  1. What is the impact on export time for normal TRT vs dynamic TRT models?
  2. What is the impact on inference speed for normal TRT vs dynamic TRT models?

@democat3457
Contributor Author

@glenn-jocher
NOTE: times may be skewed a little because my GPU is currently busy with training.
Running with a --batch-size 32 argument

  1. 66.05 - 86.21 seconds for a normal export, 76.58 - 85.47 seconds for a dynamic export
  2. To run inference on normal TRT models, I pad the rest of the batch with np.zeros to fill the batch up to the batch size; dynamic does not need this (see the padding sketch after this list).
    - There are 9 images with a non-padded batch size of 12 and 7 images with a non-padded batch size of 28
    a. 19 ms inference on normal TRT model (batch size of 32)
    b. 9 ms inference on dynamic TRT model (batch size of 12)
    c. 18 ms inference on dynamic TRT model (batch size of 28)
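
For reference, a minimal sketch of the zero-padding mentioned in point 2 above, assuming NCHW float32 batches (an illustrative helper, not code from this PR):

import numpy as np

def pad_batch(batch, max_batch=32):
    # Pad an (n, 3, h, w) batch with zero images up to max_batch; only needed for fixed-shape engines
    n = batch.shape[0]
    if n >= max_batch:
        return batch[:max_batch]
    pad = np.zeros((max_batch - n, *batch.shape[1:]), dtype=batch.dtype)
    return np.concatenate([batch, pad], axis=0)

padded = pad_batch(np.zeros((12, 3, 640, 640), dtype=np.float32))  # a batch of 12 becomes a batch of 32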

@democat3457
Contributor Author

This is very interesting. I'll try to test independently today.

@glenn-jocher have you been able to test this?

@glenn-jocher linked an issue Jul 15, 2022 that may be closed by this pull request
@glenn-jocher
Member

@democat3457 I tested this PR in colab but got an error. Could you take a look please?

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export

[Screenshot of the error output]

@democat3457
Contributor Author

I tested this PR in colab but got an error. Could you take a look please?

The issue was some integer division rounding down to 0; it should be fixed now.

@glenn-jocher
Member

@democat3457 thanks! I'll retest

@democat3457
Contributor Author

@glenn-jocher have you been able to retest?

@glenn-jocher
Member

@democat3457 thanks for the reminder, testing now!

@glenn-jocher
Member

@democat3457 PR fails on batch-size 2 inference:
[Screenshot of the batch-size 2 inference error]

To reproduce:

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export


# PyTorch Hub
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

@democat3457
Contributor Author

PR fails on batch-size 2 inference:

@glenn-jocher this is because you exported with a (default) max batch size of 1, but tried to use a batch size of 2 when running inference. TensorRT requires a maximum batch size to properly handle dynamic batches, so the --batch-size argument is required to tell TensorRT what the max batch size is.

@democat3457
Contributor Author

A warning is now displayed if batch-size <= 1 when the dynamic flag is enabled to tell the user to specify a maximum batch size.
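
Something along these lines, as a hypothetical sketch of that check (names and message are illustrative, not necessarily the PR's exact code):

import warnings

def check_dynamic_batch(dynamic, batch_size):
    # Dynamic TensorRT export needs a real maximum batch size to build the optimization profile
    if dynamic and batch_size <= 1:
        warnings.warn("--dynamic export should set a maximum --batch-size > 1, e.g. --batch-size 16")

check_dynamic_batch(dynamic=True, batch_size=1)  # emits the warning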

@glenn-jocher
Member

Ah got it! I'll try again following your updates above.

@glenn-jocher
Member

glenn-jocher commented Jul 27, 2022

@democat3457 I retested with --batch-size 16 during export and two images batched during inference but I get a new error now in Colab:
https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic --batch-size 16 # export


# PyTorch Hub
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

Error on PyTorch Hub inference:

YOLOv5 πŸš€ 2022-7-27 Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Loading yolov5s.engine for TensorRT inference...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/root/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py in _create(name, pretrained, channels, classes, autoshape, verbose, device)
     45         if pretrained and channels == 3 and classes == 80:
---> 46             model = DetectMultiBackend(path, device=device, fuse=autoshape)  # download/load FP32 model
     47             # model = models.experimental.attempt_load(path, map_location=device)  # download/load FP32 model

5 frames
ValueError: negative dimensions are not allowed

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
/root/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py in _create(name, pretrained, channels, classes, autoshape, verbose, device)
     65         help_url = 'https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading'
     66         s = f'{e}. Cache may be out of date, try `force_reload=True` or see {help_url} for help.'
---> 67         raise Exception(s) from e
     68 
     69 

Exception: negative dimensions are not allowed. Cache may be out of date, try `force_reload=True` or see https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading for help.

@democat3457
Contributor Author

@glenn-jocher the issue is with the loaded repo when you use torch.hub.load. You load the default ultralytics/yolov5, which doesn't have the updated DetectMultiBackend.

I fixed that line to

model = torch.hub.load('/content/yolov5', 'custom', 'yolov5s.engine', source='local')

After reloading the runtime and re-running the script, I get this:

YOLOv5 πŸš€ v6.1-346-g352d45a Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla T4, 15110MiB)

Loading yolov5s.engine for TensorRT inference...
Adding AutoShape... 

image 1/16: 720x1280 2 class0s, 2 class27s
image 2/16: 1080x810 4 class0s, 1 class5
Speed: 25.0ms pre-process, 44.3ms inference, 1.4ms NMS per image at shape (2, 3, 640, 640)

@glenn-jocher
Member

@democat3457 oh of course, beginner mistake I made. Thanks for reviewing.

@glenn-jocher
Member

This works now:

!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export


# PyTorch Hub
import torch

# Model
model = torch.hub.load('democat3457/yolov5:patch-1', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

@glenn-jocher merged commit 587a3a3 into ultralytics:master Jul 29, 2022
@glenn-jocher
Member

@democat3457 PR is merged. Thank you for your contributions to YOLOv5 πŸš€ and Vision AI ⭐

@glenn-jocher removed the TODO label Jul 29, 2022
@democat3457 deleted the patch-1 branch July 29, 2022 16:18
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022
* Dynamic batch size support for TensorRT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update export.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix optimization profile when batch size is 1

* Warn users if they use batch-size=1 with dynamic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* More descriptive assertion error

* Fix syntax

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pre-commit formatting sucked

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update export.py

Co-authored-by: Colin Wong <noreply@brains4drones.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@YoungjaeDev

@democat3457
Is it possible to explain the details of this part?

  1. "9 images with a non-padded batch size of 12 and 7 images with a non-padded batch size of 28": I'm confused about what this means.
  2. Are the times quoted below measured on the dynamic model? If so, where can I see the measurements for the non-dynamic model?

To run inference on normal TRT models, I pad the rest of the batch with np.zeros to fill the batch up to the batch size (dynamic does not need this)

  • There are 9 images with a non-padded batch size of 12 and 7 images with a non-padded batch size of 28
    a. 19 ms inference on normal TRT model (batch size of 32)
    b. 9 ms inference on dynamic TRT model (batch size of 12)
    c. 18 ms inference on dynamic TRT model (batch size of 28)

@democat3457
Contributor Author

@youngjae-avikus sure

  1. Both the normal and dynamic models were created with a batch size of 32 (with the dynamic model technically having a max batch size of 32). When I say "image", I technically just mean one batch, so in my testing I ran 9 batches with a batch size of 12 and 7 batches with a batch size of 28. Also, the "normal" model is the non-dynamic one.
    The dynamic model can run any batch at a batch size less than or equal to the max batch size, so it can run all of the tests without modifying the model input. The normal model, however, requires a fixed batch size of 32 as its input, so the batches with batch sizes 12 and 28 must be padded with extra images (in this case empty images created with np.zeros) to fill them up to the batch size of 32.
  2. Because the input batch size is constant for the normal model, there is only one measurement for it: about 19 ms inference on average at the fixed batch size of 32. For the dynamic model there are two measurements, one for each batch size: about 9 ms inference on average at a batch size of 12 and about 18 ms on average at a batch size of 28. A rough back-of-envelope comparison follows below.
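
As a rough back-of-envelope from the averages above (illustrative arithmetic on the quoted numbers, not a new benchmark):

normal_total = 16 * 19            # 16 zero-padded batches at ~19 ms each -> ~304 ms
dynamic_total = 9 * 9 + 7 * 18    # 9 batches of 12 at ~9 ms + 7 batches of 28 at ~18 ms -> ~207 ms
print(normal_total, dynamic_total)  # 304 207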

@YoungjaeDev

@democat3457
So, in conclusion:
a. 19 ms inference on normal TRT model (batch size of 32) -> constant 32 images
b. 9 ms inference on dynamic TRT model (batch size of 12) -> 9 images
c. 18 ms inference on dynamic TRT model (batch size of 28) -> 7 images

Then what was the reason for conducting the experiment this way, and what do the results suggest?

@democat3457
Contributor Author

A correction to your point a: the normal model was run with the same 16 (9+7) total batches, not a constant 32 images, but every one of those batches was padded with zeros up to the full batch size of 32.

This experimental procedure shows that dynamic models do seem to allow processing time to scale in proportion to the actual batch size without sacrificing any base performance.

@YoungjaeDev

YoungjaeDev commented Dec 26, 2022

@democat3457
Oh, so in summary, test (a) is a combination of 16 real images and 16 padding images.
In the end, batch size and processing time are proportional regardless of the number of images, whether it's a dynamic or a constant model!

@democat3457
Contributor Author

Not regardless of dynamic or normal model, no.

Processing time is proportional to batch size only when using the dynamic model, not when using the normal model.

@YoungjaeDev

@democat3457

  1. Ah, I meant that batch size 32 is slower than batch size 16 in the case of the normal model. Of course, the size of the input that it has to deal with (and hence the processing time) has grown.
  2. Dynamic model: batch size and processing time are proportional regardless of the number of images, right?

@democat3457
Contributor Author

democat3457 commented Dec 27, 2022

  1. I never tested a batch size of 16 for any model, though.

  2. Correct.

@glenn-jocher
Member

@democat3457 if you would like to further discuss testing methodologies or performance characteristics, please feel free to do so.

@democat3457
Contributor Author

Are there any additional benchmarks in particular that you want me to discuss? (I was under the impression everything in this thread was fairly sufficient.)

@glenn-jocher
Member

@democat3457 it looks like all the necessary benchmarks have been covered in this thread. If you have any other questions or need further assistance with anything else, please feel free to ask!

Successfully merging this pull request may close these issues.

how can I load dynamic Tensorrt model?