
TensorRT with GPU, Docker image size, and base image for building a docker image #11452

Closed
hlmhlr opened this issue Apr 28, 2023 · 11 comments
Labels
question Further information is requested

Comments

@hlmhlr

hlmhlr commented Apr 28, 2023


Question

Hi,

My question relates to running YOLOv5 with TensorRT in Docker. I found a similar context here, but the issue I am facing is different.

I pulled the latest yolov5 Docker image and ran the command python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 to convert the weight file from .pt to a TensorRT engine, but it threw the error shown below:

[screenshot: error-1]

Later, I tried to solve the error by following this issue, which installs TensorRT; a snapshot is shown below:

[screenshot: tensorrt-inst]

After that, I ran the previous command python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 again, and it threw the following error:

[screenshot: error-2]

1. It is clear that CUDA initialization failed and that a GPU is necessary for TensorRT to run. How do I make sure the TensorRT installation is compatible with CUDA so that the GPU can be initialized? Note that nvidia-smi works fine both inside and outside the Docker container. (A quick sanity check is sketched below.)
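For reference, a minimal sanity check from Python inside the container (a sketch; it assumes the torch and tensorrt packages are importable there):

# Both imports must succeed and CUDA must be visible to PyTorch,
# otherwise the engine export cannot use the GPU.
import torch
assert torch.cuda.is_available(), 'CUDA is not visible to PyTorch'

import tensorrt  # raises ImportError if the TensorRT Python bindings are missing
print(tensorrt.__version__)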

During debugging, I used the base image nvcr.io/nvidia/pytorch:21.11-py3 as mentioned in this issue and built the yolov5 image locally, but that did not help either.

2. After building the image locally, I observed that the locally built yolov5 image is almost double the size of the image pulled from Docker Hub. The sizes are shown below. Why is the local image more than twice the size of the pulled one?

[screenshot: yolo_imag_sizes]

I used the following Dockerfile to build the image:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# Builds ultralytics/yolov5:latest image on DockerHub https://hub.docker.com/r/ultralytics/yolov5
# Image is CUDA-optimized for YOLOv5 single/multi-GPU training and inference

# Start FROM NVIDIA PyTorch image https://ngc.nvidia.com/catalog/containers/nvidia:pytorch
FROM nvcr.io/nvidia/pytorch:21.11-py3
RUN rm -rf /opt/pytorch  # remove 1.2GB dir

# Downloads to user config dir
ADD https://ultralytics.com/assets/Arial.ttf https://ultralytics.com/assets/Arial.Unicode.ttf /root/.config/Ultralytics/

# Install linux packages
RUN apt update && apt install --no-install-recommends -y zip htop screen libgl1-mesa-glx

# Install pip packages
COPY requirements.txt .
RUN python -m pip install --upgrade pip
RUN pip uninstall -y torch torchvision torchtext Pillow
RUN pip install --no-cache -r requirements.txt albumentations wandb gsutil notebook 'Pillow>=9.1.0' \
    'opencv-python<4.6.0.66' \
    --extra-index-url https://download.pytorch.org/whl/cu113

# Create working directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Copy contents
COPY . /usr/src/app
RUN git clone https://github.com/ultralytics/yolov5 /usr/src/yolov5

# Set environment variables
ENV OMP_NUM_THREADS=8
3. In general, which base image should I use to build a Docker image that is compatible with CUDA, a GPU, and TensorRT?

My questions are basic, but I am new to working with Docker, so I would appreciate a generous response.

Thanks,

Additional

I am using a machine and libraries with the following specifications:

Ubuntu 18.04
Graphics card: NVIDIA GeForce GTX 1660 Ti
NVIDIA drivers: 470.182.03
CUDA: Build cuda_11.4.r11.4/compiler.31964100_0
torch version: 1.7.0+cu101 (in my conda virtual environment)

The specifications on the Docker side are:

Docker version: 23.0.2
torch version: 2.0.0
tensorrt: 8.6.0

The issue can be reproduced with the following commands (given that Docker is already installed):

$ sudo docker pull ultralytics/yolov5:latest
$ sudo docker run -it --ipc=host --gpus all ultralytics/yolov5:latest
root@01f2f1700171:/usr/src/app# python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0
@hlmhlr hlmhlr added the question Further information is requested label Apr 28, 2023
@github-actions
Contributor

👋 Hello @hlmhlr, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

[YOLOv5 CI badge]

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@liquored

I ran into the same question as you on CentOS just minutes ago. If you find a way to deal with it, please share the solution with me. Thanks so much! Looking forward to your reply.

@glenn-jocher
Member

@hlmhlr hello,

Thank you for reaching out. I would be glad to help you with your issue.

Regarding your question about the TensorRT initialization failure and GPU compatibility, please verify that you have installed versions of CUDA and cuDNN that are compatible with the TensorRT version on your system. You can check the compatibility matrix between these components in the TensorRT documentation. Please make sure these are properly installed and configured as prerequisites to running TensorRT with GPU support.

Furthermore, regarding your question about the yolov5 image size, it is normal to see size differences between the yolov5 image pulled from Docker Hub and one built locally from the Dockerfile. The Docker Hub image is pre-built, which allows for faster downloads and significantly reduces local setup time. When building locally, the build environment and installed requirements inherently affect the size of the final image.

I hope you find this helpful, and please let me know if you have any further questions or require additional assistance.

Best,

@hlmhlr
Author

hlmhlr commented May 1, 2023

Hi,

Many thanks @glenn-jocher for your prompt and generous response.

I tried to use the compatibility matrix from here, and I found the combination: CUDA 11.4 + cuDNN 8.8.0 + TensorRT 8.6.0.

On my system I already had CUDA 11.4, and I installed cuDNN 8.8.0 as shown below, which I verified outside the Docker container.

[screenshots: cuda_11_4, cudnn_8_8_0]

I don't know how to verify CUDA and cuDNN inside the Docker container. Also, does the yolov5 Docker image already contain CUDA, or does it use the CUDA installation that already exists on the system?

Later, I installed TensorRT inside the container using the following command, which installed TensorRT 8.6.0:

root@6aef4bb085d7:/usr/src/app# pip install nvidia-tensorrt
[screenshot: tensorrt_docker]

With these libraries in place, I ran the code again, and it threw the error shown below:

[screenshot: tensorrt_error]

How do I make sure I have the correct packages, the right combination of versions, and a proper installation? Which packages do I have to install inside the container and which on the host to make everything run from Docker? Furthermore, I currently have to re-install TensorRT inside the container after every run. What is the solution for permanently installing TensorRT inside the Docker image?

Also, I found cuDNN 8.2.4 paired with CUDA 11.4 as mentioned here, so I installed that cuDNN version, and it installed correctly as shown below:

[screenshot: cudnn_8_2_4]

I also tried an earlier version of TensorRT (8.2.2.8), this time installed outside Docker but inside an Anaconda environment. Unfortunately, that did not work either.

I tried the same in one of my conda virtual environments, but that did not work either. The whole setup feels completely messy.

Any help would be really appreciated.

@glenn-jocher
Member

@hlmhlr hello,

Thank you for reaching out. It seems like you have encountered issues with the compatibility between CUDA, cuDNN and TensorRT inside and outside the Docker container.

To verify the installed CUDA and cuDNN versions inside the Docker container, you can check via the terminal by running the following commands:

$ nvcc -V
$ cat /usr/local/cuda/version.txt
$ cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

The first command displays the version of the CUDA compiler (nvcc), the second reads the CUDA version file, and the third displays the installed cuDNN version.
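Equivalently, you can read these versions from Python inside the container (a minimal sketch; it assumes torch and tensorrt are importable there):

import torch

print(torch.__version__)               # PyTorch version
print(torch.version.cuda)              # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())  # cuDNN version PyTorch links to

import tensorrt
print(tensorrt.__version__)            # installed TensorRT version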

Regarding your question about TensorRT installation, it looks like you have installed TensorRT successfully inside the Docker container using pip install nvidia-tensorrt. To permanently install TensorRT inside the Docker image, you should consider modifying the Dockerfile by adding nvidia-tensorrt to the list of requirements and rebuilding the Docker image.

To properly configure the compatibility between these components, I would recommend checking the version compatibility matrix for CUDA, cuDNN and TensorRT provided by NVIDIA, and then installing the compatible versions of each package based on your requirements. If you are running into issues after following the compatibility matrix, be sure to check your TensorFlow/PyTorch version compatibility to make sure that the installed software versions are compatible with each other.

I hope this helps. Let me know if you have any additional questions or concerns.

@hlmhlr
Author

hlmhlr commented May 1, 2023

Hi @glenn-jocher, many thanks for your prompt and generous response. I will get back to you soon after troubleshooting.

@glenn-jocher
Member

Hi @hlmhlr,

You're welcome! I'm glad that my response was helpful. Please take your time to troubleshoot the issue, and don't hesitate to reach out if you have any further questions or concerns. I'm always here to help.

Best regards,

@hlmhlr
Author

hlmhlr commented May 3, 2023

Hi @glenn-jocher,

I tried to locate CUDA and cuDNN inside the Docker image, but there were none, as shown in this figure.

[screenshot: nvcc_not_found]

For the sake of clarity, I removed the yolov5 Docker image and re-pulled it from Docker Hub, but that was unsuccessful. Then I tried bind-mounting the CUDA and cuDNN installations from my system while running the image, using the following command, and this time it worked:

sudo docker run -it --ipc=host --gpus all --mount type=bind,source=/usr/local/cuda-11.4,target=/usr/local/cuda --mount type=bind,source=/usr/local/cuda-11.4/include/,target=/usr/local/cudnn ultralytics/yolov5:latest

So the combination that worked is: CUDA 11.4 + cuDNN 8.2.4 + nvidia-tensorrt 8.4.3.1.

For nvidia-tensorrt, I simply ran this command: root@dfd9e25567e5:/usr/src/app# pip install "nvidia-tensorrt" -U --index-url https://pypi.ngc.nvidia.com.

The output of the sample run is:

root@dfd9e25567e5:/usr/src/app# python export.py --weights yolov5s.pt --include engine --device 0
export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=0, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=17, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['engine']
YOLOv5 🚀 v7.0-160-g867f7f0 Python-3.10.9 torch-2.0.0 CUDA:0 (NVIDIA GeForce GTX 1660 Ti, 5945MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients

PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)

ONNX: starting export with onnx 1.13.1...
================ Diagnostic Run torch.onnx.export version 2.0.0 ================
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

ONNX: export success ✅ 0.7s, saved as yolov5s.onnx (28.0 MB)

TensorRT: starting export with TensorRT 8.4.3.1...
[05/02/2023-15:34:19] [TRT] [I] [MemUsageChange] Init CUDA: CPU +299, GPU +0, now: CPU 2421, GPU 2353 (MiB)
[05/02/2023-15:34:20] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +207, GPU +70, now: CPU 2645, GPU 2423 (MiB)
/usr/src/app/export.py:299: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = workspace * 1 << 30
[05/02/2023-15:34:20] [TRT] [I] ----------------------------------------------------------------
[05/02/2023-15:34:20] [TRT] [I] Input filename:   yolov5s.onnx
[05/02/2023-15:34:20] [TRT] [I] ONNX IR version:  0.0.7
[05/02/2023-15:34:20] [TRT] [I] Opset version:    12
[05/02/2023-15:34:20] [TRT] [I] Producer name:    pytorch
[05/02/2023-15:34:20] [TRT] [I] Producer version: 2.0.0
[05/02/2023-15:34:20] [TRT] [I] Domain:           
[05/02/2023-15:34:20] [TRT] [I] Model version:    0
[05/02/2023-15:34:20] [TRT] [I] Doc string:       
[05/02/2023-15:34:20] [TRT] [I] ----------------------------------------------------------------
[05/02/2023-15:34:20] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
TensorRT: input "images" with shape(1, 3, 640, 640) DataType.FLOAT
TensorRT: output "output0" with shape(1, 25200, 85) DataType.FLOAT
TensorRT: building FP32 engine as yolov5s.engine
/usr/src/app/export.py:326: DeprecationWarning: Use build_serialized_network instead.
  with builder.build_engine(network, config) as engine, open(f, 'wb') as t:
[05/02/2023-15:34:20] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2678, GPU 2431 (MiB)
[05/02/2023-15:34:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2678, GPU 2441 (MiB)
[05/02/2023-15:34:20] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/02/2023-15:35:19] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[05/02/2023-15:36:17] [TRT] [I] Detected 1 inputs and 4 output network tensors.
[05/02/2023-15:36:17] [TRT] [I] Total Host Persistent Memory: 149344
[05/02/2023-15:36:17] [TRT] [I] Total Device Persistent Memory: 1722880
[05/02/2023-15:36:17] [TRT] [I] Total Scratch Memory: 0
[05/02/2023-15:36:17] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 0 MiB
[05/02/2023-15:36:17] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 18.3703ms to assign 7 blocks to 130 nodes requiring 35635200 bytes.
[05/02/2023-15:36:17] [TRT] [I] Total Activation Memory: 35635200
[05/02/2023-15:36:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2695, GPU 2481 (MiB)
[05/02/2023-15:36:17] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[05/02/2023-15:36:17] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[05/02/2023-15:36:17] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
TensorRT: export success ✅ 119.9s, saved as yolov5s.engine (29.4 MB)

Export complete (123.1s)
Results saved to /usr/src/app
Detect:          python detect.py --weights yolov5s.engine 
Validate:        python val.py --weights yolov5s.engine 
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')  
Visualize:       https://netron.app 
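Following the PyTorch Hub hint printed above, inference with the exported engine can be run like this (a minimal sketch; 'zidane.jpg' is just an example image path):

import torch

# Load the exported TensorRT engine through the yolov5 custom-model entrypoint
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')
results = model('zidane.jpg')  # run inference on a single image
results.print()                # print a detection summary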

Concerning the permanent installation of TensorRT inside Docker, I rebuilt the image using the command sudo docker build -t yolov5-tensorrt ., with a Dockerfile that simply contains the following instructions:

FROM ultralytics/yolov5:latest
RUN pip install "nvidia-tensorrt"  -U --index-url https://pypi.ngc.nvidia.com
# Set environment variables
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}
ENV PATH=/usr/local/cuda/bin:${PATH}

Everything is working now.

@liquored, if you are still facing the issue, you can try this.

@glenn-jocher, many thanks again. Could you please confirm from the sample program output above that everything is working as expected per your references? You may then close this issue with your comments.

@glenn-jocher
Copy link
Member

Hi @hlmhlr,

I'm glad to hear that you were able to resolve the issue by mounting the CUDA and cuDNN installation from your system inside the Docker container. It's also good to know that you found the compatible versions of CUDA, cuDNN, and TensorRT that work for you.

Regarding the sample program output, it looks like everything is working fine based on your references. The program successfully exported the YOLOv5s model to ONNX format and then converted it to TensorRT to accelerate inference on an NVIDIA GPU.

Thank you for your detailed explanation of the steps you took to resolve the issue and for sharing your Dockerfile. This will be helpful for others who may encounter the same issue in the future.

Please don't hesitate to reach out if you have any further questions or concerns. I'll be happy to assist you.

Best regards,

@hlmhlr
Author

hlmhlr commented May 4, 2023

Hi @glenn-jocher,

Thanks for your feedback. Also many thanks once again for your prompt and informative responses.

Kind regards,

@hlmhlr hlmhlr closed this as completed May 4, 2023
@glenn-jocher
Member

Hi @hlmhlr,

You're welcome! I'm glad that my responses were helpful and informative.

If you have any further questions or concerns, don't hesitate to reach out. I'm always here to help.

Best,
