
Ultralytics YOLOV5 model to TRT conversion issue. #1

Open
adrianosantospb opened this issue Nov 22, 2021 · 4 comments
@adrianosantospb commented Nov 22, 2021

Hi,

I'm trying to use your implementation to convert an Ultralytics YOLOV5 model to TRT. I'm following the steps you wrote and I'm getting this error:

docker run --runtime nvidia -v ~/Documents/testmodelo/:/models/ --rm alxmamaev/jetson_yolov5_trt:latest trtexec --onnx=/models/01042021yolov5A.onnx --saveEngine=/models/01042021yolov5A.plan --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # trtexec --onnx=/models/01042021yolov5A.onnx --saveEngine=/models/01042021yolov5A.plan --fp16
[11/22/2021-19:20:35] [I] === Model Options ===
[11/22/2021-19:20:35] [I] Format: ONNX
[11/22/2021-19:20:35] [I] Model: /models/01042021yolov5A.onnx
[11/22/2021-19:20:35] [I] Output:
[11/22/2021-19:20:35] [I] === Build Options ===
[11/22/2021-19:20:35] [I] Max batch: explicit
[11/22/2021-19:20:35] [I] Workspace: 16 MiB
[11/22/2021-19:20:35] [I] minTiming: 1
[11/22/2021-19:20:35] [I] avgTiming: 8
[11/22/2021-19:20:35] [I] Precision: FP32+FP16
[11/22/2021-19:20:35] [I] Calibration:
[11/22/2021-19:20:35] [I] Refit: Disabled
[11/22/2021-19:20:35] [I] Sparsity: Disabled
[11/22/2021-19:20:35] [I] Safe mode: Disabled
[11/22/2021-19:20:35] [I] Restricted mode: Disabled
[11/22/2021-19:20:35] [I] Save engine: /models/01042021yolov5A.plan
[11/22/2021-19:20:35] [I] Load engine:
[11/22/2021-19:20:35] [I] NVTX verbosity: 0
[11/22/2021-19:20:35] [I] Tactic sources: Using default tactic sources
[11/22/2021-19:20:35] [I] timingCacheMode: local
[11/22/2021-19:20:35] [I] timingCacheFile:
[11/22/2021-19:20:35] [I] Input(s)s format: fp32:CHW
[11/22/2021-19:20:35] [I] Output(s)s format: fp32:CHW
[11/22/2021-19:20:35] [I] Input build shapes: model
[11/22/2021-19:20:35] [I] Input calibration shapes: model
[11/22/2021-19:20:35] [I] === System Options ===
[11/22/2021-19:20:35] [I] Device: 0
[11/22/2021-19:20:35] [I] DLACore:
[11/22/2021-19:20:35] [I] Plugins:
[11/22/2021-19:20:35] [I] === Inference Options ===
[11/22/2021-19:20:35] [I] Batch: Explicit
[11/22/2021-19:20:35] [I] Input inference shapes: model
[11/22/2021-19:20:35] [I] Iterations: 10
[11/22/2021-19:20:35] [I] Duration: 3s (+ 200ms warm up)
[11/22/2021-19:20:35] [I] Sleep time: 0ms
[11/22/2021-19:20:35] [I] Streams: 1
[11/22/2021-19:20:35] [I] ExposeDMA: Disabled
[11/22/2021-19:20:35] [I] Data transfers: Enabled
[11/22/2021-19:20:35] [I] Spin-wait: Disabled
[11/22/2021-19:20:35] [I] Multithreading: Disabled
[11/22/2021-19:20:35] [I] CUDA Graph: Disabled
[11/22/2021-19:20:35] [I] Separate profiling: Disabled
[11/22/2021-19:20:35] [I] Time Deserialize: Disabled
[11/22/2021-19:20:35] [I] Time Refit: Disabled
[11/22/2021-19:20:35] [I] Skip inference: Disabled
[11/22/2021-19:20:35] [I] Inputs:
[11/22/2021-19:20:35] [I] === Reporting Options ===
[11/22/2021-19:20:35] [I] Verbose: Disabled
[11/22/2021-19:20:35] [I] Averages: 10 inferences
[11/22/2021-19:20:35] [I] Percentile: 99
[11/22/2021-19:20:35] [I] Dump refittable layers: Disabled
[11/22/2021-19:20:35] [I] Dump output: Disabled
[11/22/2021-19:20:35] [I] Profile: Disabled
[11/22/2021-19:20:35] [I] Export timing to JSON file:
[11/22/2021-19:20:35] [I] Export output to JSON file:
[11/22/2021-19:20:35] [I] Export profile to JSON file:
[11/22/2021-19:20:35] [I]
[11/22/2021-19:20:35] [I] === Device Information ===
[11/22/2021-19:20:35] [I] Selected Device: Xavier
[11/22/2021-19:20:35] [I] Compute Capability: 7.2
[11/22/2021-19:20:35] [I] SMs: 6
[11/22/2021-19:20:35] [I] Compute Clock Rate: 1.109 GHz
[11/22/2021-19:20:35] [I] Device Global Memory: 7773 MiB
[11/22/2021-19:20:35] [I] Shared Memory per SM: 96 KiB
[11/22/2021-19:20:35] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/22/2021-19:20:35] [I] Memory Clock Rate: 1.109 GHz
[11/22/2021-19:20:35] [I]
[11/22/2021-19:20:35] [I] TensorRT version: 8001
[11/22/2021-19:20:37] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 6306 (MiB)
[11/22/2021-19:20:37] [I] Start parsing network model
[11/22/2021-19:20:38] [I] [TRT] ----------------------------------------------------------------
[11/22/2021-19:20:38] [I] [TRT] Input filename: /models/01042021yolov5A.onnx
[11/22/2021-19:20:38] [I] [TRT] ONNX IR version: 0.0.7
[11/22/2021-19:20:38] [I] [TRT] Opset version: 13
[11/22/2021-19:20:38] [I] [TRT] Producer name: pytorch
[11/22/2021-19:20:38] [I] [TRT] Producer version: 1.10
[11/22/2021-19:20:38] [I] [TRT] Domain:
[11/22/2021-19:20:38] [I] [TRT] Model version: 0
[11/22/2021-19:20:38] [I] [TRT] Doc string:
[11/22/2021-19:20:38] [I] [TRT] ----------------------------------------------------------------
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:39] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:40] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:41] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:41] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[11/22/2021-19:20:41] [W] [TRT] Output type must be INT32 for shape outputs
[11/22/2021-19:20:41] [W] [TRT] Output type must be INT32 for shape outputs
[11/22/2021-19:20:41] [W] [TRT] Output type must be INT32 for shape outputs
[11/22/2021-19:20:41] [W] [TRT] Output type must be INT32 for shape outputs
[11/22/2021-19:20:41] [W] [TRT] Output type must be INT32 for shape outputs
[11/22/2021-19:20:41] [W] [TRT] Output type must be INT32 for shape outputs
[11/22/2021-19:20:41] [I] Finish parsing network model
[11/22/2021-19:20:41] [W] Dynamic dimensions required for input: images, but no shapes were provided. Automatically overriding shape to: 1x3x1x1
[11/22/2021-19:20:41] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 708, GPU 6947 (MiB)
[11/22/2021-19:20:41] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 708 MiB, GPU 6946 MiB
[11/22/2021-19:20:42] [E] Error[2]: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 2: Internal Error (Concat_40: dimensions not compatible for concatenation
Concat_40: dimensions not compatible for concatenation
Concat_40: dimensions not compatible for concatenation
Concat_40: dimensions not compatible for concatenation
)
[11/22/2021-19:20:42] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)

Do you have any information that could help me with this issue?

Thanks.

alxmamaev (Owner) commented Nov 29, 2021

@adrianosantospb Did you use the official exporter from ultralytics?

Poulinakis-Konstantinos commented Nov 29, 2021

Hello @alxmamaev, I'm encountering the same issue. I used the ultralytics exporter to convert the .pt model into ONNX, then tried the command above to convert ONNX->TRT and got the exact same error.
The model I used was the demo yolov5s model offered by ultralytics.
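For reference, the export was along these lines (a reconstruction, assuming the standard `export.py` interface in the yolov5 repo at the time; exact flags vary by version):

```shell
# Hypothetical reconstruction of the export step (flags differ across yolov5 versions):
python export.py --weights yolov5s.pt --include onnx --opset 12
```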

Any help would be greatly appreciated. Thanks

@alxmamaev (Owner)

@Poulinakis-Konstantinos I think it may be happening with a recent version of pytorch/yolov5. Can you check your model with the yolov5 version from Aug 2021?

alxmamaev (Owner) commented Dec 2, 2021

@adrianosantospb @Poulinakis-Konstantinos I am not the creator of the converter; I just use the trtexec utility from this repo: https://github.com/NVIDIA/TensorRT

I think there may be a few reasons for the problem.

  1. Some box post-processing operations (like NMS or something similar) are included in the ONNX file, and trtexec cannot convert them.
  2. In the new version of YOLOv5 the architecture may have changed slightly, and an operation it uses (based on your log, Concat_40) may not be supported by this version of TensorRT.
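For context on reason 2: TensorRT, like most tensor libraries, only allows concatenation when every dimension except the concat axis matches across all inputs. A minimal illustrative sketch of that constraint (not TensorRT's actual code):

```python
def concat_compatible(shapes, axis):
    """Return True if tensors with these shapes can be concatenated along `axis`.

    All dimensions except `axis` must match across every input; a mismatch is
    the kind of condition behind errors like
    "Concat_40: dimensions not compatible for concatenation".
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            return False  # rank mismatch
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                return False  # a non-concat dimension differs
    return True

# Concatenating along axis 1 works when only axis 1 differs:
print(concat_compatible([(1, 128, 20, 20), (1, 256, 20, 20)], axis=1))  # True
# A collapsed spatial dim (e.g. the 1x3x1x1 fallback shape) breaks it:
print(concat_compatible([(1, 128, 1, 1), (1, 128, 20, 20)], axis=1))    # False
```

This is why the "Automatically overriding shape to: 1x3x1x1" warning earlier in the log is suspicious: once the input shape collapses, downstream feature maps no longer line up for concatenation.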

You can first check the model's ONNX graph using https://netron.app. If it looks okay, you can try downgrading yolov5 to an older commit, or try building a newer version of trtexec; to do that, change the TensorRT version here:

RUN cd TensorRT && git checkout release/7.1 && git submodule update --init --recursive

Compiling takes about 1 hour on a Jetson Nano.
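One more thing that may be worth trying: the log warns "Dynamic dimensions required for input: images ... Automatically overriding shape to: 1x3x1x1", and that fallback shape is exactly what Concat_40 then chokes on. trtexec accepts a `--shapes` flag for explicit input shapes; something like this might get past it (untested here; 640x640 is an assumption based on the usual yolov5 training size, and the input name `images` comes from your log):

```shell
docker run --runtime nvidia -v ~/Documents/testmodelo/:/models/ --rm \
    alxmamaev/jetson_yolov5_trt:latest \
    trtexec --onnx=/models/01042021yolov5A.onnx \
            --saveEngine=/models/01042021yolov5A.plan \
            --fp16 \
            --shapes=images:1x3x640x640  # explicit shape instead of the 1x3x1x1 fallback
```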

If you have any trouble with the build, you are welcome to ask.
