
load_model fail cause gpu memory leak #268

Open
zjd1988 opened this issue Aug 27, 2024 · 0 comments
zjd1988 commented Aug 27, 2024

Description
When a model fails to load in the ONNX Runtime backend, GPU memory is leaked.

Triton Information
r23.12 and r24.07

Are you using the Triton container or did you build it yourself?
I am using the container nvcr.io/nvidia/tritonserver:r23.12-py3.

To Reproduce
Use the densenet_onnx model and change the output shape in config.pbtxt (from 1000 to 1001),
then start tritonserver in explicit model-control mode:

tritonserver --model-control-mode=explicit --model-repository=/models

then call the Python gRPC client's load_model API; the output log is as follows:

+----------------------------------+----------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                              |
+----------------------------------+----------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                             |
| server_version                   | 2.41.0                                                                                             |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model |
|                                  | _configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics tr |
|                                  | ace logging                                                                                        |
| model_repository_path[0]         | /workspace/triton_bug_models/load_bug_models/                                                      |
| model_control_mode               | MODE_EXPLICIT                                                                                      |
| strict_model_config              | 0                                                                                                  |
| rate_limit                       | OFF                                                                                                |
| pinned_memory_pool_byte_size     | 268435456                                                                                          |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                           |
| min_supported_compute_capability | 6.0                                                                                                |
| strict_readiness                 | 1                                                                                                  |
| exit_timeout                     | 30                                                                                                 |
| cache_enabled                    | 0                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------+

I0827 01:49:08.461295 459 grpc_server.cc:2495] Started GRPCInferenceService at 0.0.0.0:8001
I0827 01:49:08.461547 459 http_server.cc:4619] Started HTTPService at 0.0.0.0:8000
I0827 01:49:08.502527 459 http_server.cc:282] Started Metrics Service at 0.0.0.0:8002
I0827 01:50:19.324972 459 model_lifecycle.cc:461] loading: densenet_onnx:1
I0827 01:50:19.327742 459 onnxruntime.cc:2608] TRITONBACKEND_Initialize: onnxruntime
I0827 01:50:19.327772 459 onnxruntime.cc:2618] Triton TRITONBACKEND API version: 1.17
I0827 01:50:19.327781 459 onnxruntime.cc:2624] 'onnxruntime' TRITONBACKEND API version: 1.17
I0827 01:50:19.327786 459 onnxruntime.cc:2654] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0827 01:50:19.347738 459 onnxruntime.cc:2719] TRITONBACKEND_ModelInitialize: densenet_onnx (version 1)
I0827 01:50:19.348521 459 onnxruntime.cc:692] skipping model configuration auto-complete for 'densenet_onnx': inputs and outputs already specified
I0827 01:50:19.360188 459 onnxruntime.cc:2784] TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_0_0 (GPU device 0)
I0827 01:50:19.658303 459 onnxruntime.cc:2836] TRITONBACKEND_ModelInstanceFinalize: delete instance state
E0827 01:50:19.658470 459 backend_model.cc:635] ERROR: Failed to create instance: model 'densenet_onnx', tensor 'fc6_1': the model expects 4 dimensions (shape [1,1000,1,1]) but the model configuration specifies 1 dimensions (shape [1001])
I0827 01:50:19.658504 459 onnxruntime.cc:2760] TRITONBACKEND_ModelFinalize: delete model state
E0827 01:50:19.658544 459 model_lifecycle.cc:621] failed to load 'densenet_onnx' version 1: Invalid argument: model 'densenet_onnx', tensor 'fc6_1': the model expects 4 dimensions (shape [1,1000,1,1]) but the model configuration specifies 1 dimensions (shape [1001])
I0827 01:50:19.658573 459 model_lifecycle.cc:756] failed to load 'densenet_onnx'
I0827 01:50:29.020538 459 model_lifecycle.cc:461] loading: densenet_onnx:1
I0827 01:50:29.023708 459 onnxruntime.cc:2719] TRITONBACKEND_ModelInitialize: densenet_onnx (version 1)
I0827 01:50:29.024254 459 onnxruntime.cc:692] skipping model configuration auto-complete for 'densenet_onnx': inputs and outputs already specified
I0827 01:50:29.099367 459 onnxruntime.cc:2784] TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_0_0 (GPU device 0)
I0827 01:50:29.297239 459 onnxruntime.cc:2836] TRITONBACKEND_ModelInstanceFinalize: delete instance state
E0827 01:50:29.297383 459 backend_model.cc:635] ERROR: Failed to create instance: model 'densenet_onnx', tensor 'fc6_1': the model expects 4 dimensions (shape [1,1000,1,1]) but the model configuration specifies 1 dimensions (shape [1001])
I0827 01:50:29.297415 459 onnxruntime.cc:2760] TRITONBACKEND_ModelFinalize: delete model state
E0827 01:50:29.297465 459 model_lifecycle.cc:621] failed to load 'densenet_onnx' version 1: Invalid argument: model 'densenet_onnx', tensor 'fc6_1': the model expects 4 dimensions (shape [1,1000,1,1]) but the model configuration specifies 1 dimensions (shape [1001])
I0827 01:50:29.297480 459 model_lifecycle.cc:756] failed to load 'densenet_onnx'

config file:

name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1001 ]
  }
]

instance_group [ 
  { 
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
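
For comparison, the error log reports the model's actual output shape as [1,1000,1,1]. An output section consistent with that shape (my reconstruction, based on the log and the standard densenet_onnx example; not verified against this exact model) would be:

```
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
  }
]
```

The bug report intentionally uses the mismatched dims: [ 1001 ] above so that every load attempt fails.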

Before calling load_model:
[screenshot: nvidia-smi GPU memory usage]

After calling load_model 5 times:
[screenshot: nvidia-smi GPU memory usage, increased]

After calling load_model 10 times:
[screenshot: nvidia-smi GPU memory usage, increased further]

Expected behavior
GPU memory should not increase when load_model fails.
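
The repeated load_model calls described above can be sketched with the Python gRPC client. This is a minimal repro sketch, assuming `pip install tritonclient[grpc]` and a server reachable at localhost:8001 with the mis-configured densenet_onnx model; watch nvidia-smi between runs to observe the leak.

```python
# Repro sketch: repeatedly load a mis-configured model and count failures.
# Each failed load was observed to leave GPU memory allocated.

def repeated_load(client, model_name, attempts):
    """Call client.load_model(model_name) `attempts` times.

    Returns the number of calls that raised. A broad except is used so the
    helper works with any client whose load_model raises on failure.
    """
    failures = 0
    for _ in range(attempts):
        try:
            client.load_model(model_name)
        except Exception:
            failures += 1
    return failures

if __name__ == "__main__":
    # Assumed setup: tritonclient installed, server started as shown above.
    import tritonclient.grpc as grpcclient
    client = grpcclient.InferenceServerClient("localhost:8001")
    print(repeated_load(client, "densenet_onnx", 10))
```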
