
Triton ONNX runtime backend slower than onnxruntime python client on CPU #265

Open · Mitix-EPI opened this issue Aug 19, 2024 · 0 comments
Description
When serving an ONNX model with the Triton Inference Server's ONNX Runtime backend, CPU inference is noticeably slower than running the same model directly through the ONNX Runtime Python API. The discrepancy is observed under identical conditions: same hardware, same model, and same input data.

Triton Information
TRITON_VERSION=2.46.0

To Reproduce

Model used:

wget -O model.onnx https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx

Triton server (ONNX runtime)

config.pbtxt

name: "test_densenet" 
platform: "onnxruntime_onnx"

Python clients

Triton client

import numpy as np
import tritonclient.grpc as grpcclient

# Use the public InferInput class rather than the private _infer_input
# module, and keep the client in its own variable so it does not shadow
# the module name.
client = grpcclient.InferenceServerClient(url='localhost:9178')

inp = grpcclient.InferInput('data_0', [1, 3, 224, 224], 'FP32')
inp.set_data_from_numpy(np.zeros((1, 3, 224, 224), dtype=np.float32))

# In a separate Jupyter cell:
%%timeit
res = client.infer(model_name="test_densenet", inputs=[inp])

results: 473 ms ± 87.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
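
Note that this measurement includes the gRPC round trip and tensor (de)serialization, not just backend compute. One way to separate the two is to read Triton's per-model statistics (a sketch using the same client as above):

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url='localhost:9178')

# The statistics report queue time and compute time per model, which
# distinguishes backend execution cost from network/serialization overhead.
stats = client.get_inference_statistics(model_name="test_densenet")
print(stats)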

ONNX Runtime

import numpy as np
import onnxruntime as ort

ort_sess = ort.InferenceSession('model.onnx')
test_inputs = {"data_0": np.zeros((1, 3, 224, 224), dtype=np.float32)}

# In a separate Jupyter cell:
%%timeit
ort_sess.run(["fc6_1"], test_inputs)

results: 159 ms ± 23.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
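
For an apples-to-apples comparison it may also help to pin the standalone session's threading explicitly, since InferenceSession('model.onnx') otherwise uses ONNX Runtime's defaults. A sketch (the thread counts are placeholders):

import onnxruntime as ort

# Placeholder values: set them to match whatever the Triton backend is
# configured with so the two timings are comparable.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4
opts.inter_op_num_threads = 1

ort_sess = ort.InferenceSession('model.onnx', sess_options=opts)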
