
ONNX Inference Speed extremely slow compared to .pt Model #4808

Closed
shrijan00 opened this issue Sep 15, 2021 · 30 comments · Fixed by #5087
Labels
question Further information is requested

Comments

@shrijan00

Hi,
I tried to run inference on an image of resolution 1024*1536 using the ONNX model and the .pt model.
As you can see in the screenshot below, there is a huge time difference between the two cases.

(screenshot: ONNX vs .pt inference timing)

Any reason for this?

@shrijan00 shrijan00 added the question Further information is requested label Sep 15, 2021
@glenn-jocher
Member

@shrijan00 ONNX models run on CPU

@shrijan00
Author

ONNX models run on CPU
I tried adding this in detect.py:
session = onnxruntime.InferenceSession(w, None, providers='CUDAExecutionProvider')
but it doesn't seem to work. Is there another way to run ONNX on GPU?
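
For reference, onnxruntime expects providers as a list of strings rather than a single string, so a minimal sketch (assuming the onnxruntime-gpu package is installed and w is the model path) would be:

import onnxruntime
session = onnxruntime.InferenceSession(w, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print(session.get_providers())  # lists the providers actually registered for this session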

@glenn-jocher
Member

@shrijan00 I don't know, but if you find a good solution make sure to submit a PR to help others run ONNX on GPU!

@zhaojun060708

@glenn-jocher Why does the exported ONNX model not support GPU?

@happyday-lkj

Have you solved the problem?

@callbarian

In detect.py, change this line:

check_requirements(('onnx', 'onnxruntime'))

to

check_requirements(('onnx', 'onnxruntime-gpu'))

so that the code does not install onnxruntime, which is the CPU-only package.

Make sure you have installed CUDA and cuDNN to use onnxruntime-gpu.

@glenn-jocher
Member

@callbarian thanks for the pointer! I didn't know about the -gpu package. We should make the requirements check conditional on the hardware then, with GPU-enabled systems installing -gpu automatically. I'll submit a PR for this fix.

TODO: Install onnxruntime-gpu automatically if the user has a CUDA-enabled system.
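
A minimal sketch of such a conditional check (illustrative only, not necessarily the exact code that will land in the PR):

import torch
# pick the ONNX Runtime package matching the available hardware
check_requirements(('onnx', 'onnxruntime-gpu' if torch.cuda.is_available() else 'onnxruntime'))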

@glenn-jocher
Member

glenn-jocher commented Oct 7, 2021

@callbarian I opened a PR #5087 for this, but testing this PR does not show improved ONNX inference speeds even after installing onnxruntime-gpu. Are there additional steps required for ONNX to use your GPU? There's a cryptic warning message about CUDA/CPU ExecutionProvider. This is in Colab.

!python export.py --weights yolov5s.pt --include onnx --dynamic --simplify

!python detect.py --weights yolov5s.onnx

detect: weights=['yolov5s.onnx'], source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False
YOLOv5 🚀 v5.0-498-g16f413b torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

/usr/local/lib/python3.7/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:353: UserWarning: Deprecation warning. This ORT build has ['CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. The next release (ORT 1.10) will require explicitly setting the providers parameter (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.For example, onnxruntime.InferenceSession(..., providers=["CUDAExecutionProvider"], ...)
  "based on the build flags) when instantiating InferenceSession."

image 1/2 /content/yolov5/data/images/bus.jpg: 640x640 4 class0s, 1 class5, Done. (0.846s)
image 2/2 /content/yolov5/data/images/zidane.jpg: 640x640 2 class0s, 2 class27s, Done. (0.262s)
Speed: 2.0ms pre-process, 554.0ms inference, 1.8ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp3
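
As the warning suggests, passing providers explicitly also makes it easy to verify whether the GPU provider is present in the installed build; a quick sketch (assuming onnxruntime-gpu is installed):

import onnxruntime as ort
print(ort.get_available_providers())  # a -gpu build should list 'CUDAExecutionProvider'
session = ort.InferenceSession('yolov5s.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print(session.get_providers())  # providers actually registered for this session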

@glenn-jocher
Member

@callbarian can you comment on PR #5087, which adds onnxruntime-gpu installation to detect.py but does not result in faster inference? Thanks!

@MrRace

MrRace commented Apr 12, 2022


@glenn-jocher From your test results above it still looks extremely slow at 554.0ms inference, so the problem has not been fixed.

@MrRace

MrRace commented Apr 12, 2022

@callbarian have you solved the problem?

@glenn-jocher
Member

@MrRace ONNX export inference is working correctly, with speeds comparable to PyTorch; see #6963:

Run YOLOv5 benchmarks (speed and accuracy) for all supported export formats. This PR adds GPU benchmarking capability following CPU benchmarking PR #6613.

Format                   export.py --include   Model
PyTorch                  -                     yolov5s.pt
TorchScript              torchscript           yolov5s.torchscript
ONNX                     onnx                  yolov5s.onnx
OpenVINO                 openvino              yolov5s_openvino_model/
TensorRT                 engine                yolov5s.engine
CoreML                   coreml                yolov5s.mlmodel
TensorFlow SavedModel    saved_model           yolov5s_saved_model/
TensorFlow GraphDef      pb                    yolov5s.pb
TensorFlow Lite          tflite                yolov5s.tflite
TensorFlow Edge TPU      edgetpu               yolov5s_edgetpu.tflite
TensorFlow.js            tfjs                  yolov5s_web_model/

Usage:

git clone https://github.com/ultralytics/yolov5 -b update/bench_gpu  # clone
cd yolov5
pip install -qr requirements.txt coremltools onnx onnxruntime-gpu openvino-dev  # install
pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # TensorRT

python utils/benchmarks.py --weights yolov5s.pt --img 640 --device 0

Colab++ V100 High-RAM Results

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=0, half=False
Checking setup...
YOLOv5 🚀 v6.1-48-g0c1025f torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 46.1/166.8 GB disk)

Benchmarks complete (433.63s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch      0.462296             9.159939
1             TorchScript      0.462296             6.607546
2                    ONNX      0.462296            12.698026
3                OpenVINO           NaN                  NaN
4                TensorRT      0.462280             1.725197
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel      0.462296            20.273019
7     TensorFlow GraphDef      0.462296            20.212173
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

Note: TensorRT exports are fixed at FP16.

@MrRace

MrRace commented Apr 12, 2022

@glenn-jocher Thanks for your reply, I will try following your guide.

@MrRace

MrRace commented Apr 12, 2022


@glenn-jocher
When I run git clone https://github.com/ultralytics/yolov5 -b update/bench_gpu, I get:

Cloning into 'yolov5'...
fatal: Remote branch update/bench_gpu not found in upstream origin
Unexpected end of command stream

@glenn-jocher
Member

@MrRace the aforementioned PR is already merged; all of this is in master. If you already have YOLOv5 you don't need to do anything except install the dependencies and run the benchmarks:

pip install -qr requirements.txt coremltools onnx onnxruntime-gpu openvino-dev  # install
pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # TensorRT

python utils/benchmarks.py --weights yolov5s.pt --img 640 --device 0

@MrRace

MrRace commented Apr 12, 2022

I used master; the result:

Checking setup...
YOLOv5 🚀 v6.1-124-g8c420c4 torch 1.9.1+cu102 CUDA:0 (Tesla T4, 15110MiB)
Setup complete ✅ (40 CPUs, 156.6 GB RAM, 881.3/984.2 GB disk)

Benchmarks complete (445.65s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623                 7.54
1             TorchScript        0.4623                 7.47
2                    ONNX        0.4623                14.99
3                OpenVINO           NaN                  NaN
4                TensorRT        0.4620                 2.96
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel           NaN                  NaN
7     TensorFlow GraphDef           NaN                  NaN
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

@glenn-jocher ONNX is still obviously slower than PyTorch. Could you share the Docker environment?

@glenn-jocher
Member

@MrRace your results look fine. What makes you think that ONNX should be faster than PyTorch?

@glenn-jocher
Member

@MrRace my previous results are labelled 'Colab V100 High-RAM Results', but in any case the Docker image is also readily available in the README in the Environments section: https://github.com/ultralytics/yolov5#environments

@MrRace

MrRace commented Apr 12, 2022

@MrRace your results look fine. What makes you think that ONNX should be faster than PyTorch?

@glenn-jocher Thanks for your prompt reply.

  1. ONNX Runtime provides some graph optimizations (see the sketch below).
  2. In my previous experience, converting a PyTorch model to ONNX normally speeds up inference.
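
For context, ONNX Runtime's graph optimizations are configured through SessionOptions; a minimal sketch (assuming onnxruntime is installed):

import onnxruntime as ort
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL  # constant folding, node fusion, etc.
session = ort.InferenceSession('yolov5s.onnx', sess_options=opts,
                               providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])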

@glenn-jocher
Member

glenn-jocher commented Apr 12, 2022

I've never seen ONNX speedup on GPU for YOLOv5. If you manage any speed improvements though feel free to submit a PR.

Please see our ✅ Contributing Guide to get started.

@glenn-jocher
Member

@MrRace latest GPU results on our server below. I also exported ONNX at --half but saw no speedup compared to FP32.

benchmarks: weights=/usr/src/app/yolov5s.pt, imgsz=640, batch_size=1, data=/usr/src/app/data/coco128.yaml, device=, half=False, test=False
Checking setup...
YOLOv5 🚀 v6.1-129-g74aaab3 torch 1.11.0+cu113 CUDA:0 (A100-SXM-80GB, 81251MiB)
Setup complete ✅ (96 CPUs, 1007.7 GB RAM, 1925.3/3519.3 GB disk)

Benchmarks complete (536.24s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4624                 6.45
1             TorchScript        0.4624                 4.57
2                    ONNX        0.4623                 6.90
3                OpenVINO           NaN                  NaN
4                TensorRT        0.4618                 1.17
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623                17.72
7     TensorFlow GraphDef        0.4623                18.26
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

@MrRace

MrRace commented Apr 13, 2022

@glenn-jocher I also tried the half-precision version with --half: python utils/benchmarks.py --weights models/yolov5s.pt --img 640 --device 0 --half. The result:

benchmarks: weights=models/yolov5s.pt, imgsz=640, batch_size=1, data=/usr/src/app/data/coco128.yaml, device=0, half=True, test=False
Checking setup...
YOLOv5 🚀 v6.1-124-g8c420c4 torch 1.9.1+cu102 CUDA:0 (Tesla T4, 15110MiB)
Setup complete ✅ (40 CPUs, 156.6 GB RAM, 788.4/984.2 GB disk)

Benchmarks complete (427.96s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4622                 6.32
1             TorchScript        0.4622                 5.57
2                    ONNX        0.4596                11.38
3                OpenVINO           NaN                  NaN
4                TensorRT        0.4599                 2.71
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel           NaN                  NaN
7     TensorFlow GraphDef           NaN                  NaN
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

The FP16 TensorRT result is just the same as the FP32 TensorRT result above. I checked the code and found this in export.py:

if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)

Even if I do not set --half, builder.platform_has_fast_fp16 is always True, which means the engine is always FP16. In other words, for TensorRT, python utils/benchmarks.py --weights yolov5s.pt --img 640 --device 0 and python utils/benchmarks.py --weights yolov5s.pt --img 640 --device 0 --half both produce a half-precision engine. Do your FP32 and FP16 TensorRT versions have the same issue?
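
A minimal way to make FP16 opt-in would be to gate the flag on half as well (a sketch, not the repository's current behavior):

if half and builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)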

@glenn-jocher
Member

@MrRace yes that's correct! TRT is pinned to FP16 as we saw no observable benefit to FP32 TRT exports.

@MrRace

MrRace commented Apr 18, 2022

@glenn-jocher I use the same model and test data for detect.py and val.py, and their inference times are significantly different. For example,

detect.py inference message:

Speed: 0.4ms pre-process, 12.4ms inference, 0.9ms NMS per image at shape (1, 3, 640, 640)

val.py inference message:

Speed: 0.2ms pre-process, 7.6ms inference, 1.0ms NMS per image at shape (1, 3, 640, 640)

Have you ever encountered this problem, @glenn-jocher?

@glenn-jocher
Member

glenn-jocher commented Apr 19, 2022

@MrRace 👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀. We've created a few short guidelines below to help users provide what we need in order to start investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

  • Minimal – Use as little code as possible to produce the problem
  • Complete – Provide all parts someone else needs to reproduce the problem
  • Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

  • Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
  • Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

@armap94

armap94 commented Sep 28, 2022

Running the model on Colab with a P100 GPU, I have the following results:

benchmarks: weights=yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=0, half=False, test=False, pt_only=False, hard_fail=False
Checking setup...
YOLOv5 🚀 v6.2-165-g966b0e0 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)
Setup complete ✅ (4 CPUs, 25.5 GB RAM, 44.2/166.8 GB disk)

Benchmarks complete (258.79s)
                   Format  Size (MB)  mAP50-95  Inference time (ms)
0                 PyTorch       14.1    0.4716                 6.36
1             TorchScript       28.1    0.4716                 6.12
2                    ONNX       28.0    0.4716                15.00
3                OpenVINO        NaN       NaN                  NaN
4                TensorRT       33.2    0.4716                 4.58
5                  CoreML        NaN       NaN                  NaN
6   TensorFlow SavedModel       27.8    0.4716                20.79
7     TensorFlow GraphDef       27.8    0.4716                20.80
8         TensorFlow Lite        NaN       NaN                  NaN
9     TensorFlow Edge TPU        NaN       NaN                  NaN
10          TensorFlow.js        NaN       NaN                  NaN
11           PaddlePaddle       57.0    0.4716               409.37

This shows that in the benchmarks ONNX, while slower, is still comparable. However, when running the detection script, inference with OpenCV DNN is dramatically slower.

!python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images  
Speed: 0.5ms pre-process, 15.3ms inference, 1.8ms NMS per image at shape (1, 3, 640, 640)
!python detect.py --weights yolov5s.onnx --img 640 --conf 0.25 --source data/images  
Speed: 2.5ms pre-process, 15.6ms inference, 2.8ms NMS per image at shape (1, 3, 640, 640)
!python detect.py --weights yolov5s.onnx --img 640 --conf 0.25 --source data/images  --dnn
Speed: 2.9ms pre-process, 749.4ms inference, 2.6ms NMS per image at shape (1, 3, 640, 640)

@glenn-jocher Is this OpenCV DNN inference speed expected? Is there any way to improve it?

@glenn-jocher
Member

@armap94 --dnn inference is likely using CPU. I'm not very familiar with DNN, but if you'd like to submit a PR for DNN inference that would be useful. The relevant code area is here:

Loading:

yolov5/models/common.py

Lines 355 to 358 in 2373d54

elif dnn:  # ONNX OpenCV DNN
    LOGGER.info(f'Loading {w} for ONNX OpenCV DNN inference...')
    check_requirements('opencv-python>=4.5.4')
    net = cv2.dnn.readNetFromONNX(w)

Inference:

yolov5/models/common.py

Lines 507 to 510 in 2373d54

elif self.dnn:  # ONNX OpenCV DNN
    im = im.cpu().numpy()  # torch to numpy
    self.net.setInput(im)
    y = self.net.forward()
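
If OpenCV itself is built with CUDA support (the stock opencv-python wheels are not), the DNN module can in principle be pointed at the GPU; a hedged sketch:

import cv2
net = cv2.dnn.readNetFromONNX('yolov5s.onnx')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)  # requires a CUDA-enabled OpenCV build
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)    # otherwise OpenCV falls back to CPU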

@armap94

armap94 commented Sep 29, 2022


When the .pt file is converted to .onnx using export.py, if the flag --device 0 is used, doesn't that force ONNX to use the GPU during inference? Or are extra steps required to ensure that ONNX uses the GPU during inference?

@glenn-jocher
Member

@armap94 inference device is independent of export device.
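
For example (a sketch), the same exported yolov5s.onnx can be run on either device by choosing it at inference time:

python detect.py --weights yolov5s.onnx --device cpu  # CPUExecutionProvider
python detect.py --weights yolov5s.onnx --device 0    # CUDAExecutionProvider, if onnxruntime-gpu is installed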
