[Build] Cuda Failure 716:misaligned address when building onnxruntime with Cuda #15981

ninjatall12 · 2023-05-17T13:21:15Z

Describe the issue

I am trying to build ONNX Runtime with CUDA 11.8. The cuDNN binaries are placed inside the CUDA 11.8 folder, so locating cuDNN is not the issue. I have tried changing the cuDNN version and confirmed that the CUDA version is compatible with both my GPU and ONNX Runtime, but I still get this error. My GPU is a 3060 Ti, and I am on the latest drivers.

Urgency

No response

Target platform

Windows 11

Build script

build.bat --config Release --use_cuda --cuda_version 11.8 --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"

Error / output

1: [ FAILED ] MLOpTest.TreeRegressorMultiTargetBatchTreeE2 (0 ms)
1: [ RUN ] MLOpTest.TreeRegressorMultiTargetAverage
1: D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 716: misaligned address ; GPU=0 ; hostname=WIN-QHBHHD67V51 ; file=D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=241 ; expr=cudaDeviceSynchronize();
1:
1:
1: Provider:CUDAExecutionProvider
1: unknown file: error: C++ exception with description "D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 716: misaligned address ; GPU=0 ; hostname=WIN-QHBHHD67V51 ; file=D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=241 ; expr=cudaDeviceSynchronize();
1:

The following tests FAILED:
1 - onnxruntime_test_all (Failed)
Errors while running CTest
Output from these tests are in: D:/onnxruntime/build/Windows/Release/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Traceback (most recent call last):
File "D:\onnxruntime\tools\ci_build\build.py", line 2601, in
sys.exit(main())
File "D:\onnxruntime\tools\ci_build\build.py", line 2504, in main
run_onnxruntime_tests(args, source_dir, ctest_path, build_dir, configs)
File "D:\onnxruntime\tools\ci_build\build.py", line 1744, in run_onnxruntime_tests
run_subprocess(ctest_cmd, cwd=cwd, dll_path=dll_path)
File "D:\onnxruntime\tools\ci_build\build.py", line 780, in run_subprocess
return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
File "D:\onnxruntime\tools\python\util\run.py", line 49, in run
completed_process = subprocess.run(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\Program Files\CMake\bin\ctest.EXE', '--build-config', 'Release', '--verbose', '--timeout', '10800']' returned non-zero exit status 8.
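
For triage, the CUDA failure line in the output above can be split into its fields. The following is a minimal sketch: `parse_cuda_failure` is a hypothetical helper, not part of ONNX Runtime, and the pattern assumes the exact `CUDA failure N: message ; GPU=... ; ... ; expr=...;` layout seen in this log. Keep in mind that CUDA reports kernel faults asynchronously, so the `expr` field (`cudaDeviceSynchronize()` here) is likely where the error surfaced rather than where it was caused.

```python
import re

# Pattern assumes the field layout shown in the log above (an assumption,
# not a documented ONNX Runtime log format).
_CUDA_FAILURE = re.compile(
    r"CUDA failure (?P<code>\d+): (?P<message>.*?) ; "
    r"GPU=(?P<gpu>\d+) ; hostname=(?P<hostname>\S+) ; "
    r"file=(?P<file>.*?) ; line=(?P<line>\d+) ; expr=(?P<expr>.*?);"
)


def parse_cuda_failure(log_line):
    """Return the failure fields as a dict, or None if the line does not match."""
    match = _CUDA_FAILURE.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    for key in ("code", "gpu", "line"):
        fields[key] = int(fields[key])
    return fields
```

Lines that do not contain a CUDA failure return `None`, so the helper can be applied to every line of `LastTest.log`.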

Visual Studio Version

Visual Studio 2022

GCC / Compiler Version

No response

snnn · 2023-05-17T17:09:13Z

I observed the same error too, on an A10 machine with CUDA 11.6 and VS 2019.

snnn · 2023-05-17T17:09:50Z

Were you building the code from the main branch?

ninjatall12 · 2023-05-17T19:26:49Z

@snnn I assume the 1.14.1 release build does not have this problem? And yes, I am building from the main branch.

snnn · 2023-05-17T19:35:35Z

I first noticed it last month and haven't found the root cause yet. It happens on some hardware with some GPU driver versions.

ninjatall12 · 2023-05-17T19:42:01Z

@snnn I am on driver 531.79 with a 3060 Ti, the CUDA 11.8 toolkit, and cuDNN 8.9.0. I have attached the DxDiag log below, although it might not be useful.
DxDiag.txt

snnn · 2023-05-17T20:21:18Z

I talked to @souptc offline. He will take a look when he finishes his current work on hand.

snnn · 2023-05-17T20:44:42Z

Full log:
36.zip

satyajandhyala · 2023-05-18T06:11:54Z

@ninjatall12 I looked into this error on an A10. Please check your environment:

If you have multiple versions of CUDA installed, such as 11.6 and 11.8, make sure that the environment variables CUDA_HOME, CUDA_PATH, etc. all point to the same version.
Make sure your PATH points to the correct nvcc executable.
Check that your CUDA and cuDNN versions are compatible.
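
The first two checks can be automated. The following is a minimal sketch, assuming only `CUDA_HOME` and `CUDA_PATH` matter (other variables covered by "etc." are not inspected) and that nvcc lives somewhere under the CUDA root. The function names are hypothetical and not part of the ONNX Runtime build tooling.

```python
import os
import shutil


def _norm(path):
    # Normalize case and separators so equivalent Windows paths compare equal.
    return os.path.normcase(os.path.normpath(path))


def check_cuda_env(env, nvcc_path):
    """Return a list of warnings about an inconsistent CUDA setup."""
    warnings = []

    # Collect the CUDA roots named by the variables that are actually set.
    roots = {name: _norm(env[name]) for name in ("CUDA_HOME", "CUDA_PATH") if env.get(name)}
    if len(set(roots.values())) > 1:
        warnings.append(f"CUDA variables point to different installs: {roots}")

    if nvcc_path is None:
        warnings.append("nvcc not found on PATH")
    elif roots:
        # nvcc should sit inside at least one of the configured CUDA roots.
        nvcc_dir = _norm(os.path.dirname(nvcc_path))
        if not any(nvcc_dir.startswith(root + os.sep) for root in roots.values()):
            warnings.append(f"nvcc at {nvcc_path} is outside {sorted(set(roots.values()))}")
    return warnings


def check_current_environment():
    """Apply the checks to this process's environment and PATH."""
    return check_cuda_env(os.environ, shutil.which("nvcc"))
```

An empty list from `check_current_environment()` means neither check found a mismatch. It does not verify CUDA and cuDNN compatibility.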

ninjatall12 · 2023-05-18T06:28:04Z

I only have one version of CUDA installed.

PATH points to the correct nvcc:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

I checked compatibility beforehand, and the versions are compatible.

@satyajandhyala

snnn · 2023-05-26T22:57:09Z

I found that the test causing the problem is FusedMatMulOpTest.FloatTypeTransposeBatch:

./onnxruntime_test_all  --gtest_filter=FusedMatMulOpTest.FloatTypeTransposeBatch

It was added in PR #9734.
ONNX Runtime v1.10.0 is fine; that version does not include the PR.
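
To reproduce the failure in isolation from a script, the command above can be wrapped as follows. This is a minimal sketch: `gtest_command` and `run_single_test` are hypothetical helper names, and the path to the test binary is an assumption that depends on your build directory. `--gtest_filter` is the standard GoogleTest flag used in the command above.

```python
import subprocess


def gtest_command(test_binary, test_filter):
    """Build the argument list for running only the tests matching test_filter."""
    return [test_binary, f"--gtest_filter={test_filter}"]


def run_single_test(test_binary, test_filter):
    """Run the filtered test and return the process exit code (0 means pass)."""
    completed = subprocess.run(gtest_command(test_binary, test_filter), check=False)
    return completed.returncode
```

For example, `run_single_test("./onnxruntime_test_all", "FusedMatMulOpTest.FloatTypeTransposeBatch")` runs only the suspect test, which makes it quicker to compare behavior across GPUs or driver versions.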

snnn · 2023-05-27T02:07:20Z

I think I found the root cause. It is the CublasMathModeSetter class: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_common.h#L65

Our team's build service doesn't have access to A-series GPUs due to GPU shortage. We only tested it on T4 and M60 GPUs.

snnn · 2023-06-17T01:13:54Z

Fixed in ONNX Runtime 1.15.1 release.
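
To check whether an installed release is new enough, a version comparison along these lines may help. This is a minimal sketch: `includes_fix` is a hypothetical helper, it assumes every release from 1.15.1 onward keeps the fix, and it says nothing about source builds from the main branch, whose version string may not reflect which commits they contain.

```python
import re

# Release named in the comment above as containing the fix.
FIXED_IN = (1, 15, 1)


def release_tuple(version):
    """Parse the leading 'major.minor.patch' of a version string into a tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def includes_fix(version):
    """Return True if the release version is 1.15.1 or later."""
    return release_tuple(version) >= FIXED_IN
```

With the Python package installed, the check would be `includes_fix(onnxruntime.__version__)`.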

ninjatall12 added the build (build issues; typically submitted using template) label May 17, 2023

github-actions bot added the ep:CUDA (issues related to the CUDA execution provider) and platform:windows (issues related to the Windows platform) labels May 17, 2023

snnn assigned souptc May 17, 2023

snnn mentioned this issue May 18, 2023

Move Windows GPU pipelines to A10 #15642

Closed

snnn mentioned this issue May 27, 2023

Fix a misaligned error in CUDA GEMM #16130

Merged

snnn closed this as completed Jun 17, 2023
