[Build] Cuda Failure 716:misaligned address when building onnxruntime with Cuda #15981

Closed
ninjatall12 opened this issue May 17, 2023 · 12 comments
Labels
build build issues; typically submitted using template ep:CUDA issues related to the CUDA execution provider platform:windows issues related to the Windows platform

Comments

@ninjatall12
ninjatall12 commented May 17, 2023

Describe the issue

I am trying to build ONNX Runtime with CUDA 11.8. The cuDNN binaries are placed inside the CUDA 11.8 folder, so the cuDNN location should not be the issue. I have tried changing the cuDNN version, and I have checked that the CUDA version is compatible with both my GPU and ONNX Runtime, but I still get this error. My GPU is a 3060 Ti, and I am on the latest drivers.

Urgency

No response

Target platform

Windows 11

Build script

build.bat --config Release --use_cuda --cuda_version 11.8 --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"

Error / output

1: [ FAILED ] MLOpTest.TreeRegressorMultiTargetBatchTreeE2 (0 ms)
1: [ RUN ] MLOpTest.TreeRegressorMultiTargetAverage
1: D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 716: misaligned address ; GPU=0 ; hostname=WIN-QHBHHD67V51 ; file=D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=241 ; expr=cudaDeviceSynchronize();
1:
1:
1: Provider:CUDAExecutionProvider
1: unknown file: error: C++ exception with description "D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 716: misaligned address ; GPU=0 ; hostname=WIN-QHBHHD67V51 ; file=D:\onnxruntime\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=241 ; expr=cudaDeviceSynchronize();
1:

The following tests FAILED:
1 - onnxruntime_test_all (Failed)
Errors while running CTest
Output from these tests are in: D:/onnxruntime/build/Windows/Release/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
Traceback (most recent call last):
  File "D:\onnxruntime\tools\ci_build\build.py", line 2601, in <module>
    sys.exit(main())
  File "D:\onnxruntime\tools\ci_build\build.py", line 2504, in main
    run_onnxruntime_tests(args, source_dir, ctest_path, build_dir, configs)
  File "D:\onnxruntime\tools\ci_build\build.py", line 1744, in run_onnxruntime_tests
    run_subprocess(ctest_cmd, cwd=cwd, dll_path=dll_path)
  File "D:\onnxruntime\tools\ci_build\build.py", line 780, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
  File "D:\onnxruntime\tools\python\util\run.py", line 49, in run
    completed_process = subprocess.run(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\Program Files\CMake\bin\ctest.EXE', '--build-config', 'Release', '--verbose', '--timeout', '10800']' returned non-zero exit status 8.

Visual Studio Version

Visual Studio 2022

GCC / Compiler Version

No response

@ninjatall12 ninjatall12 added the build build issues; typically submitted using template label May 17, 2023
@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider platform:windows issues related to the Windows platform labels May 17, 2023
@snnn
Member

snnn commented May 17, 2023

I observed the same error on an A10 machine with CUDA 11.6 and VS 2019.

@snnn
Member

snnn commented May 17, 2023

Were you building the code from the main branch?

@ninjatall12
Author

@snnn Yes, I am building from the main branch. I assume the 1.14.1 release build does not have this problem?

@snnn
Member

snnn commented May 17, 2023

I first noticed it last month and haven't found the root cause yet. It happens on some hardware with some GPU driver versions.

@ninjatall12
Author

@snnn I am on driver 531.79 with a 3060 Ti, the CUDA 11.8 toolkit, and cuDNN 8.9.0. I have attached the DxDiag log below, although it might not be useful.
DxDiag.txt

@snnn
Member

snnn commented May 17, 2023

I talked to @souptc offline. He will take a look once he finishes his current work.

@snnn
Member

snnn commented May 17, 2023

Full log:
36.zip

@satyajandhyala
Contributor

@ninjatall12 I looked into this error on A10. Please check your environment.

  1. If you have multiple versions of CUDA installed, such as 11.6 and 11.8, make sure that the environment variables CUDA_HOME, CUDA_PATH, etc. all point to the same version.
  2. Make sure your PATH points to the correct nvcc executable.
  3. Check CUDA and cuDNN version compatibility here.
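
The first two checks in this list can be scripted. The following is a minimal sketch and not part of ONNX Runtime's build tooling: `find_cuda_mismatches` is a hypothetical helper name, and the comparison assumes that nvcc sits in a `bin` folder directly under the toolkit root.

```python
import ntpath
import os
import shutil


def _norm(path):
    # Normalize Windows-style paths so comparisons ignore case,
    # slash direction, and trailing separators.
    return ntpath.normcase(ntpath.normpath(path))


def find_cuda_mismatches(env, nvcc_path):
    """Return a list of human-readable problems (empty if none were found).

    env: mapping of environment variables, for example os.environ.
    nvcc_path: full path of the nvcc found on PATH, or None.
    """
    problems = []
    roots = {name: _norm(env[name])
             for name in ("CUDA_HOME", "CUDA_PATH") if env.get(name)}
    if len(set(roots.values())) > 1:
        listing = ", ".join(f"{k}={v}" for k, v in sorted(roots.items()))
        problems.append("CUDA variables disagree: " + listing)
    if nvcc_path is None:
        problems.append("nvcc was not found on PATH")
    else:
        # Assumption: nvcc lives in <toolkit root>\bin.
        nvcc_root = _norm(ntpath.dirname(ntpath.dirname(nvcc_path)))
        for name, root in sorted(roots.items()):
            if root != nvcc_root:
                problems.append(
                    f"{name} points to {root}, but nvcc is under {nvcc_root}")
    return problems


if __name__ == "__main__":
    found = find_cuda_mismatches(os.environ, shutil.which("nvcc"))
    for line in found or ["no mismatch found"]:
        print(line)
```

An empty result only means the variables and nvcc agree with each other; it does not confirm that the cuDNN version matches.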

@ninjatall12
Author

  1. I only have one version of CUDA installed.
  2. PATH points to the correct nvcc:
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
    Cuda compilation tools, release 11.8, V11.8.89
    Build cuda_11.8.r11.8/compiler.31833905_0
  3. I checked this beforehand, and the versions are compatible.
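
For anyone repeating this check, the release number can be pulled out of the `nvcc --version` text and compared with the value passed to `--cuda_version`. This is a small sketch; `parse_nvcc_release` is a hypothetical helper name, and the sample text is taken from the output quoted in this comment.

```python
import re


def parse_nvcc_release(version_text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0"""

print(parse_nvcc_release(sample))  # prints (11, 8)
```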

@satyajandhyala

@snnn
Member

snnn commented May 26, 2023

I found that the test causing the problem is FusedMatMulOpTest.FloatTypeTransposeBatch:

./onnxruntime_test_all  --gtest_filter=FusedMatMulOpTest.FloatTypeTransposeBatch

It was added in PR #9734.
ONNX Runtime v1.10.0 is fine; that version doesn't include the PR.
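
To narrow down a failure like this one, it can help to collect the failed test names from the CTest log and rerun them with `--gtest_filter`. The following is a sketch with hypothetical helper names (`failed_tests`, `gtest_filter_arg`). Because a CUDA error reported at `cudaDeviceSynchronize()` may surface in a later test than the one that triggered it, treat the names it returns as a starting point rather than a diagnosis.

```python
import re

# Matches lines such as "1: [  FAILED  ] Suite.Name (0 ms)".
# The "1: " prefix added by CTest is optional.
_FAILED = re.compile(r"^(?:\d+:\s*)?\[\s*FAILED\s*\]\s+([\w/]+\.[\w/]+)")


def failed_tests(log_text):
    """Return failed test names in first-seen order, without duplicates."""
    names = []
    for line in log_text.splitlines():
        match = _FAILED.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def gtest_filter_arg(names):
    """Build a --gtest_filter argument; googletest separates patterns with ':'."""
    return "--gtest_filter=" + ":".join(names)
```

For example, passing the contents of `LastTest.log` to `failed_tests` and the result to `gtest_filter_arg` yields an argument that can be appended to `./onnxruntime_test_all`.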

@snnn
Member

snnn commented May 27, 2023

I think I found the root cause. It's the CublasMathModeSetter class: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_common.h#L65

Our team's build service doesn't have access to A-series GPUs due to the GPU shortage, so we only tested on T4 and M60 GPUs.

@snnn
Member

snnn commented Jun 17, 2023

Fixed in the ONNX Runtime 1.15.1 release.

@snnn snnn closed this as completed Jun 17, 2023
4 participants