CUDA driver version mismatched with CUDA runtime version #343

Closed
loligans opened this issue May 14, 2024 · 1 comment · May be fixed by #345

Comments


loligans commented May 14, 2024

The GPU driver reports CUDA 12.2, but the installed CUDA runtime (per nvcc) is 12.4.

nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000001:00:00.0 Off |                    0 |
| N/A   28C    P0              76W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
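For anyone who wants to detect this condition automatically, here is a minimal sketch that pulls the versions out of the two outputs above and compares them. The function names and the `SMI_HEADER` / `NVCC_OUTPUT` constants are hypothetical and exist only for illustration; the sample strings are copied from this issue. It assumes the `CUDA Version:` field in the nvidia-smi header and the `release X.Y` field in the nvcc banner keep the format shown here.

```python
import re

# Sample lines copied from the outputs in this issue (illustration only).
SMI_HEADER = "| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |"
NVCC_OUTPUT = "Cuda compilation tools, release 12.4, V12.4.131"


def driver_version(nvidia_smi_text):
    """Return the driver version string from the nvidia-smi header."""
    m = re.search(r"Driver Version:\s*([\d.]+)", nvidia_smi_text)
    if m is None:
        raise ValueError("no 'Driver Version' field found")
    return m.group(1)


def driver_cuda_version(nvidia_smi_text):
    """Return (major, minor) of the CUDA version shown by nvidia-smi."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if m is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(m.group(1)), int(m.group(2))


def toolkit_cuda_version(nvcc_text):
    """Return (major, minor) of the toolkit release shown by nvcc --version."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    if m is None:
        raise ValueError("no 'release X.Y' field found")
    return int(m.group(1)), int(m.group(2))


def toolkit_is_newer(nvidia_smi_text, nvcc_text):
    """True when the toolkit release is newer than the driver's CUDA version."""
    return toolkit_cuda_version(nvcc_text) > driver_cuda_version(nvidia_smi_text)


if __name__ == "__main__":
    print("driver:", driver_version(SMI_HEADER))
    print("driver CUDA:", driver_cuda_version(SMI_HEADER))
    print("toolkit CUDA:", toolkit_cuda_version(NVCC_OUTPUT))
    print("toolkit newer than driver:", toolkit_is_newer(SMI_HEADER, NVCC_OUTPUT))
```

On a live node the two strings would come from running `nvidia-smi` and `nvcc --version` (for example via `subprocess.run`); they are hardcoded here so the sketch runs without a GPU.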

The CUDA version mismatch causes GPU_Burn to hang. I believe the GPU driver should be updated to 550.54.15.

If the intended CUDA version is 12.2, the GPU driver can remain at 535.161.08, but the CUDA runtime should be downgraded to 12.2.

If the intended CUDA version is 12.4, the GPU driver should be updated to 550.54.15.

Related issue: wilicc/gpu-burn#7

@LiquidPT
Contributor

There were issues with Fabric Manager 550.54.15, so we had to revert both Fabric Manager and the GPU driver. According to NVIDIA, CUDA 12.4 should be compatible with this GPU driver:

https://docs.nvidia.com/deploy/cuda-compatibility/index.html#minor-version-comaptibility

CUDA 12.4 has some critical fixes, so using the newer version is preferable.
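As a rough illustration of the minor version compatibility rule referenced above, the sketch below checks whether a driver meets the minimum driver version for a given CUDA major release. The `MIN_DRIVER_FOR_MAJOR` table and the function names are hypothetical. The minimums (450.80.02 for CUDA 11.x and 525.60.13 for CUDA 12.x on Linux) are my reading of NVIDIA's compatibility page and should be verified against the link above before relying on them.

```python
# Assumed minimum Linux driver versions for minor version compatibility,
# keyed by CUDA major release. Verify against NVIDIA's compatibility docs.
MIN_DRIVER_FOR_MAJOR = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_driver(version):
    """Turn a driver string such as '535.161.08' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def minor_version_compatible(driver, toolkit_major):
    """True when the driver meets the assumed minimum for this CUDA major."""
    if toolkit_major not in MIN_DRIVER_FOR_MAJOR:
        raise KeyError(f"no minimum driver recorded for CUDA {toolkit_major}.x")
    return parse_driver(driver) >= MIN_DRIVER_FOR_MAJOR[toolkit_major]


if __name__ == "__main__":
    # Driver from this issue, with the CUDA 12.4 toolkit (major release 12).
    print(minor_version_compatible("535.161.08", 12))
```

Under these assumptions, driver 535.161.08 passes the check for any CUDA 12.x toolkit, which matches the statement above. The check only covers the documented version floor; it does not prove that a particular workload such as gpu-burn runs correctly, and features that need a newer driver can still be unavailable.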

LiquidPT closed this as completed Aug 8, 2024