
Bugfix for batched gemv #2481

Open · wants to merge 5 commits into master
Conversation

@kose-y commented Aug 28, 2024

Fix incorrect definition of m and n in gemv_strided_batched!

@maleadt (Member) commented Aug 28, 2024

Shouldn't m and n switch depending on trans, just like with other wrappers?

m = size(A[1], trans == 'N' ? 1 : 2)
n = size(A[1], trans == 'N' ? 2 : 1)
lda = max(1,stride(A[1],2))
incx = stride(x[1],1)
incy = stride(y[1],1)
Aptrs = unsafe_batch(A)
xptrs = unsafe_batch(x)
yptrs = unsafe_batch(y)
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), trans, m, n, alpha, Aptrs, lda, xptrs, incx, beta, yptrs, incy, length(A))
else
$fname(handle(), trans, m, n, alpha, Aptrs, lda, xptrs, incx, beta, yptrs, incy, length(A))
end

Can you add a test that covers the case that fails right now and passes after the change?

@maleadt added the "needs tests" and "bugfix" labels on Aug 28, 2024
@kose-y (Author) commented Aug 28, 2024

No, according to the official cuBLAS documentation, the definitions of m and n differ between the gemm and gemv interfaces.

For gemm interfaces:

  • m: number of rows of matrix op(A) and C.
  • n: number of columns of matrix op(B) and C.

For gemv:

  • m: number of rows of matrix A.
  • n: number of columns of matrix A.

For gemv, they don't depend on op.
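
To make the difference concrete, here is a small, purely illustrative helper (the name gemv_dims is not part of CUDA.jl or cuBLAS): for gemv, m and n always describe A itself, and only the expected lengths of x and y depend on trans.

# Illustrative only; gemv_dims is a hypothetical name, not a CUDA.jl function.
function gemv_dims(trans::Char, A::AbstractMatrix, x::AbstractVector, y::AbstractVector)
    m, n = size(A)   # always the dimensions of A itself, regardless of trans
    length(x) == (trans == 'N' ? n : m) || throw(DimensionMismatch("x has wrong length"))
    length(y) == (trans == 'N' ? m : n) || throw(DimensionMismatch("y has wrong length"))
    return m, n      # these are the values cublas<t>gemv expects, unswapped
end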

@kose-y (Author) commented Aug 28, 2024

I will try to add some tests this week.

@kose-y (Author) commented Aug 28, 2024

See also the gemv! function:

m,n = size(A)
# check dimensions
length(x) == (trans == 'N' ? n : m) && length(y) == (trans == 'N' ? m : n) || throw(DimensionMismatch(""))
# compute increments
lda = max(1,stride(A,2))
incx = stride(x,1)
incy = stride(y,1)
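
For example (a minimal sketch using the CUBLAS.gemv! wrapper quoted above), a 4×2 A keeps m = 4 and n = 2 for both values of trans, while the vector lengths swap:

using CUDA
using CUDA: CUBLAS

A  = CUDA.rand(Float32, 4, 2)                 # m = 4, n = 2 for both calls
x  = CUDA.rand(Float32, 2); y  = CUDA.zeros(Float32, 4)
CUBLAS.gemv!('N', 1f0, A, x, 0f0, y)          # y = A * x
xt = CUDA.rand(Float32, 4); yt = CUDA.zeros(Float32, 2)
CUBLAS.gemv!('T', 1f0, A, xt, 0f0, yt)        # yt = transpose(A) * xt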

@kose-y changed the title from "Bugfix for gemv_strided_batched!" to "Bugfix for batched gemv" on Aug 29, 2024
@kose-y (Author) commented Aug 29, 2024

@maleadt A similar bug was found in gemv_batched!, and it is also fixed. Tests have been added.
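
For reference, a test along these lines (a minimal sketch, not necessarily the exact test added here; it assumes the gemv_batched!(trans, alpha, A, x, beta, y) signature with vectors of device arrays, mirroring gemm_batched!) only passes once m and n are no longer swapped, because A is deliberately non-square:

using CUDA, Test
using CUDA: CUBLAS

m, n, batch = 5, 3, 10                          # non-square on purpose
A = [CUDA.rand(Float32, m, n) for _ in 1:batch]
x = [CUDA.rand(Float32, n)    for _ in 1:batch]
y = [CUDA.zeros(Float32, m)   for _ in 1:batch]
CUBLAS.gemv_batched!('N', 1f0, A, x, 0f0, y)
@test Array(y[1]) ≈ Array(A[1]) * Array(x[1])   # wrong results if m/n are swapped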

@maleadt (Member) commented Aug 29, 2024

LGTM, let's just ping the original author of these functions: @lpawela

@kose-y (Author) commented Sep 9, 2024

@maleadt What is the status of this PR?

@lpawela (Contributor) commented Sep 9, 2024

Sorry, this has been hanging on me. I'll have a look within a couple of days.

@lpawela (Contributor) commented Sep 11, 2024

I'm having problems running the tests with this patch.

      From worker 2:    Stacktrace:
      From worker 2:      [1] throw_api_error(res::CUDA.cudaError_enum)
      From worker 2:        @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
      From worker 2:      [2] check
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
      From worker 2:      [3] cuMemFreeAsync
      From worker 2:        @ ~/lib/CUDA.jl/lib/utils/call.jl:34 [inlined]
      From worker 2:      [4] free(mem::CUDA.DeviceMemory; stream::CuStream)
      From worker 2:        @ CUDA ~/lib/CUDA.jl/lib/cudadrv/memory.jl:87
      From worker 2:      [5] free
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:82 [inlined]
      From worker 2:      [6] #1102
      From worker 2:        @ ~/lib/CUDA.jl/src/memory.jl:708 [inlined]
      From worker 2:      [7] #context!#990
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:168 [inlined]
      From worker 2:      [8] context!
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:163 [inlined]
      From worker 2:      [9] _pool_free
      From worker 2:        @ ~/lib/CUDA.jl/src/memory.jl:707 [inlined]
      From worker 2:     [10] macro expansion
      From worker 2:        @ ./timing.jl:395 [inlined]
      From worker 2:     [11] pool_free(managed::CUDA.Managed{CUDA.DeviceMemory})
      From worker 2:        @ CUDA ~/lib/CUDA.jl/src/memory.jl:689
      From worker 2:     [12] release(::GPUArrays.RefCounted{CUDA.Managed{CUDA.DeviceMemory}})
      From worker 2:        @ GPUArrays ~/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:42
      From worker 2:     [13] unsafe_free!
      From worker 2:        @ ~/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:91 [inlined]
      From worker 2:     [14] unsafe_free!(xs::CuArray{Float32, 2, CUDA.DeviceMemory})
      From worker 2:        @ CUDA ~/lib/CUDA.jl/src/array.jl:94
      From worker 2:     [15] exit
      From worker 2:        @ ./initdefs.jl:28 [inlined]
      From worker 2:     [16] exit()
      From worker 2:        @ Base ./initdefs.jl:29
      From worker 2:     [17] #invokelatest#2
      From worker 2:        @ ./essentials.jl:892 [inlined]
      From worker 2:     [18] invokelatest(::Any)
      From worker 2:        @ Base ./essentials.jl:889
      From worker 2:     [19] (::Distributed.var"#118#120"{Distributed.RemoteDoMsg})()
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:310
      From worker 2:     [20] run_work_thunk(thunk::Distributed.var"#118#120"{Distributed.RemoteDoMsg}, print_error::Bool)
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:70
      From worker 2:     [21] (::Distributed.var"#117#119"{Distributed.RemoteDoMsg})()
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:310
      From worker 2:    WARNING: Error while freeing DeviceMemory(1.562 KiB at 0x0000000302122a00):
      From worker 2:    CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc))

when running julia --project test/runtests.jl libraries/cublas

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 550.90.7

CUDA libraries: 
- CUBLAS: 12.6.0
- CURAND: 10.3.7
- CUFFT: 11.2.6
- CUSOLVER: 11.6.4
- CUSPARSE: 12.5.2
- CUPTI: 2024.3.0 (API 24.0.0)
- NVML: 12.0.0+550.90.7

Julia packages: 
- CUDA: 5.5.0
- CUDA_Driver_jll: 0.10.0+0
- CUDA_Runtime_jll: 0.15.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 7.857 GiB / 10.000 GiB available)

@maleadt (Member) commented Sep 12, 2024

julia>  CUDA.CuError(CUDA.cudaError_enum(0x000002bc))
CuError(CUDA_ERROR_ILLEGAL_ADDRESS)

The changes in this PR seem to be triggering an illegal memory access.

@maleadt (Member) commented Sep 17, 2024

I'm seeing similar issues locally, but I'm having a hard time isolating the problem. The libraries/cublas test suite often hangs on this PR, while running only the gemv tests modified here doesn't reproduce the issue.
