
Bugfix for batched gemv #2481

Open · wants to merge 5 commits into master
Conversation

@kose-y commented Aug 28, 2024

Fix incorrect definition of m and n in gemv_strided_batched!

@maleadt (Member) commented Aug 28, 2024

Shouldn't m and n switch depending on trans, just like with other wrappers?

m = size(A[1], trans == 'N' ? 1 : 2)
n = size(A[1], trans == 'N' ? 2 : 1)
lda = max(1,stride(A[1],2))
incx = stride(x[1],1)
incy = stride(y[1],1)
Aptrs = unsafe_batch(A)
xptrs = unsafe_batch(x)
yptrs = unsafe_batch(y)
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), trans, m, n, alpha, Aptrs, lda, xptrs, incx, beta, yptrs, incy, length(A))
else
$fname(handle(), trans, m, n, alpha, Aptrs, lda, xptrs, incx, beta, yptrs, incy, length(A))
end

Can you add a test that covers the case that fails right now and passes after the change?

@maleadt added the "needs tests" and "bugfix" labels on Aug 28, 2024
@kose-y (Author) commented Aug 28, 2024

No, according to the official cuBLAS documentation, the definitions of m and n differ between the gemm and gemv interfaces.

For gemm interfaces:

  • m: number of rows of matrix op(A) and C.
  • n: number of columns of matrix op(B) and C.

For gemv:

  • m: number of rows of matrix A.
  • n: number of columns of matrix A.

For gemv, they don't depend on op.
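
To make the difference concrete, here is a small, purely illustrative helper (the name gemv_dims is not part of CUDA.jl or cuBLAS): for gemv, m and n always describe A itself, and only the expected lengths of x and y depend on trans.

# Illustrative only; gemv_dims is a hypothetical name, not a CUDA.jl function.
function gemv_dims(trans::Char, A::AbstractMatrix, x::AbstractVector, y::AbstractVector)
    m, n = size(A)   # always the dimensions of A itself, regardless of trans
    length(x) == (trans == 'N' ? n : m) || throw(DimensionMismatch("x has wrong length"))
    length(y) == (trans == 'N' ? m : n) || throw(DimensionMismatch("y has wrong length"))
    return m, n      # these are the values cublas<t>gemv expects, unswapped
end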

@kose-y (Author) commented Aug 28, 2024

I will try to add some tests this week.

@kose-y (Author) commented Aug 28, 2024

See also the gemv! function:

m,n = size(A)
# check dimensions
length(x) == (trans == 'N' ? n : m) && length(y) == (trans == 'N' ? m : n) || throw(DimensionMismatch(""))
# compute increments
lda = max(1,stride(A,2))
incx = stride(x,1)
incy = stride(y,1)
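
For example (a minimal sketch using the CUBLAS.gemv! wrapper quoted above), a 4×2 A keeps m = 4 and n = 2 for both values of trans, while the vector lengths swap:

using CUDA
using CUDA: CUBLAS

A  = CUDA.rand(Float32, 4, 2)                 # m = 4, n = 2 for both calls
x  = CUDA.rand(Float32, 2); y  = CUDA.zeros(Float32, 4)
CUBLAS.gemv!('N', 1f0, A, x, 0f0, y)          # y = A * x
xt = CUDA.rand(Float32, 4); yt = CUDA.zeros(Float32, 2)
CUBLAS.gemv!('T', 1f0, A, xt, 0f0, yt)        # yt = transpose(A) * xt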

@kose-y changed the title from "Bugfix for gemv_strided_batched!" to "Bugfix for batched gemv" on Aug 29, 2024
@kose-y (Author) commented Aug 29, 2024

@maleadt A similar bug was found in gemv_batched!, and it is also fixed. Tests have been added.
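
For reference, a test along these lines (a minimal sketch, not necessarily the exact test added here; it assumes the gemv_batched!(trans, alpha, A, x, beta, y) signature with vectors of device arrays, mirroring gemm_batched!) only passes once m and n are no longer swapped, because A is deliberately non-square:

using CUDA, Test
using CUDA: CUBLAS

m, n, batch = 5, 3, 10                          # non-square on purpose
A = [CUDA.rand(Float32, m, n) for _ in 1:batch]
x = [CUDA.rand(Float32, n)    for _ in 1:batch]
y = [CUDA.zeros(Float32, m)   for _ in 1:batch]
CUBLAS.gemv_batched!('N', 1f0, A, x, 0f0, y)
@test Array(y[1]) ≈ Array(A[1]) * Array(x[1])   # wrong results if m/n are swapped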

@maleadt (Member) commented Aug 29, 2024

LGTM, let's just ping the original author of these functions: @lpawela

@kose-y (Author) commented Sep 9, 2024

@maleadt What is the status of this PR?

@lpawela (Contributor) commented Sep 9, 2024

Sorry, this has been hanging on me. I'll have a look within a couple of days.

@lpawela (Contributor) commented Sep 11, 2024

I'm having problems running the tests with this patch.

      From worker 2:    Stacktrace:
      From worker 2:      [1] throw_api_error(res::CUDA.cudaError_enum)
      From worker 2:        @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
      From worker 2:      [2] check
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
      From worker 2:      [3] cuMemFreeAsync
      From worker 2:        @ ~/lib/CUDA.jl/lib/utils/call.jl:34 [inlined]
      From worker 2:      [4] free(mem::CUDA.DeviceMemory; stream::CuStream)
      From worker 2:        @ CUDA ~/lib/CUDA.jl/lib/cudadrv/memory.jl:87
      From worker 2:      [5] free
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:82 [inlined]
      From worker 2:      [6] #1102
      From worker 2:        @ ~/lib/CUDA.jl/src/memory.jl:708 [inlined]
      From worker 2:      [7] #context!#990
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:168 [inlined]
      From worker 2:      [8] context!
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:163 [inlined]
      From worker 2:      [9] _pool_free
      From worker 2:        @ ~/lib/CUDA.jl/src/memory.jl:707 [inlined]
      From worker 2:     [10] macro expansion
      From worker 2:        @ ./timing.jl:395 [inlined]
      From worker 2:     [11] pool_free(managed::CUDA.Managed{CUDA.DeviceMemory})
      From worker 2:        @ CUDA ~/lib/CUDA.jl/src/memory.jl:689
      From worker 2:     [12] release(::GPUArrays.RefCounted{CUDA.Managed{CUDA.DeviceMemory}})
      From worker 2:        @ GPUArrays ~/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:42
      From worker 2:     [13] unsafe_free!
      From worker 2:        @ ~/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:91 [inlined]
      From worker 2:     [14] unsafe_free!(xs::CuArray{Float32, 2, CUDA.DeviceMemory})
      From worker 2:        @ CUDA ~/lib/CUDA.jl/src/array.jl:94
      From worker 2:     [15] exit
      From worker 2:        @ ./initdefs.jl:28 [inlined]
      From worker 2:     [16] exit()
      From worker 2:        @ Base ./initdefs.jl:29
      From worker 2:     [17] #invokelatest#2
      From worker 2:        @ ./essentials.jl:892 [inlined]
      From worker 2:     [18] invokelatest(::Any)
      From worker 2:        @ Base ./essentials.jl:889
      From worker 2:     [19] (::Distributed.var"#118#120"{Distributed.RemoteDoMsg})()
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:310
      From worker 2:     [20] run_work_thunk(thunk::Distributed.var"#118#120"{Distributed.RemoteDoMsg}, print_error::Bool)
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:70
      From worker 2:     [21] (::Distributed.var"#117#119"{Distributed.RemoteDoMsg})()
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:310
      From worker 2:    WARNING: Error while freeing DeviceMemory(1.562 KiB at 0x0000000302122a00):
      From worker 2:    CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc))

when running julia --project test/runtests.jl libraries/cublas

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 550.90.7

CUDA libraries: 
- CUBLAS: 12.6.0
- CURAND: 10.3.7
- CUFFT: 11.2.6
- CUSOLVER: 11.6.4
- CUSPARSE: 12.5.2
- CUPTI: 2024.3.0 (API 24.0.0)
- NVML: 12.0.0+550.90.7

Julia packages: 
- CUDA: 5.5.0
- CUDA_Driver_jll: 0.10.0+0
- CUDA_Runtime_jll: 0.15.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 7.857 GiB / 10.000 GiB available)

@maleadt (Member) commented Sep 12, 2024

julia>  CUDA.CuError(CUDA.cudaError_enum(0x000002bc))
CuError(CUDA_ERROR_ILLEGAL_ADDRESS)

The changes in this PR seem to be triggering an illegal memory access.

@maleadt (Member) commented Sep 17, 2024

I'm seeing similar issues locally, but I'm having a hard time isolating the problem. The libraries/cublas test suite often hangs on this PR, while running only the gemv tests modified here doesn't reproduce the issue.
