Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, #3520

lcg0808 · 2024-06-17T03:43:59Z

Summary

OS:

Faiss version: faiss-gpu 1.6.3; CUDA 11.3; Python 3.7

Running on:
GPU (8 A100)

Interface:
Python

I tried to run the following code. With small-scale data, such as 1M samples, it runs normally. With large-scale data, such as 10M samples, it fails with the following error:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at ./faiss/gpu/utils/MatrixMult-inl.cuh:133; details: cublas failed (13): (512, 128) x (262144, 128)' = (512, 262144)
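For readers triaging this message, the `details` tail can be decoded mechanically. The sketch below is illustrative only: `decode_cublas_failure` is a hypothetical helper, not part of Faiss. It maps the numeric code to its `cublasStatus_t` name and computes the size of the output tile, assuming 4-byte floats (the trace reports `AT = float; BT = float`). The tile size alone does not identify the cause of the failure.

```python
import re

# Numeric values of the cublasStatus_t enum in the cuBLAS headers.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_failure(details):
    """Hypothetical helper: parse the 'details' tail of the Faiss assertion."""
    code = int(re.search(r"cublas failed \((\d+)\)", details).group(1))
    # Shapes are printed as "(rows, cols)"; the last one is the product.
    shapes = [(int(r), int(c)) for r, c in re.findall(r"\((\d+), (\d+)\)", details)]
    out_rows, out_cols = shapes[-1]
    return {
        "status": CUBLAS_STATUS.get(code, "unknown"),
        "out_shape": (out_rows, out_cols),
        # Assumption: float32 output, 4 bytes per element.
        "out_mib": out_rows * out_cols * 4 / 2**20,
    }


details = "cublas failed (13): (512, 128) x (262144, 128)' = (512, 262144)"
result = decode_cublas_failure(details)
print(result)
```

Here code 13 corresponds to `CUBLAS_STATUS_EXECUTION_FAILED`, and the (512, 262144) product tile occupies 512 MiB under the float32 assumption.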

Code:

import faiss

def train_kmeans(x, k, ngpu=8):
    # x: embeddings, e.g. 1000000 x 128; k: number of clusters, e.g. 10000
    d = x.shape[1]
    clus = faiss.Clustering(d, k)
    clus.verbose = True
    clus.niter = 20

    # high cap so that Faiss does not subsample the training points
    clus.max_points_per_centroid = 10000000
    # one resources object and one flat-index config per GPU
    res = [faiss.StandardGpuResources() for i in range(ngpu)]
    flat_config = []
    for i in range(ngpu):
        cfg = faiss.GpuIndexFlatConfig()
        cfg.useFloat16 = False
        cfg.device = i
        flat_config.append(cfg)

    if ngpu == 1:
        index = faiss.GpuIndexFlatIP(res[0], d, flat_config[0])
    else:
        # replicate the inner-product index across all GPUs
        indexes = [faiss.GpuIndexFlatIP(res[i], d, flat_config[i])
                   for i in range(ngpu)]
        index = faiss.IndexReplicas()
        for sub_index in indexes:
            index.addIndex(sub_index)

    # perform the training
    clus.train(x, index)
    centroids = faiss.vector_float_to_array(clus.centroids)

    # obj = faiss.vector_float_to_array(clus.obj)
    # print("final objective: %.4g" % obj[-1])

    return centroids.reshape(k, d)


mdouze · 2024-06-17T16:21:34Z

Could you try this with a recent version of Faiss (with CUDA 12)?
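When retrying on a newer build, it can help to post the exact environment alongside the result. A minimal sketch, assuming the package is importable as `faiss`; `describe_faiss_env` is a hypothetical helper name, and the `getattr` guard is only there in case a build does not expose `get_num_gpus`:

```python
import importlib


def describe_faiss_env():
    """Hypothetical helper: report the installed Faiss version and visible GPUs."""
    try:
        faiss = importlib.import_module("faiss")
    except ImportError:
        # Faiss is not installed in this environment.
        return {"faiss": None, "num_gpus": 0}
    # Guarded lookup: fall back to 0 if the build has no get_num_gpus.
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    return {
        "faiss": faiss.__version__,
        "num_gpus": get_num_gpus() if get_num_gpus else 0,
    }


info = describe_faiss_env()
print(info)
```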

PrithivirajDamodaran · 2024-08-19T07:50:47Z

With faiss-gpu 1.7.2 and CUDA 12 this is still a persistent bug. Is there any progress on this? Please advise.

huangxixiyiqi · 2024-09-11T09:23:50Z

I encountered this bug with faiss-gpu 1.7.2 and CUDA 12:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (512, 512) x (512, 512)' = (512, 512) gemm params m 512 n 512 k 512 trA T trB N lda 512 ldb 512 ldc 512

mdouze added the GPU label Jun 17, 2024

asadoughi added the unconfirmed-bug label Jul 1, 2024
