After initializing the K-means centroids for each value of K, I try to replace the default (CPU) index with a GPU one. This works for a single GPU device but fails when using multiple devices.
import faiss
import torch


class KMeansModule:
    def __init__(self, nb_classes, dimensionality=256, n_iter=50, tol=1e-4, k_range=[2, 3, 4, 5], resources=None, config=None):
        self.k_range = k_range
        self.d = dimensionality
        self.max_iter = n_iter
        self.tol = tol
        # Create the K-means object
        if len(k_range) == 1:
            self.n_kmeans = [faiss.Kmeans(d=dimensionality, k=k_range[0], niter=1, verbose=True, min_points_per_centroid=1) for _ in range(nb_classes)]
        else:
            # For each class, create n K-Means objects (one for each value of K), where n = len(k_range)
            # (this will be used to select the best K).
            self.n_kmeans = []
            for _ in range(nb_classes):
                self.n_kmeans.append([faiss.Kmeans(d=dimensionality, k=k, niter=1, verbose=False, min_points_per_centroid=1) for k in k_range])

    def initialize_centroids(self, batch_x, class_id, resources, rank, device, config, cached_features):
        image_list = cached_features[class_id]  # Use the features cached from the previous epoch
        batch_x = torch.stack(image_list)
        # For each K (model selection)
        for k in range(len(self.k_range)):
            # Train K-means model for one iteration to initialize centroids
            self.n_kmeans[class_id][k].train(batch_x.detach().cpu())
            # Replace the regular index by a gpu one
            index_flat = self.n_kmeans[class_id][k].index
            gpu_index_flat = faiss.index_cpu_to_gpu(resources, rank, index_flat)
            self.n_kmeans[class_id][k].index = gpu_index_flat


res = faiss.StandardGpuResources()
initialize_centroids(batch_x=None, class_id=class_id, resources=res, rank=rank, device=device, config=None, cached_features=cached_features)
Each rank (0, 1, 2, ..., 7) specifies the corresponding GPU device id.
Output
RuntimeError: Error in faiss::gpu::GpuIndex::GpuIndex(std::shared_ptr<faiss::gpu::GpuResources>, int, faiss::MetricType, float, faiss::gpu::GpuIndexConfig) at /home/circleci/miniconda/conda-bld/faiss-pkg_1709244513520/work/faiss/gpu/GpuIndex.cu:65: Error: 'config_.device < getNumDevices()' failed: Invalid GPU device 7
Process Process-4:
Traceback (most recent call last):
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/main_deeper_cluster.py", line 52, in process_main
app_main(args=params)
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/engine_deeper_cluster.py", line 401, in main
k_means_module.init(resources=res, rank=rank, cached_features=cached_features_last_epoch, config=cfg, device=device) # E-step
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 98, in init
self.initialize_centroids(batch_x=None,
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 92, in initialize_centroids
gpu_index_flat = faiss.index_cpu_to_gpu(resources, rank, index_flat)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/site-packages/faiss/swigfaiss_avx2.py", line 12799, in index_cpu_to_gpu
return _swigfaiss_avx2.index_cpu_to_gpu(provider, device, index, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Error in faiss::gpu::GpuIndex::GpuIndex(std::shared_ptr<faiss::gpu::GpuResources>, int, faiss::MetricType, float, faiss::gpu::GpuIndexConfig) at /home/circleci/miniconda/conda-bld/faiss-pkg_1709244513520/work/faiss/gpu/GpuIndex.cu:65: Error: 'config_.device < getNumDevices()' failed: Invalid GPU device 3
Process Process-7:
Traceback (most recent call last):
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/main_deeper_cluster.py", line 52, in process_main
app_main(args=params)
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/engine_deeper_cluster.py", line 401, in main
k_means_module.init(resources=res, rank=rank, cached_features=cached_features_last_epoch, config=cfg, device=device) # E-step
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 98, in init
self.initialize_centroids(batch_x=None,
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 92, in initialize_centroids
gpu_index_flat = faiss.index_cpu_to_gpu(resources, rank, index_flat)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/site-packages/faiss/swigfaiss_avx2.py", line 12799, in index_cpu_to_gpu
return _swigfaiss_avx2.index_cpu_to_gpu(provider, device, index, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Error in faiss::gpu::GpuIndex::GpuIndex(std::shared_ptr<faiss::gpu::GpuResources>, int, faiss::MetricType, float, faiss::gpu::GpuIndexConfig) at /home/circleci/miniconda/conda-bld/faiss-pkg_1709244513520/work/faiss/gpu/GpuIndex.cu:65: Error: 'config_.device < getNumDevices()' failed: Invalid GPU device 6
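The assertion `config_.device < getNumDevices()` in the output above fails when the requested device id is not smaller than the number of GPUs Faiss can see in that process. One possible cause, which is an assumption and not confirmed by this report, is that each worker process sees fewer GPUs than the global rank implies (for example, when `CUDA_VISIBLE_DEVICES` is restricted per process). A minimal diagnostic sketch follows; `visible_gpu_count`, `local_device_for_rank`, and `describe_process` are hypothetical helper names, while `faiss.get_num_gpus()` is the existing Faiss call.

```python
import os


def visible_gpu_count() -> int:
    """Return how many GPUs Faiss sees in this process (0 if unavailable)."""
    try:
        import faiss  # imported lazily so the helper also loads on CPU-only hosts
        return int(faiss.get_num_gpus())
    except (ImportError, AttributeError):
        return 0


def local_device_for_rank(rank: int, num_visible: int) -> int:
    """Map a global rank to a device index that is valid in this process.

    Hypothetical helper: wraps the rank around the number of visible GPUs,
    so a process restricted to a single GPU always gets device 0.
    """
    if num_visible <= 0:
        raise RuntimeError("No GPU is visible to this process")
    return rank % num_visible


def describe_process(rank: int) -> str:
    """Build a one-line diagnostic to log from each worker before creating the index."""
    masked = os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")
    return f"rank={rank} CUDA_VISIBLE_DEVICES={masked} faiss_gpus={visible_gpu_count()}"
```

Logging `describe_process(rank)` in every worker shows whether the rank exceeds the visible device count. If it does, passing `local_device_for_rank(rank, visible_gpu_count())` instead of `rank` to `faiss.index_cpu_to_gpu` may avoid the assertion.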
res = faiss.StandardGpuResources()
cfg = faiss.GpuIndexFlatConfig()
cfg.device = rank

# Replace the regular index by a gpu one
index_flat = self.n_kmeans[class_id][k].index
gpu_index_flat = faiss.GpuIndexFlatL2(resources, self.d, config)
self.n_kmeans[class_id][k].index = gpu_index_flat
Output
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/main_deeper_cluster.py", line 52, in process_main
app_main(args=params)
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/engine_deeper_cluster.py", line 401, in main
logger.info('Initializing centroids...')
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 98, in init
self.initialize_centroids(batch_x=None,
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 91, in initialize_centroids
gpu_index_flat = faiss.GpuIndexFlatL2(resources, self.d, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/site-packages/faiss/swigfaiss_avx2.py", line 11575, in __init__
_swigfaiss_avx2.GpuIndexFlatL2_swiginit(self, _swigfaiss_avx2.new_GpuIndexFlatL2(*args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Error in faiss::gpu::GpuIndex::GpuIndex(std::shared_ptr<faiss::gpu::GpuResources>, int, faiss::MetricType, float, faiss::gpu::GpuIndexConfig) at /home/circleci/miniconda/conda-bld/faiss-pkg_1709244513520/work/faiss/gpu/GpuIndex.cu:65: Error: 'config_.device < getNumDevices()' failed: Invalid GPU device 7
Process Process-6:
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/engine_deeper_cluster.py", line 401, in main
k_means_module.init(resources=resources, rank=rank, cached_features=cached_features_last_epoch, config=None, device=device) # E-step
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 100, in init
self.initialize_centroids(batch_x=None,
File "/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/DeepCluster/src/KMeans.py", line 93, in initialize_centroids
gpu_index_flat = faiss.index_cpu_to_gpu_multiple(resources, devices=[0,1,2,3,4,5,6,7],index=index_flat)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rtcalumby/miniconda3/envs/py382/lib/python3.12/site-packages/faiss/swigfaiss_avx2.py", line 12802, in index_cpu_to_gpu_multiple
return _swigfaiss_avx2.index_cpu_to_gpu_multiple(provider, devices, index, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Wrong number or type of arguments for overloaded function 'index_cpu_to_gpu_multiple'.
Possible C/C++ prototypes are:
faiss::gpu::index_cpu_to_gpu_multiple(std::vector< faiss::gpu::GpuResourcesProvider * > &,std::vector< int > &,faiss::Index const *,faiss::gpu::GpuMultipleClonerOptions const *)
faiss::gpu::index_cpu_to_gpu_multiple(std::vector< faiss::gpu::GpuResourcesProvider * > &,std::vector< int > &,faiss::Index const *)
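The `TypeError` above lists prototypes that take a vector of `GpuResourcesProvider` objects and a vector of device ids, so a single resource object and a plain Python list passed by keyword do not match. A minimal sketch of one way to satisfy them, assuming the Python wrapper `faiss.index_cpu_to_gpu_multiple_py(resources, index, co=None, gpus=None)` is available in the installed build; `validate_gpu_ids` and `to_multi_gpu` are hypothetical helper names.

```python
from typing import List, Sequence


def validate_gpu_ids(gpu_ids: Sequence[int], num_visible: int) -> List[int]:
    """Check that every requested device id exists in this process."""
    ids = list(gpu_ids)
    bad = [i for i in ids if i < 0 or i >= num_visible]
    if bad:
        raise ValueError(f"Invalid GPU device(s) {bad}: only {num_visible} visible")
    return ids


def to_multi_gpu(index_flat, gpu_ids: Sequence[int]):
    """Clone a CPU index onto several GPUs from a single process.

    Returns the GPU index together with the resource objects; the caller
    keeps both so the resources are not garbage-collected while in use.
    """
    import faiss  # requires a faiss-gpu build

    ids = validate_gpu_ids(gpu_ids, faiss.get_num_gpus())
    resources = [faiss.StandardGpuResources() for _ in ids]  # one per device
    gpu_index = faiss.index_cpu_to_gpu_multiple_py(resources, index_flat, gpus=ids)
    return gpu_index, resources
```

This spreads one index over several GPUs from within one process, which is a different setup from one process per GPU. In the distributed case each rank would more likely call `faiss.index_cpu_to_gpu` with its own locally valid device id.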
I'm trying to replace the CPU index with a GPU one but can't seem to do it in a distributed context.
Faiss version:
faiss 1.8.0 pypi_0 pypi
faiss-gpu 1.8.0 py3.12_h4c7d538_0_cuda12.1.1 pytorch
Installed from miniconda.
Running on:
Interface:
Attempts
I have also tried the approach used in DeeperCluster (https://github.com/facebookresearch/DeeperCluster/blob/main/src/distributed_kmeans.py#L182), expecting that each process would initialize its own resources and specify its device number accordingly, but the same `Invalid GPU device` error occurs (see the `faiss.GpuIndexFlatL2` snippet and its output above).
Other than that, I have also tried the solution proposed in the Faiss wiki (https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU), using `faiss.index_cpu_to_gpu_multiple`, which raises the `TypeError` shown above.