From cbcd4a0106428157d2deac30a15899c95496aede Mon Sep 17 00:00:00 2001
From: "Corey J. Nolet" <cjnolet@gmail.com>
Date: Sat, 14 Oct 2023 02:44:44 +0200
Subject: [PATCH] Refactor install/build guide. (#1899)

This PR makes some much-needed changes to the installation and build guide. It also adds a few APIs that were missing from the API docs.

Closes #1895

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: https://github.com/rapidsai/raft/pull/1899
---
 README.md                                     | 112 ++-----
 docs/source/build.md                          | 292 ++++++++----------
 docs/source/cpp_api/cluster.rst               |   1 +
 .../cpp_api/cluster_kmeans_balanced.rst       |  13 +
 docs/source/pylibraft_api.rst                 |   1 +
 docs/source/pylibraft_api/matrix.rst          |  11 +
 docs/source/raft_ann_benchmarks.md            |  36 +--
 7 files changed, 204 insertions(+), 262 deletions(-)
 create mode 100644 docs/source/cpp_api/cluster_kmeans_balanced.rst
 create mode 100644 docs/source/pylibraft_api/matrix.rst

diff --git a/README.md b/README.md
index 56d422b489..5b1297b63c 100755
--- a/README.md
+++ b/README.md
@@ -255,106 +255,54 @@ pairwise_distance(in1, in2, out=output, metric="euclidean")
 
 ## Installing
 
-RAFT itself can be installed through conda, [CMake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake), pip, or by building the repository from source. Please refer to the [build instructions](docs/source/build.md) for more a comprehensive guide on installing and building RAFT and using it in downstream projects.
+RAFT's C++ and Python libraries can both be installed through Conda. The Python libraries can also be installed through Pip.
 
-### Conda
+
+### Installing C++ and Python through Conda
 
 The easiest way to install RAFT is through conda and several packages are provided.
-- `libraft-headers` RAFT headers
-- `libraft` (optional) shared library of pre-compiled template instantiations and runtime APIs.
-- `pylibraft` (optional) Python wrappers around RAFT algorithms and primitives.
-- `raft-dask` (optional) enables deployment of multi-node multi-GPU algorithms that use RAFT `raft::comms` in Dask clusters.
+- `libraft-headers` C++ headers
+- `libraft` (optional) C++ shared library containing pre-compiled template instantiations and runtime API.
+- `pylibraft` (optional) Python library
+- `raft-dask` (optional) Python library for deployment of multi-node multi-GPU algorithms that use the RAFT `raft::comms` abstraction layer in Dask clusters.
+- `raft-ann-bench` (optional) Benchmarking tool for easily producing benchmarks that compare RAFT's vector search algorithms against other state-of-the-art implementations.
+- `raft-ann-bench-cpu` (optional) Reproducible benchmarking tool similar to above, but doesn't require CUDA to be installed on the machine. Can be used to test in environments with competitive CPUs.
+
+Use the following command, depending on your CUDA version, to install all of the RAFT packages with conda (replace `rapidsai` with `rapidsai-nightly` to install more up-to-date but less stable nightly packages). `mamba` is preferred over the `conda` command.
+```bash
+# for CUDA 11.8
+mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft cuda-version=11.8
+```
 
-Use the following command to install all of the RAFT packages with conda (replace `rapidsai` with `rapidsai-nightly` to install more up-to-date but less stable nightly packages). `mamba` is preferred over the `conda` command.
 ```bash
-mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft
+# for CUDA 12.0
+mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft cuda-version=12.0
 ```
 
-You can also install the conda packages individually using the `mamba` command above.
+Note that the above commands will also install `libraft-headers` and `libraft`.
+
+You can also install the conda packages individually using the `mamba` command above. For example, if you'd like to install RAFT's headers and pre-compiled shared library to use in your project:
+```bash
+# for CUDA 12.0
+mamba install -c rapidsai -c conda-forge -c nvidia libraft libraft-headers cuda-version=12.0
+```
 
-After installing RAFT, `find_package(raft COMPONENTS compiled distributed)` can be used in your CUDA/C++ cmake build to compile and/or link against needed dependencies in your raft target. `COMPONENTS` are optional and will depend on the packages installed.
+If installing the C++ APIs, please see [using libraft](https://docs.rapids.ai/api/raft/nightly/using_libraft/) for more information on using the pre-compiled shared library. You can also refer to the [example C++ template project](https://github.com/rapidsai/raft/tree/branch-23.12/cpp/template) for a ready-to-go CMake configuration that you can drop into your project and build against the RAFT development artifacts installed above.
 
-### Pip
+### Installing Python through Pip
 
-pylibraft and raft-dask both have experimental packages that can be [installed through pip](https://rapids.ai/pip.html#install):
+`pylibraft` and `raft-dask` both have experimental packages that can be [installed through pip](https://rapids.ai/pip.html#install):
 ```bash
 pip install pylibraft-cu11 --extra-index-url=https://pypi.nvidia.com
 pip install raft-dask-cu11 --extra-index-url=https://pypi.nvidia.com
 ```
 
-### CMake & CPM
-
-RAFT uses the [RAPIDS-CMake](https://github.com/rapidsai/rapids-cmake) library, which makes it easy to include in downstream cmake projects. RAPIDS-CMake provides a convenience layer around CPM. Please refer to [these instructions](https://github.com/rapidsai/rapids-cmake#installation) to install and use rapids-cmake in your project.
-
-#### Example Template Project
+These packages statically build RAFT's pre-compiled instantiations and so the C++ headers and pre-compiled shared library won't be readily available to use in your code.
 
-You can find an [example RAFT](cpp/template/README.md) project template in the `cpp/template` directory, which demonstrates how to build a new application with RAFT or incorporate RAFT into an existing cmake project.
+The [build instructions](https://docs.rapids.ai/api/raft/nightly/build/) contain more details on building RAFT from source and including it in downstream projects. You can also find a more comprehensive version of the above CPM code snippet in the [Building RAFT C++ and Python from source](https://docs.rapids.ai/api/raft/nightly/build/#building-c-and-python-from-source) section of the build instructions.
 
-#### CMake Targets
-
-Additional CMake targets can be made available by adding components in the table below to the `RAFT_COMPONENTS` list above, separated by spaces. The `raft::raft` target will always be available. RAFT headers require, at a minimum, the CUDA toolkit libraries and RMM dependencies.
-
-| Component   | Target              | Description                                              | Base Dependencies                      |
-|-------------|---------------------|----------------------------------------------------------|----------------------------------------|
-| n/a         | `raft::raft`        | Full RAFT header library                                 | CUDA toolkit, RMM, NVTX, CCCL, CUTLASS |
-| compiled    | `raft::compiled`    | Pre-compiled template instantiations and runtime library | raft::raft                             |
-| distributed | `raft::distributed` | Dependencies for `raft::comms` APIs                      | raft::raft, UCX, NCCL                  |
-
-### Source
-
-The easiest way to build RAFT from source is to use the `build.sh` script at the root of the repository:
-1. Create an environment with the needed dependencies:
-```
-mamba env create --name raft_dev_env -f conda/environments/all_cuda-118_arch-x86_64.yaml
-mamba activate raft_dev_env
-```
-```
-./build.sh raft-dask pylibraft libraft tests bench --compile-lib
-```
+You can find an example [RAFT project template](cpp/template/README.md) in the `cpp/template` directory, which demonstrates how to build a new application with RAFT or incorporate RAFT into an existing CMake project.
 
-The [build](docs/source/build.md) instructions contain more details on building RAFT from source and including it in downstream projects. You can also find a more comprehensive version of the above CPM code snippet the [Building RAFT C++ from source](docs/source/build.md#building-raft-c-from-source-in-cmake) section of the build instructions.
-
-## Folder Structure and Contents
-
-The folder structure mirrors other RAPIDS repos, with the following folders:
-
-- `bench/ann`: Python scripts for running ANN benchmarks
-- `ci`: Scripts for running CI in PRs
-- `conda`: Conda recipes and development conda environments
-- `cpp`: Source code for C++ libraries.
-  - `bench`: Benchmarks source code
-  - `cmake`: CMake modules and templates
-  - `doxygen`: Doxygen configuration
-  - `include`: The C++ API headers are fully-contained here (deprecated directories are excluded from the listing below)
-    - `cluster`: Basic clustering primitives and algorithms.
-    - `comms`: A multi-node multi-GPU communications abstraction layer for NCCL+UCX and MPI+NCCL, which can be deployed in Dask clusters using the `raft-dask` Python package.
-    - `core`: Core API headers which require minimal dependencies aside from RMM and Cudatoolkit. These are safe to expose on public APIs and do not require `nvcc` to build. This is the same for any headers in RAFT which have the suffix `*_types.hpp`.
-    - `distance`: Distance primitives
-    - `linalg`: Dense linear algebra
-    - `matrix`: Dense matrix operations
-    - `neighbors`: Nearest neighbors and knn graph construction
-    - `random`: Random number generation, sampling, and data generation primitives
-    - `solver`: Iterative and combinatorial solvers for optimization and approximation
-    - `sparse`: Sparse matrix operations
-      - `convert`: Sparse conversion functions
-      - `distance`: Sparse distance computations
-      - `linalg`: Sparse linear algebra
-      - `neighbors`: Sparse nearest neighbors and knn graph construction
-      - `op`: Various sparse operations such as slicing and filtering (Note: this will soon be renamed to `sparse/matrix`)
-      - `solver`: Sparse solvers for optimization and approximation
-    - `stats`: Moments, summary statistics, model performance measures
-    - `util`: Various reusable tools and utilities for accelerated algorithm development
-  - `internal`: A private header-only component that hosts the code shared between benchmarks and tests.
-  - `scripts`: Helpful scripts for development
-  - `src`: Compiled APIs and template instantiations for the shared libraries
-  - `template`: A skeleton template containing the bare-bones file structure and cmake configuration for writing applications with RAFT.
-  - `test`: Googletests source code
-- `docs`: Source code and scripts for building library documentation (Uses breath, doxygen, & pydocs)
-- `notebooks`: IPython notebooks with usage examples and tutorials
-- `python`: Source code for Python libraries.
-  - `pylibraft`: Python build and source code for pylibraft library
-  - `raft-dask`: Python build and source code for raft-dask library
-- `thirdparty`: Third-party licenses
 
 ## Contributing
 
diff --git a/docs/source/build.md b/docs/source/build.md
index 4a8748deb6..4be0a84090 100644
--- a/docs/source/build.md
+++ b/docs/source/build.md
@@ -1,12 +1,41 @@
 # Installation
 
-### Conda
+RAFT currently provides libraries for C++ and Python. The C++ libraries, including the header-only and optional shared library, can be installed with Conda. 
+
+Both the C++ and Python APIs require CMake to build from source.
+
+## Table of Contents
+
+- [Installing C++ and Python through Conda](#installing-c-and-python-through-conda)
+- [Installing Python through Pip](#installing-python-through-pip)
+- [Building C++ and Python from source](#building-c-and-python-from-source)
+  - [CUDA/GPU requirements](#cudagpu-requirements)
+  - [Build dependencies](#build-dependencies)
+    - [Required](#required)
+    - [Optional](#optional)
+    - [Conda environment scripts](#conda-environment-scripts)
+  - [Header-only C++](#header-only-c)
+  - [C++ shared library](#c-shared-library-optional)
+  - [ccache and sccache](#ccache-and-sccache)
+  - [C++ tests](#c-tests)
+  - [C++ primitives microbenchmarks](#c-primitives-microbenchmarks)
+  - [Python libraries](#python-libraries)
+  - [Using CMake directly](#using-cmake-directly)
+  - [Build documentation](#build-documentation)
+- [Using RAFT C++ in downstream projects](#using-raft-c-in-downstream-projects)
+  - [CMake targets](#cmake-targets)
+
+------
+
+## Installing C++ and Python through Conda
 
 The easiest way to install RAFT is through conda and several packages are provided.
-- `libraft-headers` RAFT headers
-- `libraft` (optional) shared library containing pre-compiled template instantiations and runtime API.
-- `pylibraft` (optional) Python wrappers around RAFT algorithms and primitives.
-- `raft-dask` (optional) enables deployment of multi-node multi-GPU algorithms that use RAFT `raft::comms` in Dask clusters.
+- `libraft-headers` C++ headers
+- `libraft` (optional) C++ shared library containing pre-compiled template instantiations and runtime API.
+- `pylibraft` (optional) Python library
+- `raft-dask` (optional) Python library for deployment of multi-node multi-GPU algorithms that use the RAFT `raft::comms` abstraction layer in Dask clusters.
+- `raft-ann-bench` (optional) Benchmarking tool for easily producing benchmarks that compare RAFT's vector search algorithms against other state-of-the-art implementations.
+- `raft-ann-bench-cpu` (optional) Reproducible benchmarking tool similar to above, but doesn't require CUDA to be installed on the machine. Can be used to test in environments with competitive CPUs.
 
 Use the following command, depending on your CUDA version, to install all of the RAFT packages with conda (replace `rapidsai` with `rapidsai-nightly` to install more up-to-date but less stable nightly packages). `mamba` is preferred over the `conda` command.
 ```bash
@@ -19,19 +48,35 @@ mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft cuda-vers
 mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft cuda-version=12.0
 ```
 
-You can also install the conda packages individually using the `mamba` command above.
+Note that the above commands will also install `libraft-headers` and `libraft`.
 
-After installing RAFT, `find_package(raft COMPONENTS nn distance)` can be used in your CUDA/C++ cmake build to compile and/or link against needed dependencies in your raft target. `COMPONENTS` are optional and will depend on the packages installed.
+You can also install the conda packages individually using the `mamba` command above. For example, if you'd like to install RAFT's headers and pre-compiled shared library to use in your project:
+```bash
+# for CUDA 12.0
+mamba install -c rapidsai -c conda-forge -c nvidia libraft libraft-headers cuda-version=12.0
+```
 
-### Pip
+If installing the C++ APIs, please see [using libraft](https://docs.rapids.ai/api/raft/nightly/using_libraft/) for more information on using the pre-compiled shared library. You can also refer to the [example C++ template project](https://github.com/rapidsai/raft/tree/branch-23.12/cpp/template) for a ready-to-go CMake configuration that you can drop into your project and build against the RAFT development artifacts installed above.
 
-pylibraft and raft-dask both have experimental packages that can be [installed through pip](https://rapids.ai/pip.html#install):
+## Installing Python through Pip
+
+`pylibraft` and `raft-dask` both have packages that can be [installed through pip](https://rapids.ai/pip.html#install). 
+
+For CUDA 11 packages:
 ```bash
 pip install pylibraft-cu11 --extra-index-url=https://pypi.nvidia.com
 pip install raft-dask-cu11 --extra-index-url=https://pypi.nvidia.com
 ```
 
-## Building and installing RAFT
+For CUDA 12 packages:
+```bash
+pip install pylibraft-cu12 --extra-index-url=https://pypi.nvidia.com
+pip install raft-dask-cu12 --extra-index-url=https://pypi.nvidia.com
+```
+
+These packages statically build RAFT's pre-compiled instantiations, so the C++ headers and pre-compiled shared library won't be readily available to use in your code. 
+
+## Building C++ and Python from source
 
 ### CUDA/GPU Requirements
 - cmake 3.26.4+
@@ -57,9 +102,23 @@ In addition to the libraries included with cudatoolkit 11.0+, there are some oth
 - [Googlebench](https://github.com/google/benchmark) - Needed to build benchmarks
 - [Doxygen](https://github.com/doxygen/doxygen) - Needed to build docs
 
-All of RAFT's C++ APIs can be used header-only but pre-compiled shared libraries also contain some host-accessible APIs and template instantiations to accelerate compile times.
+#### Conda environment scripts
+
+Conda environment scripts are provided for installing the necessary dependencies to build both the C++ and Python libraries from source. It is preferred to use `mamba`, as it provides significant speedup over `conda`:
+```bash
+mamba env create --name rapids_raft -f conda/environments/all_cuda-120_arch-x86_64.yaml
+mamba activate rapids_raft
+```
+
+All of RAFT's C++ APIs can be used header-only, and optional pre-compiled shared libraries provide some host-accessible runtime APIs and template instantiations to accelerate compile times.
+
+The process for building from source with CUDA 11 differs slightly: your host system also needs a CUDA toolkit installed whose version is greater than or equal to the version you install into your conda environment. Installing the CUDA toolkit on your host system is necessary because `nvcc` is not provided with Conda's cudatoolkit dependencies for CUDA 11. The following example creates a CUDA 11.8 conda environment and installs its dependencies:
+```bash
+mamba env create --name rapids_raft -f conda/environments/all_cuda-118_arch-x86_64.yaml
+mamba activate rapids_raft
+```
 
-The recommended way to build and install RAFT is to use the `build.sh` script in the root of the repository. This script can build both the C++ and Python artifacts and provides options for building and installing the headers, tests, benchmarks, and individual shared libraries.
+The recommended way to build and install RAFT from source is to use the `build.sh` script in the root of the repository. This script can build both the C++ and Python artifacts and provides CMake options for building and installing the headers, tests, benchmarks, and the pre-compiled shared library.
 
 ### Header-only C++
 
@@ -68,9 +127,8 @@ The recommended way to build and install RAFT is to use the `build.sh` script in
 The following example will download the needed dependencies and install the RAFT headers into `$INSTALL_PREFIX/include/raft`. 
 ```bash
 ./build.sh libraft
-
 ```
-The `-n` flag can be passed to just have the build download the needed dependencies. Since RAFT is primarily used at build-time, the dependencies will never be installed by the RAFT build.
+The `-n` flag can be passed to just have the build download the needed dependencies. Since RAFT's C++ headers are primarily used during build-time in downstream projects, the dependencies will never be installed by the RAFT build.
 ```bash
 ./build.sh libraft -n
 ```
@@ -80,7 +138,6 @@ Once installed, `libraft` headers (and dependencies which were downloaded and in
 ./build.sh libraft --uninstall
 ```
 
-
 ### C++ Shared Library (optional)
 
 A shared library can be built for speeding up compile times. The shared library also contains a runtime API that allows you to invoke RAFT APIs directly from C++ source files (without `nvcc`). The shared library can also significantly improve re-compile times both while developing RAFT and using its APIs to develop applications. Pass the `--compile-lib` flag to `build.sh` to build the library:
@@ -104,7 +161,7 @@ Once installed, the shared library, headers (and any dependencies downloaded and
 ./build.sh libraft --cache-tool=ccache
 ```
 
-### Tests
+### C++ Tests
 
 Compile the tests using the `tests` target in `build.sh`.
 
@@ -131,72 +188,35 @@ It can take sometime to compile all of the tests. You can build individual tests
 ./build.sh libraft tests -n --limit-tests=NEIGHBORS_TEST;DISTANCE_TEST;MATRIX_TEST
 ```
 
-### Benchmarks
+### C++ Primitives Microbenchmarks
 
-The benchmarks are broken apart by algorithm category, so you will find several binaries in `cpp/build/` named `*_BENCH`.
+The benchmarks are broken apart by algorithm category, so you will find several binaries in `cpp/build/` named `*_PRIMS_BENCH`.
 ```bash
-./build.sh libraft bench
+./build.sh libraft bench-prims
 ```
 
-It can take sometime to compile all of the benchmarks. You can build individual benchmarks by providing a semicolon-separated list to the `--limit-bench` option in `build.sh`:
+It can take some time to compile all of the benchmarks. You can build individual benchmarks by providing a semicolon-separated list to the `--limit-bench-prims` option in `build.sh`:
 
 ```bash
-./build.sh libraft bench -n --limit-bench=NEIGHBORS_BENCH;DISTANCE_BENCH;LINALG_BENCH
-```
-
-### C++ Using Cmake Directly
-
-Use `CMAKE_INSTALL_PREFIX` to install RAFT into a specific location. The snippet below will install it into the current conda environment:
-```bash
-cd cpp
-mkdir build
-cd build
-cmake -D BUILD_TESTS=ON -DRAFT_COMPILE_LIBRARY=ON -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ../
-make -j<parallel_level> install
+./build.sh libraft bench-prims -n --limit-bench-prims="NEIGHBORS_PRIMS_BENCH;DISTANCE_PRIMS_BENCH;LINALG_PRIMS_BENCH"
 ```
 
-RAFT's cmake has the following configurable flags available:.
-
-| Flag                            | Possible Values      | Default Value | Behavior                                                                     |
-|---------------------------------|----------------------| --- |------------------------------------------------------------------------------|
-| BUILD_TESTS                     | ON, OFF              | ON | Compile Googletests                                                          |
-| BUILD_PRIMS_BENCH                     | ON, OFF              | OFF | Compile benchmarks                                                           |
-| BUILD_ANN_BENCH               | ON, OFF              | OFF | Compile end-to-end ANN benchmarks |
-| RAFT_COMPILE_LIBRARY      | ON, OFF              | ON if either BUILD_TESTS or BUILD_PRIMS_BENCH is ON; otherwise OFF | Compiles all `libraft` shared libraries (these are required for Googletests) |
-| raft_FIND_COMPONENTS            | compiled distributed | | Configures the optional components as a space-separated list                 |
-| RAFT_ENABLE_CUBLAS_DEPENDENCY   | ON, OFF | ON | Link against cublas library in `raft::raft`                                  | 
-| RAFT_ENABLE_CUSOLVER_DEPENDENCY | ON, OFF | ON | Link against cusolver library in `raft::raft`                                | 
-| RAFT_ENABLE_CUSPARSE_DEPENDENCY | ON, OFF | ON | Link against cusparse library in `raft::raft`                                | 
-| RAFT_ENABLE_CUSOLVER_DEPENDENCY | ON, OFF | ON | Link against curand library in `raft::raft`                                  | 
-| DETECT_CONDA_ENV                | ON, OFF              | ON | Enable detection of conda environment for dependencies                       |
-| RAFT_NVTX                       | ON, OFF              | OFF | Enable NVTX Markers                                                          |
-| CUDA_ENABLE_KERNELINFO          | ON, OFF              | OFF | Enables `kernelinfo` in nvcc. This is useful for `compute-sanitizer`         |
-| CUDA_ENABLE_LINEINFO            | ON, OFF              | OFF | Enable the -lineinfo option for nvcc                                         |
-| CUDA_STATIC_RUNTIME             | ON, OFF              | OFF | Statically link the CUDA runtime                                             |
-
-Currently, shared libraries are provided for the `libraft-nn` and `libraft-distance` components.
+In addition to microbenchmarks for individual primitives, RAFT contains a reproducible benchmarking tool for evaluating the performance of RAFT's vector search algorithms against the existing state-of-the-art. Please refer to the [RAFT ANN Benchmarks](https://docs.rapids.ai/api/raft/nightly/raft_ann_benchmarks/) guide for more information on this tool.
 
-### Python
+### Python libraries
 
-Conda environment scripts are provided for installing the necessary dependencies for building and using the Python APIs. It is preferred to use `mamba`, as it provides significant speedup over `conda`. In addition you will have to manually install `nvcc` as it will not be installed as part of the conda environment. The following example will install create and install dependencies for a CUDA 11.8 conda environment:
-
-```bash
-mamba env create --name raft_env_name -f conda/environments/all_cuda-118_arch-x86_64.yaml
-mamba activate raft_env_name
-```
-
-The Python APIs can be built and installed using the `build.sh` script:
+The Python libraries can be built and installed using the `build.sh` script:
 
 ```bash
 # to build pylibraft
 ./build.sh libraft pylibraft --compile-lib
-# to build raft-dask
+# to build raft-dask (depends on pylibraft)
 ./build.sh libraft pylibraft raft-dask --compile-lib
 ```
 
-`setup.py` can also be used to build the Python APIs manually:
+`setup.py` can also be used to build the Python libraries manually:
 
-```
+```bash
 cd python/raft-dask
 python setup.py build_ext --inplace
 python setup.py install
@@ -206,7 +226,7 @@ python setup.py build_ext --inplace
 python setup.py install
 ```
 
-To run the Python tests:
+Python tests are automatically installed with the corresponding libraries. To run Python tests:
 ```bash
 cd python/raft-dask
 py.test -s -v
@@ -220,27 +240,56 @@ The Python packages can also be uninstalled using the `build.sh` script:
 ./build.sh pylibraft raft-dask --uninstall
 ```
 
-### Documentation
+### Using CMake directly
 
-The documentation requires that the C++ headers and python packages have been built and installed.
+When building RAFT from source, the `build.sh` script wraps the `cmake` commands so that the various available CMake options do not have to be configured manually. When more fine-grained control over the CMake configuration is needed, `cmake` can be invoked directly, as the example below demonstrates.
 
-The following will build the docs along with the C++ and Python packages:
+The `CMAKE_INSTALL_PREFIX` option sets the location that RAFT is installed into. The example below installs RAFT into the current Conda environment:
+```bash
+cd cpp
+mkdir build
+cd build
+cmake -D BUILD_TESTS=ON -DRAFT_COMPILE_LIBRARY=ON -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ../
+make -j<parallel_level> install
+```
+
+RAFT's CMake has the following configurable flags available:
+
+| Flag                            | Possible Values      | Default Value | Behavior                                                                     |
+|---------------------------------|----------------------| --- |------------------------------------------------------------------------------|
+| BUILD_TESTS                     | ON, OFF              | ON | Compile Googletests                                                          |
+| BUILD_PRIMS_BENCH               | ON, OFF              | OFF | Compile benchmarks                                                           |
+| BUILD_ANN_BENCH                 | ON, OFF              | OFF | Compile end-to-end ANN benchmarks |
+| CUDA_ENABLE_KERNELINFO          | ON, OFF              | OFF | Enables `kernelinfo` in nvcc. This is useful for `compute-sanitizer`         |
+| CUDA_ENABLE_LINEINFO            | ON, OFF              | OFF | Enable the -lineinfo option for nvcc                                         |
+| CUDA_STATIC_RUNTIME             | ON, OFF              | OFF | Statically link the CUDA runtime                                             |
+| DETECT_CONDA_ENV                | ON, OFF              | ON | Enable detection of conda environment for dependencies                       |
+| raft_FIND_COMPONENTS            | compiled distributed | | Configures the optional components as a space-separated list                 |
+| RAFT_COMPILE_LIBRARY            | ON, OFF              | ON if either BUILD_TESTS or BUILD_PRIMS_BENCH is ON; otherwise OFF | Compiles all `libraft` shared libraries (these are required for Googletests) |
+| RAFT_ENABLE_CUBLAS_DEPENDENCY   | ON, OFF | ON | Link against cublas library in `raft::raft`                                  | 
+| RAFT_ENABLE_CUSOLVER_DEPENDENCY | ON, OFF | ON | Link against cusolver library in `raft::raft`                                | 
+| RAFT_ENABLE_CUSPARSE_DEPENDENCY | ON, OFF | ON | Link against cusparse library in `raft::raft`                                | 
+| RAFT_ENABLE_CURAND_DEPENDENCY   | ON, OFF | ON | Link against curand library in `raft::raft`                                  | 
+| RAFT_NVTX                       | ON, OFF              | OFF | Enable NVTX Markers                                                          |
+
+### Build documentation
+
+The documentation requires that the C++ and Python libraries have been built and installed. The following will build the docs along with the C++ and Python packages:
 
 ```
 ./build.sh libraft pylibraft raft-dask docs --compile-lib
 ```
 
+## Using RAFT C++ in downstream projects
 
-## Using RAFT in downstream projects
+There are a few different strategies for including RAFT in downstream projects, depending on whether the [required build dependencies](#build-dependencies) have already been installed and are available on the `lib` and `include` search paths.
 
-There are a few different strategies for including RAFT in downstream projects, depending on whether the [required build dependencies](#build-dependencies) have already been installed and are available on the `lib` and `include` paths.
-
-Using cmake, you can enable CUDA support right in your project's declaration:
+When using the GPU parts of RAFT, you will need to enable CUDA support in your CMake project declaration:
 ```cmake
 project(YOUR_PROJECT VERSION 0.1 LANGUAGES CXX CUDA)
 ```
 
-Please note that some additional compiler flags might need to be added when building against RAFT. For example, if you see an error like this `The experimental flag '--expt-relaxed-constexpr' can be used to allow this.`. The necessary flags can be set with cmake:
+Note that some additional compiler flags might need to be added when building against RAFT. For example, you might see an error like `The experimental flag '--expt-relaxed-constexpr' can be used to allow this.` The necessary flags can be set with CMake:
 ```cmake
 target_compile_options(your_target_name PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:--expt-extended-lambda --expt-relaxed-constexpr>)
 ```
@@ -256,95 +305,14 @@ PROPERTIES CXX_STANDARD                        17
            INTERFACE_POSITION_INDEPENDENT_CODE ON)
 ```
 
+The [C++ example template project](https://github.com/rapidsai/raft/tree/HEAD/cpp/template) provides an end-to-end buildable example of what a `CMakeLists.txt` that uses RAFT should look like. The items below point out some of the needed details.
 
-### C++ header-only integration (without cmake)
-
-While not a highly suggested method for building against RAFT, when all of the needed [build dependencies](#build-dependencies) are already satisfied, RAFT can be integrated into downstream projects by cloning the repository and adding `cpp/include` from RAFT to the include path:
-```cmake
-set(RAFT_GIT_DIR ${CMAKE_CURRENT_BINARY_DIR}/raft CACHE STRING "Path to RAFT repo")
-ExternalProject_Add(raft
-  GIT_REPOSITORY    git@github.com:rapidsai/raft.git
-  GIT_TAG           branch-23.12
-  PREFIX            ${RAFT_GIT_DIR}
-  CONFIGURE_COMMAND ""
-  BUILD_COMMAND     ""
-  INSTALL_COMMAND   "")
-set(RAFT_INCLUDE_DIR ${RAFT_GIT_DIR}/raft/cpp/include CACHE STRING "RAFT include variable")
-```
-### C++ header-only integration (with cmake)
-
-
-When using cmake, you can install RAFT headers into your environment with `./build.sh libraft`. 
-
-If the RAFT headers have already been installed into your environment with cmake or through conda, such as by using the `build.sh` script, use `find_package(raft)` and the `raft::raft` target.
-
-### Using C++ pre-compiled shared libraries
-
-Use `find_package(raft COMPONENTS compiled distributed)` to enable the shared library and transitively pass dependencies through separate targets for each component. In this example, the `raft::compiled` and `raft::distributed` targets will be available for configuring linking paths in addition to `raft::raft`. These targets will also pass through any transitive dependencies (such as NCCL for the `distributed` component).
-
-The pre-compiled libraries contain template instantiations for commonly used types, such as single- and double-precision floating-point. By default, these are used automatically when the `RAFT_COMPILED` macro is defined during compilation. This definition is automatically added by CMake.
-
-### Building RAFT C++ from source in cmake
-
-RAFT uses the [RAPIDS-CMake](https://github.com/rapidsai/rapids-cmake) library so it can be more easily included into downstream projects. RAPIDS cmake provides a convenience layer around the [CMake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake).
-
-The following example is similar to invoking `find_package(raft)` but uses `rapids_cpm_find`, which provides a richer and more flexible configuration landscape by using CPM to fetch any dependencies not already available to the build. The `raft::raft` link target will be made available and it's recommended that it be used as a `PRIVATE` link dependency in downstream projects. The `COMPILE_LIBRARY` option enables the building the shared libraries.
-
-The following `cmake` snippet enables a flexible configuration of RAFT:
-
-```cmake
-
-set(RAFT_VERSION "23.12")
-set(RAFT_FORK "rapidsai")
-set(RAFT_PINNED_TAG "branch-${RAFT_VERSION}")
-
-function(find_and_configure_raft)
-  set(oneValueArgs VERSION FORK PINNED_TAG COMPILE_LIBRARY)
-  cmake_parse_arguments(PKG "${options}" "${oneValueArgs}"
-                            "${multiValueArgs}" ${ARGN} )
-  
-  #-----------------------------------------------------
-  # Invoke CPM find_package()
-  #-----------------------------------------------------
-
-  rapids_cpm_find(raft ${PKG_VERSION}
-          GLOBAL_TARGETS      raft::raft
-          BUILD_EXPORT_SET    projname-exports
-          INSTALL_EXPORT_SET  projname-exports
-          CPM_ARGS
-          GIT_REPOSITORY https://github.com/${PKG_FORK}/raft.git
-          GIT_TAG        ${PKG_PINNED_TAG}
-          SOURCE_SUBDIR  cpp
-          FIND_PACKAGE_ARGUMENTS "COMPONENTS compiled distributed"
-          OPTIONS
-          "BUILD_TESTS OFF"
-          "BUILD_PRIMS_BENCH OFF"
-          "BUILD_ANN_BENCH OFF"
-          "RAFT_COMPILE_LIBRARY ${PKG_COMPILE_LIBRARY}"
-  )
-
-endfunction()
-
-# Change pinned tag here to test a commit in CI
-# To use a different RAFT locally, set the CMake variable
-# CPM_raft_SOURCE=/path/to/local/raft
-find_and_configure_raft(VERSION    ${RAFT_VERSION}.00
-        FORK             ${RAFT_FORK}
-        PINNED_TAG       ${RAFT_PINNED_TAG}
-        COMPILE_LIBRARY          NO
-)
-```
-
-You can find a fully-functioning [example template project](../../cpp/template/README.md) in the `cpp/template` directory, which provides everything you need to build a new application with RAFT or incorporate RAFT Into your existing libraries.
-
-## Uninstall
+#### CMake Targets
 
-Once built and installed, RAFT can be safely uninstalled using `build.sh` by specifying any or all of the installed components. Please note that since `pylibraft` depends on `libraft`, uninstalling `pylibraft` will also uninstall `libraft`:
-```bash
-./build.sh libraft pylibraft raft-dask --uninstall
-```
+The `raft::raft` CMake target is always available when RAFT is included in your CMake project. Additional CMake targets can be enabled by listing them, separated by spaces, in the `COMPONENTS` option of CMake's `find_package(raft)` (refer to the [CMake docs](https://cmake.org/cmake/help/latest/command/find_package.html#basic-signature) to learn more). Note that the `distributed` component also exports additional dependencies.
 
-Leaving off the installed components will uninstall everything that's been installed:
-```bash
-./build.sh --uninstall
-```
+| Component   | Target              | Description                                              | Base Dependencies                      |
+|-------------|---------------------|----------------------------------------------------------|----------------------------------------|
+| n/a         | `raft::raft`        | Full RAFT header library                                 | CUDA toolkit, RMM, NVTX, CCCL, CUTLASS |
+| compiled    | `raft::compiled`    | Pre-compiled template instantiations and runtime library | raft::raft                             |
+| distributed | `raft::distributed` | Dependencies for `raft::comms` APIs                      | raft::raft, UCX, NCCL                  |
\ No newline at end of file
diff --git a/docs/source/cpp_api/cluster.rst b/docs/source/cpp_api/cluster.rst
index 77c8332bbd..b0485992b3 100644
--- a/docs/source/cpp_api/cluster.rst
+++ b/docs/source/cpp_api/cluster.rst
@@ -13,5 +13,6 @@ fundamental clustering algorithms which are, themselves, considered reusable bui
    :caption: Contents:
 
    cluster_kmeans.rst
+   cluster_kmeans_balanced.rst
    cluster_slhc.rst
    cluster_spectral.rst
\ No newline at end of file
diff --git a/docs/source/cpp_api/cluster_kmeans_balanced.rst b/docs/source/cpp_api/cluster_kmeans_balanced.rst
new file mode 100644
index 0000000000..5d07fcc1e3
--- /dev/null
+++ b/docs/source/cpp_api/cluster_kmeans_balanced.rst
@@ -0,0 +1,13 @@
+Balanced K-Means
+================
+
+.. role:: py(code)
+   :language: c++
+   :class: highlight
+
+``#include <raft/cluster/kmeans_balanced.cuh>``
+
+.. doxygennamespace:: raft::cluster::kmeans_balanced
+    :project: RAFT
+    :members:
+    :content-only:
diff --git a/docs/source/pylibraft_api.rst b/docs/source/pylibraft_api.rst
index 84955283cb..df25b76985 100644
--- a/docs/source/pylibraft_api.rst
+++ b/docs/source/pylibraft_api.rst
@@ -10,5 +10,6 @@ Python API
    pylibraft_api/cluster.rst
    pylibraft_api/common.rst
    pylibraft_api/distance.rst
+   pylibraft_api/matrix.rst
    pylibraft_api/neighbors.rst
    pylibraft_api/random.rst
diff --git a/docs/source/pylibraft_api/matrix.rst b/docs/source/pylibraft_api/matrix.rst
new file mode 100644
index 0000000000..884a466ec1
--- /dev/null
+++ b/docs/source/pylibraft_api/matrix.rst
@@ -0,0 +1,11 @@
+Matrix
+======
+
+This page provides `pylibraft` class references for the publicly-exposed elements of the `pylibraft.matrix` package.
+
+
+.. role:: py(code)
+   :language: python
+   :class: highlight
+
+.. autofunction:: pylibraft.matrix.select_k
diff --git a/docs/source/raft_ann_benchmarks.md b/docs/source/raft_ann_benchmarks.md
index 2e8572c299..64a51550c4 100644
--- a/docs/source/raft_ann_benchmarks.md
+++ b/docs/source/raft_ann_benchmarks.md
@@ -89,12 +89,12 @@ We provide images for GPU enabled systems, as well as systems without a GPU. The
 - `raft-ann-bench-datasets`: Contains the GPU and CPU benchmarks with million-scale datasets already included in the container. Best suited for users that want to run multiple million scale datasets already included in the image.
 - `raft-ann-bench-cpu`: Contains only CPU benchmarks with minimal size. Best suited for users that want the smallest containers to reproduce benchmarks on systems without a GPU.
 
-Nightly images are located in [dockerhub](https://hub.docker.com/r/rapidsai/raft-ann-bench), meanwhile release (stable) versions are located in [NGC](https://hub.docker.com/r/rapidsai/raft-ann-bench), starting with release 23.10.
+Nightly images are located on [Docker Hub](https://hub.docker.com/r/rapidsai/raft-ann-bench/tags), while release (stable) versions are located in [NGC](https://hub.docker.com/r/rapidsai/raft-ann-bench), starting with release 23.12.
 
 - The following command pulls the nightly container for python version 10, cuda version 12, and RAFT version 23.10:
 
 ```bash
-docker pull rapidsai/raft-ann-bench:23.10a-cuda12.0-py3.10 #substitute raft-ann-bench for the exact desired container.
+docker pull rapidsai/raft-ann-bench:23.12a-cuda12.0-py3.10 #substitute raft-ann-bench for the exact desired container.
 ```
 
 The CUDA and python versions can be changed for the supported values:
@@ -113,7 +113,7 @@ You can see the exact versions as well in the dockerhub site:
 -  The following command (only available after RAPIDS 23.10 release) pulls the container:
 
 ```bash
-docker pull nvcr.io/nvidia/rapidsai/raft-ann-bench:23.08-cuda11.8-py3.10 #substitute raft-ann-bench for the exact desired container.
+docker pull nvcr.io/nvidia/rapidsai/raft-ann-bench:23.12-cuda11.8-py3.10 #substitute raft-ann-bench for the exact desired container.
 ```
 
 ### Container Usage
@@ -127,8 +127,8 @@ For GPU systems, where `$DATA_FOLDER` is a local folder where you want datasets
 ```bash
 export DATA_FOLDER=path/to/store/datasets/and/results
 docker run --gpus all --rm -it -u $(id -u) \
-    -v $DATA_FOLDER:/home/rapids/benchmarks  \
-    rapidsai/raft-ann-bench:23.10a-cuda11.8-py3.10 \
+    -v $DATA_FOLDER:/data/benchmarks \
+    rapidsai/raft-ann-bench:23.12a-cuda11.8-py3.10 \
     "--dataset deep-image-96-angular" \
     "--normalize" \
     "--algorithms raft_cagra,raft_ivf_pq" \
@@ -140,26 +140,25 @@ Where:
 ```bash
 export DATA_FOLDER=path/to/store/datasets/and/results # <- local folder to store datasets and results
 docker run --gpus all --rm -it -u $(id -u) \
-    -v $DATA_FOLDER:/home/rapids/benchmarks  \
-    rapidsai/raft-ann-bench:23.10a-cuda11.8-py3.10 \ # <- image to use, either `raft-ann-bench` or `raft-ann-bench-datasets`, can choose RAPIDS, cuda and python versions.
+    -v $DATA_FOLDER:/data/benchmarks  \
+    rapidsai/raft-ann-bench:23.12a-cuda11.8-py3.10 \ # <- image to use, either `raft-ann-bench` or `raft-ann-bench-datasets`, can choose RAPIDS, cuda and python versions.
     "--dataset deep-image-96-angular" \ # <- dataset name
     "--normalize" \ # <- whether to normalize the dataset, leave string empty ("") to not normalize.
     "--algorithms raft_cagra" \ # <- what algorithm(s) to use as a ; separated list, as well as any other argument to pass to `raft_ann_benchmarks.run`
     "" # optional arguments to pass to `raft_ann_benchmarks.plot`
 ```
 
-*** Note about user and file permissions: *** The flag `-u $(id -u)` allows the user inside the container to match the `uid` of the user outside the container, allowing the container to read and write to the mounted volume indicated by $DATA_FOLDER.
+***Note about user and file permissions:*** The flag `-u $(id -u)` allows the user inside the container to match the `uid` of the user outside the container, allowing the container to read and write to the mounted volume indicated by the `$DATA_FOLDER` variable.
 
-For CPU systems the same interface applies, except for not needing the gpus argument and using the cpu images:
+The same interface applies to systems without a GPU, except that the `raft-ann-bench-cpu` container is used and the `--gpus all` argument is omitted:
 ```bash
 export DATA_FOLDER=path/to/store/datasets/and/results
-docker run  all --rm -it -u $(id -u) \
-    -v $DATA_FOLDER:/home/rapids/benchmarks  \
-    rapidsai/raft-ann-bench-cpu:23.10a-py3.10 \
+docker run --rm -it -u $(id -u) \
+    -v $DATA_FOLDER:/data/benchmarks  \
+    rapidsai/raft-ann-bench-cpu:23.12a-py3.10 \
      "--dataset deep-image-96-angular" \
      "--normalize" \
-     "--algorithms raft_cagra" \
-     ""
+     "--algorithms hnswlib"
 ```
 
 **Note:** The user inside the containers is `root`. To workaround this, the scripts in the containers fix the user of the output files after the benchmarks are run. If the benchmarks are interrupted, the owner of the `datasets/results` produced by the container will be wrong, and will need to be manually fixed by the user.
@@ -169,12 +168,13 @@ docker run  all --rm -it -u $(id -u) \
 ```bash
 export DATA_FOLDER=path/to/store/datasets/and/results
 docker run --gpus all --rm -it -u $(id -u) \
-    -v $DATA_FOLDER:/home/rapids/benchmarks  \
-    rapidsai/raft-ann-bench:23.10a-cuda11.8-py3.10 \
-    --entrypoint /bin/bash
+    --entrypoint /bin/bash \
+    --workdir /data/benchmarks \
+    -v $DATA_FOLDER:/data/benchmarks  \
+    rapidsai/raft-ann-bench:23.12a-cuda11.8-py3.10
 ```
 
-This will drop you into a command line in the container, with the `raft_ann_benchmarks` python package ready to use, as was described in the prior [conda section](#conda):
+This will drop you into a command line in the container, with the `raft-ann-bench` Python package ready to use, as described in the [conda section](#conda) above:
 
 ```
 (base) root@00b068fbb862:/home/rapids#