Add Documentation for k-NN Faiss SQFP16
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
naveentatikonda committed Mar 18, 2024
1 parent a3f0646 commit 3541335
Showing 2 changed files with 168 additions and 4 deletions.
68 changes: 64 additions & 4 deletions _search-plugins/knn/knn-index.md
@@ -15,6 +15,16 @@ The k-NN plugin introduces a custom data type, the `knn_vector`, that allows use

Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).

## SIMD optimization for Faiss

Starting with k-NN plugin version 2.13, [SIMD (Single Instruction, Multiple Data)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) is supported by default on Linux machines only for the Faiss engine if the underlying processor supports SIMD instructions (`AVX2` on the `x64` architecture and `NEON` on the `ARM64` architecture), which helps boost overall performance.
For the `x64` architecture, two versions of the Faiss library (`libopensearchknn_faiss.so` and `libopensearchknn_faiss_avx2.so`) are built and shipped with the artifact; the library with the `_avx2` suffix contains the AVX2 SIMD instructions. At runtime, the k-NN plugin detects whether the underlying system supports AVX2 and loads the corresponding library.

You can override this behavior and load the default Faiss library (`libopensearchknn_faiss.so`) even if the system supports AVX2 by setting the static setting `knn.faiss.avx2.disabled` to `true` in `opensearch.yml` (the default is `false`).
{: .note}
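
For example, the following `opensearch.yml` snippet forces the default library on an AVX2-capable `x64` system. This is a minimal sketch of the static setting described in the preceding note; because the setting is static, it requires a node restart to take effect.

```yaml
# Disables loading of the AVX2-optimized Faiss library (libopensearchknn_faiss_avx2.so)
knn.faiss.avx2.disabled: true
```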

For the `ARM64` architecture, only one Faiss library (`libopensearchknn_faiss.so`) is built and shipped. It contains the NEON SIMD instructions and, unlike AVX2, cannot be disabled.

## Method definitions

A method definition refers to the underlying configuration of the Approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
@@ -45,7 +55,7 @@ Parameter name | Required | Default | Updatable | Description
For nmslib, *ef_search* is set in the [index settings](#index-settings).
{: .note}

### Supported Faiss methods

Method name | Requires training | Supported spaces | Description
:--- | :--- | :--- | :---
@@ -113,10 +123,10 @@ Lucene HNSW implementation ignores `ef_search` and dynamically sets it to the v
}
```

### Supported Faiss encoders

You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. Faiss has
several encoder types, but the plugin currently only supports *flat*, *pq*, and *sq* encoding.

The following example method definition specifies the `hnsw` method and a `pq` encoder:

@@ -144,6 +154,7 @@ Encoder name | Requires training | Description
:--- | :--- | :---
`flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint.
`pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388).
`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder (by default, [SQFP16]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-sqfp16)) to quantize 32-bit floating-point vectors into 16-bit floats using the built-in Faiss ScalarQuantizer in order to reduce the memory footprint with a minimal loss of precision. Besides optimizing memory use, `sq` improves overall performance through SIMD optimization (`AVX2` on the `x64` architecture and `NEON` on the `ARM64` architecture).

#### Examples

@@ -195,13 +206,62 @@ The following example uses the `hnsw` method without specifying an encoder (by d
}
```

The following example uses the `hnsw` method with an `sq` encoder of type `fp16` and with `clip` enabled:

```json
"method": {
"name":"hnsw",
"engine":"faiss",
"space_type": "l2",
"parameters":{
"encoder": {
"name": "sq",
"parameters": {
"type": "fp16",
"clip": true
}
},
"ef_construction": 256,
"m": 8
}
}
```

The following example uses the `ivf` method with an `sq` encoder of type `fp16`:

```json
"method": {
"name":"ivf",
"engine":"faiss",
"space_type": "l2",
"parameters":{
"encoder": {
"name": "sq",
"parameters": {
"type": "fp16",
"clip": false
}
},
"nprobes": 2
}
}
```


#### PQ parameters

Parameter name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
`m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. The vector dimension must be divisible by `m`. Maximum value is 1024.
`code_size` | false | 8 | false | Determines the number of bits into which to encode a subvector. Maximum value is 8. **Note**: For IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8.
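
For reference, the following method definition sketch uses the `ivf` method with a `pq` encoder. The parameter values are illustrative only and must satisfy the constraints above (for example, the vector dimension must be divisible by `m`). Because `pq` requires training, this definition would be used when [creating a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model):

```json
"method": {
  "name": "ivf",
  "engine": "faiss",
  "space_type": "l2",
  "parameters": {
    "encoder": {
      "name": "pq",
      "parameters": {
        "m": 8,
        "code_size": 8
      }
    },
    "nlist": 4
  }
}
```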

#### SQ parameters

Parameter name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
`type` | false | fp16 | false | Determines the type of scalar quantization used to encode the 32-bit float vectors into the corresponding type. By default, `fp16` is used.
`clip` | false | false | false | When set to `true`, vector values that are outside of the supported range are clipped to the minimum or maximum of the range instead of being rejected.

### Choosing the right method

There are a lot of options to choose from when building your `knn_vector` field. To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, (4) indexing latency.
104 changes: 104 additions & 0 deletions _search-plugins/knn/knn-vector-quantization.md
@@ -0,0 +1,104 @@
---
layout: default
title: k-NN vector quantization
nav_order: 50
parent: k-NN search
grand_parent: Search methods
has_children: false
has_math: true
---

# k-NN vector quantization

By default, the OpenSearch k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. This is expensive in terms of memory for use cases that require ingestion at large scale, where graphs must be constructed, loaded, saved, and searched (for the native `nmslib` and `faiss` engines).
To reduce the memory footprint, you can use the vector quantization features supported by the k-NN plugin.

## Lucene byte vector

Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of memory needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).

## Faiss SQfp16

Starting with k-NN plugin version 2.13, you can ingest `fp16` vectors with the `faiss` engine. When you provide 32-bit float vectors, the Faiss engine quantizes them into FP16 using its scalar quantizer (you don't need to perform any quantization yourself), stores them, and decodes them back to FP32 for distance computation during search operations. With this feature, you can
reduce the memory footprint by a factor of 2 and significantly reduce search latencies (with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-faiss)), with a very minimal loss in recall (depending on the distribution of the vectors).

To use this feature, set the `encoder` name to `sq`. The type of scalar quantization is specified in the new optional `type` field of the encoder parameters. The indexed data must be within the FP16 range of [-65504.0, 65504.0]. If the data lies outside of this range, an exception is thrown and the request is rejected.

The optional encoder parameter `clip` (`false` by default) controls how out-of-range values are handled. If `clip` is set to `true` in the index mapping, values that lie outside of the FP16 range are clipped to the minimum (`-65504.0`) or maximum (`65504.0`) of the range and ingested into the index without an exception being thrown. However, clipping the values might cause a drop in recall.

For example, when `clip` is set to `true`, `65510.82` is clipped and indexed as `65504.0`, and `-65504.1` is clipped and indexed as `-65504.0`.

Setting `clip` to `true` is recommended only when most of the vector elements are within the FP16 range and very few elements lie outside of it.
{: .note}

* `type` - Set this to `fp16` to quantize the indexed vectors into FP16 using Faiss SQFP16. Default value is `fp16`.
* `clip` - Set this to `true` to skip the FP16 validation check and clip out-of-range vector values to the FP16 minimum or maximum. Default value is `false`.

The following is an example of a method definition using Faiss SQfp16 with `clip` set to `true`:
```json
"method": {
"name":"hnsw",
"engine":"faiss",
"space_type": "l2",
"parameters":{
"encoder":{
"name":"sq",
"parameters":{
"type": "fp16",
"clip": true
}
}
}
}

```
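
The following sketch shows a complete index creation request using this method definition. The index name `test-index`, field name `my_vector1`, and `dimension` value of 3 are illustrative and chosen to match the ingestion and query examples below:

```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "type": "fp16",
                "clip": true
              }
            }
          }
        }
      }
    }
  }
}
```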

During ingestion, if `clip` is set to `false`, make sure each dimension of the vector is within the supported range [-65504.0, 65504.0]:
```json
PUT test-index/_doc/1
{
  "my_vector1": [-65504.0, 65503.845, 55.82]
}
```

During querying, there is no range limitation on the query vector:
```json
GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [265436.876, -120906.256, 99.84],
        "k": 2
      }
    }
  }
}
```

### Memory estimation

Ideally, Faiss SQfp16 requires approximately 50% of the memory consumed by FP32 vectors.

#### HNSW memory estimation

The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector.

As an example, assume you have 1 million vectors with a dimension of 256 and an `M` value of 16. The memory requirement can be estimated as follows:

```
1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB
```
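
For comparison, the same 1 million vectors stored at full 32-bit precision (estimated as `1.1 * (4 * dimension + 8 * M)` bytes/vector) would require roughly:

```
1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.18 GB
```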

#### IVF memory estimation

The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * dimension))` bytes.

As an example, assume you have 1 million vectors with a dimension of 256 and an `nlist` of 128. The memory requirement can be estimated as follows:

```
1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB
```
