[BUG] Regression in cohere-10m force merge latency after switching to NativeEngines990KnnVectorsWriter #2134

shatejas · 2024-09-20T19:16:17Z

What is the bug?

After the switch to NativeEngines990KnnVectorsWriter we saw force merge latencies increased approximately by 20% in nightly runs

The increase has been consistent

How can one reproduce the bug?
Running a benchmark for force-merge against 2.17 vs 2.16 with cohere 10m dataset takes 2000seconds (~30 mins) more

What is the expected behavior?
It should take the same time or less

What is your host/environment?
Nightlies dashboard

Do you have any screenshots?
NA

Do you have any additional context?
NA

The text was updated successfully, but these errors were encountered:

shatejas · 2024-09-20T19:18:46Z

To reproduce easily cohere 1m dataset was used for benchmarking for the below table

	number of index segments	force merge time (minutes)	force merge segments
2.17 code	75	15.68263	3
#2133 code	92	12.51252	3
2.16 code	88	15.68209	3

Estimated bottlenecks

KNNVectorValues Creation

KNNVectorValues are created 3 times currently, we cannot reuse the same object as there is no way we could reset the iterator and putting effort into logic for resetting the iterator might not result in latency improvements

Computing totalLiveDocs
Training for quantization
Building index

Currently we are creating KNNVectorValues when quantization is not needed. Exp 2 in the above table shows some improvement in force merge time

TotalLiveDocs computes

There is a linear time complexity to compute total live docs. TotalLiveDocs value is currently needed to

Mean calculations during quantization training
Memory allocation computations while building graph for HNSW

Flush case

For flush we can avoid this calculation as there are no deleted docs involved and we can rely on KNNVectorValues or vectors in the field to give us the right result for totalLiveDocs

Merge case

Merge involves removing deleted docs, While merging the segments the deleted docs aren’t considered. To do that current code path is using APIs in MergedVectorValues to have an iterator that can iterate while skipping the deleted docs. The APIs here does not give an iterator which considered deleted docs in its size count. As a result even KNNVectorValues cannot return the right result as it relies on the iterator provided by the MergedVectorValues to compute total live docs

navneet1v · 2024-09-20T19:50:30Z

@shatejas one way to avoid the linear complexity for totalLives does when there are no deleted docs is we can write our custom FloatVectorValues merger. We already have something like this in BinaryDocValues: ref: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesReader.java#L38-L64

If we do this, we can remove the complexity for total live docs. We can also add some clone interfaces on those merged values which will help us remove complexity from code.

shatejas added bug Something isn't working untriaged labels Sep 20, 2024

navneet1v assigned shatejas Sep 20, 2024

shatejas mentioned this issue Sep 20, 2024

Makes sure KNNVectorValues aren't recreated unnecessarily when #2133

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Regression in cohere-10m force merge latency after switching to NativeEngines990KnnVectorsWriter #2134

[BUG] Regression in cohere-10m force merge latency after switching to NativeEngines990KnnVectorsWriter #2134

shatejas commented Sep 20, 2024

shatejas commented Sep 20, 2024 •

edited

Loading

navneet1v commented Sep 20, 2024

[BUG] Regression in cohere-10m force merge latency after switching to NativeEngines990KnnVectorsWriter #2134

[BUG] Regression in cohere-10m force merge latency after switching to NativeEngines990KnnVectorsWriter #2134

Comments

shatejas commented Sep 20, 2024

shatejas commented Sep 20, 2024 • edited Loading

Estimated bottlenecks

KNNVectorValues Creation

TotalLiveDocs computes

Flush case

Merge case

navneet1v commented Sep 20, 2024

shatejas commented Sep 20, 2024 •

edited

Loading