Distance metrics integration #422

Merged (26 commits) on Jul 25, 2024

@cainamisir (Contributor) commented Jun 19, 2024

Added two new distance metrics to the FLAT index.

  • Inner Product
  • Cosine Distance

Usage: pass distance_metric to create() or ingest(), for example:
distance_metric=vspy.DistanceMetric.COSINE
or
distance_metric=vspy.DistanceMetric.INNER_PRODUCT

If distance_metric is not specified, the default is L2.

Note: the Inner Product distance metric returns -dot(a,b), so that the most similar vectors come first without changing the convention that "distances" are returned in increasing order.
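To illustrate why the negation matters, here is a minimal pure-Python sketch, not the code from this PR. The helper names (`neg_inner_product`, `cosine_distance`, `rank`) are hypothetical, and cosine distance is assumed to follow the common `1 - cosine_similarity` convention, which the description above does not spell out.

```python
import math

def neg_inner_product(a, b):
    # Negated dot product: a larger dot product (more similar) gives a smaller "distance".
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumed convention: 1 - cosine similarity (0.0 means same direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank(query, vectors, metric):
    # Return (distance, index) pairs sorted in increasing distance order,
    # mirroring the "smaller is closer" convention used for L2.
    return sorted((metric(query, v), i) for i, v in enumerate(vectors))

query = [1.0, 0.0]
vectors = [[0.5, 0.0], [3.0, 4.0], [0.0, 2.0]]

ip_ranking = rank(query, vectors, neg_inner_product)
cos_ranking = rank(query, vectors, cosine_distance)

print([i for _, i in ip_ranking])   # ordering by negated inner product
print([i for _, i in cos_ranking])  # ordering by cosine distance
```

The two metrics can disagree on the order: the inner product rewards vector magnitude, while cosine distance only compares direction.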

@NikolaosPapailiou (Collaborator)

Please add a bit more context for this change in the PR description and link to the respective shortcut issue.

apis/python/test/test_distance_metrics.py (outdated, resolved)
apis/python/test/test_distance_metrics.py (outdated, resolved)
apis/python/test/test_distance_metrics.py (outdated, resolved)
apis/python/src/tiledb/vector_search/ingestion.py (outdated, resolved)
apis/python/src/tiledb/vector_search/flat_index.py (outdated, resolved)
apis/python/src/tiledb/vector_search/ivf_pq_index.py (outdated, resolved)
apis/python/test/test_distance_metrics.py (outdated, resolved)
apis/python/test/test_distance_metrics.py (outdated, resolved)

@jparismorgan (Collaborator) left a comment

Could we also add C++ unit tests for this? Specifically that we can set and get the metadata correctly, both during index creation and after writing to a URI. Note that you can just add checks to existing tests, you probably don't need totally new test cases here.

apis/python/src/tiledb/vector_search/index.py (outdated, resolved)
apis/python/src/tiledb/vector_search/index.py (outdated, resolved)
apis/python/src/tiledb/vector_search/ingestion.py (outdated, resolved)
apis/python/src/tiledb/vector_search/module.cc (outdated, resolved)
src/include/api/vamana_index.h (outdated, resolved)
src/include/index/ivf_pq_index.h (outdated, resolved)
src/include/index/ivf_pq_index.h (outdated, resolved)
src/include/index/ivf_pq_metadata.h (outdated, resolved)
src/include/scoring.h (outdated, resolved)

@cainamisir force-pushed the vlad/distancemetrics branch 2 times, most recently from 0032183 to 53410b6, on July 23, 2024 14:35
@cainamisir merged commit 5db39ff into main on Jul 25, 2024
6 checks passed
@cainamisir deleted the vlad/distancemetrics branch on July 25, 2024 15:29
3 participants