Feature Request: Vector Operations #3068

Closed
Btibert3 opened this issue Mar 16, 2024 · 9 comments · Fixed by #3087
Labels: feature, good-warm-up, high-priority

Comments

@Btibert3

I just came across this project, and wow, I am impressed. Neo4j currently supports vector operations, specifically similarity calculations. It would be great if fixed-length lists could be extended to support similarity operations. Maybe that support already exists and I am overlooking how to achieve this with your stack, but this would be a great feature to help support RAG operations.

@semihsalihoglu-uw
Contributor

Thanks for raising this. I agree that we should support some of the common functions. We can follow DuckDB's array functions: https://duckdb.org/docs/sql/data_types/array.html#functions. Arrays in DuckDB are equivalent to our fixed-length list type, so I don't think there is a hurdle in supporting these.

I'll put this into our pipeline.

@semihsalihoglu-uw added the feature, good-warm-up, and high-priority labels Mar 16, 2024
@prrao87
Member

prrao87 commented Mar 16, 2024

Hi @Btibert3 that makes a lot of sense. Could you elaborate a bit on what the intended use case is in the context of RAG? To you, how would the ideal implementation look from a graph query perspective?

@Btibert3
Author

@semihsalihoglu-uw DuckDB is exactly what I had in my head.

@prrao87 Naive RAG lets you find entries (i.e. nodes) based on the similarity of the vector to the input query. We can go beyond this by further restricting the results by leveraging graph patterns. One example might be to show a list of products based on the user's input query (vector similarity) but further restrict/re-rank the results based on products the user hasn't purchased and behavior of other "similar" users, where similarity in this context is leveraging graph relationships. In this example, the results come from vector-based similarity and graph relationships.
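As a rough illustration of the product example above, here is a minimal plain-Python sketch that ranks products by cosine similarity to a query vector, drops products the user already bought, and nudges up products bought by users with overlapping purchases. The data, the `recommend` helper, and the `0.1` boost are all hypothetical; a real implementation would express the filter and re-ranking as a graph query rather than Python dictionaries.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy data: product embeddings and "user PURCHASED product" edges.
product_embeddings = {
    "tent": [0.9, 0.1, 0.0],
    "sleeping_bag": [0.8, 0.2, 0.1],
    "blender": [0.0, 0.1, 0.9],
}
purchases = {
    "alice": {"tent"},
    "bob": {"tent", "sleeping_bag"},
}

def recommend(user, query_vec, top_k=2):
    already_bought = purchases.get(user, set())
    # "Similar" users here are simply users sharing at least one purchase (an assumption).
    similar_users = {
        other for other, items in purchases.items()
        if other != user and items & already_bought
    }
    boosted = set().union(*(purchases[u] for u in similar_users)) - already_bought

    scored = []
    for product, vec in product_embeddings.items():
        if product in already_bought:
            continue  # graph-based restriction: skip products the user already has
        score = cosine_similarity(query_vec, vec)
        if product in boosted:
            score += 0.1  # arbitrary illustrative boost from similar users' behavior
        scored.append((product, score))

    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Calling `recommend("alice", [1.0, 0.0, 0.0])` would rank `sleeping_bag` ahead of `blender` and never return `tent`, since Alice already purchased it.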

Another example would be to consider the most similar document chunk via vector search, but improving context windows based on linked nodes and variable pattern matching, again using vector similarity but also the structure of the relationships in the graph.
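The document-chunk example can be sketched in a similar way: pick the most similar chunk by cosine similarity, then widen the context by following links to neighboring chunks for a bounded number of hops. The chunk IDs, embeddings, `links` adjacency map, and `retrieve_with_context` helper are hypothetical, and the breadth-first expansion is only one possible reading of "variable pattern matching".

```python
import math
from collections import deque

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy data: chunk embeddings and links between chunks.
chunk_embeddings = {
    "c1": [1.0, 0.0],
    "c2": [0.7, 0.7],
    "c3": [0.0, 1.0],
    "c4": [-1.0, 0.0],
}
links = {
    "c1": ["c2"],
    "c2": ["c1", "c3"],
    "c3": ["c2", "c4"],
    "c4": ["c3"],
}

def retrieve_with_context(query_vec, max_hops=1):
    # Step 1: vector search for the single best-matching chunk.
    best = max(
        chunk_embeddings,
        key=lambda c: cosine_similarity(query_vec, chunk_embeddings[c]),
    )
    # Step 2: breadth-first expansion over linked chunks, up to max_hops away.
    seen = {best}
    ordered = [best]
    queue = deque([(best, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                ordered.append(neighbor)
                queue.append((neighbor, depth + 1))
    return ordered
```

Raising `max_hops` trades a larger context window for potentially less relevant chunks, which is the tuning knob a variable-length pattern would expose.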

@acquamarin
Collaborator

@Btibert3 May I know which vector operations you are most interested in? That way we can implement those first.

@Btibert3
Author

Sure thing.

In short I believe that you can go pretty far with those three.

@prrao87
Member

prrao87 commented Mar 18, 2024

@acquamarin I'd start with cosine and then extend to Euclidean (L2) and then finally dot product, in that order. Cosine seems to be the most common metric used for similarity search in general.
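For reference, the three metrics named above can be written in a few lines of plain Python. This is only a sketch of the standard definitions over equal-length numeric lists; the function names are illustrative and are not the names of any database function.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more aligned (and/or longer) vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """L2 distance; 0 means identical vectors, larger means further apart."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product of the normalized vectors; ranges from -1 to 1 and ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)
```

Cosine similarity ignores vector length, which is one reason it is a common default for comparing embeddings; for vectors already normalized to unit length, it equals the dot product.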

@Btibert3
Author

If those are the two being considered out of the gate, I completely agree with cosine.

@Btibert3
Author

Wow! Very impressed.

@hpvd

hpvd commented Mar 22, 2024

One week from request to implementation? Just unbelievable :-D
Many thanks for your work!
