-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Neo4j vector context retriever bug fix #14461
Neo4j vector context retriever bug fix #14461
Conversation
if limit <= 0: | ||
break | ||
# Needs some optimization | ||
response = self.structured_query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we are making a db query per id -- this is extremely inefficient
Can't we use the "power of cypher" to force some ordering of the nodes to match the ordering of the IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I am not a cypher expert, but it seems like a single query should be possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've modified the code to only make one call to the DB and force ordering of the IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review the new query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Description
Previously, when using the
VectorContextRetriever
with theNeo4jPropertyGraphStore
, theVectorContextRetriever
would callavector_query
to get thesimilarity_top_k
nodes and return the list of nodes with the first node being ranked with the highest similarity score, and the last being least similar. However, when we pass the nodes found fromavector_query
toaget_rel_map
, the nodes returned could be in any order of similarity. So we need to fix this issue. For example, if I have a query:nodes = retriever.retrieve("Andres Mendez")
, then then resulting node ids fromavector_query
could look like the following: ['Andres Mendez', 'Andres', 'Miguel Mendez', 'Joe Mamma']. But then the results fromaget_rel_map
could provide only the top 30 results from the relationships of Joe Mamma. This is not expected behaviour, as we are querying for Andres Mendez, not Joe Mamma. The issue here is that we don't take similarity score into account when when calling aget_rel_map, but we need to.Fixes # (issue)
I modified the cypher query in
aget_rel_map
to return results based on the order of the node ids passed into this function. This will preserve the scores of the input nodes in the response of this function.Version Bump?
Did I bump the version in the
pyproject.toml
file of the package I am updating? (Except for thellama-index-core
package)Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
I also tested this by running different cases to make sure this works as expected.
Suggested Checklist: