Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Get vectors orchestrator #2348

Merged
merged 17 commits into from
Jun 17, 2024
Merged

[ENH] Get vectors orchestrator #2348

merged 17 commits into from
Jun 17, 2024

Conversation

HammadB
Copy link
Collaborator

@HammadB HammadB commented Jun 15, 2024

This PR adds the Get Vectors() RPC for query nodes.

  1. Adds the GetVectors instrumented rpc call and associated types
  2. Adds a GetVectorsOrchestrator for managing these queries
  3. Adds the GetVectorsOperator for reading from the log and record segment to respond to get_vectors
  4. Moves common orchestrator functions into a /common and refactors hnsw orchestrator to use this instead (DRY)
  5. Adds the ChromaError Trait for ChannelError in Sender - needed to use that as Box elsewhere.

I explicitly make the choice to not have to materialize in this operator since it doesn't need to for any reason.

Tasks

  • Stub out
  • Types
  • Implement return
  • Server input parse
  • Server output parse
  • Init segments
  • Pull log + filter
  • Operator for reading
  • Add test (Will handle in separate PR)
  • Validate that the get() semantics for missing ids is correct (I have validated that the semantics are the same - we just ignore invalid entries in the output)

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@HammadB HammadB changed the title [ENH] [WIP] Get vectors orchestrator [ENH] Get vectors orchestrator Jun 16, 2024
@HammadB HammadB requested a review from sanketkedia June 16, 2024 02:07
@HammadB HammadB marked this pull request as ready for review June 16, 2024 02:07
@HammadB HammadB requested a review from sanketkedia June 16, 2024 04:08
Copy link
Contributor

@sanketkedia sanketkedia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good pending the cleanup

@HammadB HammadB enabled auto-merge (squash) June 17, 2024 20:09
@HammadB HammadB merged commit 09df9ce into main Jun 17, 2024
63 checks passed
Anush008 pushed a commit to Anush008/chroma that referenced this pull request Jun 27, 2024
This PR adds the Get Vectors() RPC for query nodes.

1. Adds the GetVectors instrumented rpc call and associated types
2. Adds a GetVectorsOrchestrator for managing these queries
3. Adds the GetVectorsOperator for reading from the log and record
segment to respond to get_vectors
4. Moves common orchestrator functions into a /common and refactors hnsw
orchestrator to use this instead (DRY)
5. Adds the `ChromaError` Trait for ChannelError in Sender - needed to
use that as Box<dyn ChromaError> elsewhere.

I explicitly make the choice to not have to materialize in this operator
since it doesn't need to for any reason.

Tasks
- [x] Stub out
- [x] Types
- [x] Implement return
- [x] Server input parse
- [x] Server output parse
- [x] Init segments
- [x] Pull log + filter
- [x] Operator for reading
- [x] Add test (Will handle in separate PR) 
- [x] Validate that the get() semantics for missing ids is correct (I
have validated that the semantics are the same - we just ignore invalid
entries in the output)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants