community: retrievers: added the capability to use Product Quantization as one of the retrievers #22424

Merged
merged 45 commits into from
Jul 24, 2024

Vishnunkumar
Contributor

@Vishnunkumar Vishnunkumar commented Jun 3, 2024

  • Community: "Retrievers: Product Quantization"

    • This PR adds a Product Quantization (PQ) retriever to LangChain Community. When the embeddings carry enough context, PQ can be one of the fastest retrieval methods: each vector is split into subvectors, each subvector is stored as the index of its nearest centroid, and query distances are computed from small precomputed tables instead of full vectors.
    • Description: Adding PQ as one of the retrievers.
    • Dependencies: uses the nanopq package.
    • Twitter handle: vishnunkumar_
  • Add tests and docs:

    • Added unit tests for the retriever.
    • [ ] Will add an example notebook subsequently.
  • Lint and test: Ran make format, make lint and make test from the root of the modified package(s), as described in the contribution guidelines: https://python.langchain.com/docs/contributing/
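The PQ idea described above can be sketched in plain NumPy. This is an illustrative sketch only, not the nanopq library and not the code merged in this PR: the helper `kmeans`, the class `ProductQuantizer`, and its parameters `m` (number of subspaces) and `ks` (centroids per subspace, loosely mirroring nanopq's `M` and `Ks`) are hypothetical. It walks through the three steps: train one small k-means codebook per subspace, encode every vector as `m` centroid indices, and score a query with asymmetric distance computation, where the query stays uncompressed and distances come from an `m`-by-`ks` lookup table.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid by squared L2 distance.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


class ProductQuantizer:
    """Hypothetical minimal PQ: m subspaces, ks centroids per subspace."""

    def __init__(self, m: int, ks: int):
        assert ks <= 256, "codes are stored as uint8"
        self.m, self.ks = m, ks

    def _sub(self, x: np.ndarray, i: int) -> np.ndarray:
        # Slice out the i-th subvector (works for 1-D and 2-D inputs).
        return x[..., i * self.ds:(i + 1) * self.ds]

    def fit(self, x: np.ndarray) -> "ProductQuantizer":
        assert x.shape[1] % self.m == 0, "dim must be divisible by m"
        self.ds = x.shape[1] // self.m
        # One independent codebook per subspace.
        self.codebooks = [kmeans(self._sub(x, i), self.ks, seed=i) for i in range(self.m)]
        return self

    def encode(self, x: np.ndarray) -> np.ndarray:
        # Replace each subvector with the index of its nearest centroid.
        codes = np.empty((len(x), self.m), dtype=np.uint8)
        for i, cb in enumerate(self.codebooks):
            d = ((self._sub(x, i)[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
            codes[:, i] = d.argmin(axis=1)
        return codes

    def adist(self, query: np.ndarray, codes: np.ndarray) -> np.ndarray:
        # Asymmetric distance: build an (m, ks) table of squared distances from
        # each query subvector to every centroid, then sum table lookups per code.
        table = np.stack([((cb - self._sub(query, i)) ** 2).sum(axis=-1)
                          for i, cb in enumerate(self.codebooks)])
        return table[np.arange(self.m), codes].sum(axis=1)

    def search(self, query: np.ndarray, codes: np.ndarray, k: int) -> np.ndarray:
        # Indices of the k codes with the smallest approximate distance.
        return np.argsort(self.adist(query, codes))[:k]


rng = np.random.default_rng(42)
data = rng.normal(size=(500, 16)).astype(np.float32)
pq = ProductQuantizer(m=4, ks=16).fit(data)
codes = pq.encode(data)
print(codes.shape, pq.search(data[7], codes, k=5))
```

With 16-dimensional float32 vectors and m=4, each stored vector shrinks from 64 bytes to four one-byte codes. That compression, plus scoring by table lookup, is where the speed comes from; the trade-off is that distances are approximate.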

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 3, 2024

vercel bot commented Jun 3, 2024

The latest updates on your projects:

langchain: ✅ Ready, updated Jul 24, 2024 1:52pm (UTC)

@dosubot dosubot bot added Ɑ: retriever Related to retriever module 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Jun 3, 2024
Collaborator

@baskaryan baskaryan left a comment

could we add a docs page to docs/docs/integrations/retrievers/?

return np.array(list(executor.map(embeddings.embed_query, contexts)))
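The review excerpt shows only the last line of an indexing helper. A minimal sketch of what the enclosing function plausibly looks like follows, assuming it is named `create_index` and takes a list of texts plus an embeddings object exposing `embed_query` (the same thread-pool pattern LangChain's KNN and SVM retrievers use). `EmbedsQueries` and `ToyEmbeddings` are hypothetical stand-ins so the snippet runs without LangChain installed.

```python
import concurrent.futures
from typing import List, Protocol

import numpy as np


class EmbedsQueries(Protocol):
    """Hypothetical stand-in for LangChain's Embeddings; only embed_query is used."""

    def embed_query(self, text: str) -> List[float]: ...


def create_index(contexts: List[str], embeddings: EmbedsQueries) -> np.ndarray:
    # Embed texts concurrently; executor.map preserves input order,
    # so row i of the returned matrix corresponds to contexts[i].
    with concurrent.futures.ThreadPoolExecutor() as executor:
        return np.array(list(executor.map(embeddings.embed_query, contexts)))


class ToyEmbeddings:
    """Hypothetical demo embedder: [length, count of 'a', count of spaces]."""

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text)), float(text.count("a")), float(text.count(" "))]


index = create_index(["a cat", "banana", "product quantization"], ToyEmbeddings())
print(index.shape)
```

Using a thread pool makes sense here because real embedding calls are usually network-bound, so concurrent requests overlap their I/O waits.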


class PQRetriever(BaseRetriever):
Collaborator

could we call this something like NanoPQRetriever to make clear what dependency is being used?

Contributor Author

@Vishnunkumar Vishnunkumar Jun 4, 2024

I felt the algorithm's name should be given more importance than the package's, along the same lines as TF-IDF, SVM or KNN. However, let me know if package-based naming would make more sense.

Collaborator

think since this package is less well known/used than sklearn, it's more important to make it obvious what's being used

Contributor Author

Understood, will update on this.

@baskaryan
Collaborator

could we also add a docs page to /docs/docs/integrations/retrievers

@Vishnunkumar
Contributor Author

Hi @baskaryan, I will be adding one.

@Vishnunkumar
Contributor Author

Hi @baskaryan, hope you are well. I have the following questions:

  • I am seeing the error shown in my screenshot (Screenshot 2024-06-18 at 8 26 49 PM). I feel this can only be resolved after NanoPQ is merged into the mainline; let me know if I have misunderstood anything.
  • Also, is there a reference for how to add docs for NanoPQRetriever to the API Reference?

@ccurme ccurme added the community Related to langchain-community label Jun 18, 2024
@Vishnunkumar
Contributor Author

@baskaryan, @ccurme, @efriis: any suggestions on these? I would love to hear them and work on this to get it merged.

@ccurme
Collaborator

ccurme commented Jul 5, 2024

Hi @Vishnunkumar, is your local langchain-community installed in your environment? e.g.,

cd /path/to/your/langchain/libs/community/
pip install -e .
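One way to confirm that Python is importing the editable checkout rather than a released wheel is to ask the import system where it would load the package from. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `package_origin` is hypothetical and this check is not part of the original exchange.

```python
import importlib.util
from typing import Optional


def package_origin(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# After `pip install -e .`, this should point inside .../libs/community/
# rather than into site-packages (it prints None if the package is not installed).
print(package_origin("langchain_community"))
```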

@Vishnunkumar
Contributor Author

Hi @ccurme, thanks, I will look into it and update.

@Vishnunkumar
Contributor Author

Hi @ccurme, I have resolved this; let me know the next steps.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jul 24, 2024
@ccurme ccurme enabled auto-merge (squash) July 24, 2024 13:44
@ccurme ccurme merged commit e271965 into langchain-ai:master Jul 24, 2024
45 checks passed
Labels
community Related to langchain-community 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features lgtm PR looks good. Use to confirm that a PR is ready for merging. Ɑ: retriever Related to retriever module size:L This PR changes 100-499 lines, ignoring generated files.
3 participants