Add convenience method to get raw text from dense retrieval for prebuilt indexes #1856

Open
lintool opened this issue Apr 7, 2024 · 1 comment

Comments

@lintool
Member

lintool commented Apr 7, 2024

This issue has come up more than once, most recently in #1548.

Our dense indexes don't store the raw text, but for a prebuilt index we know which sparse index holds it. It should be possible to implement a raw method that loads that sparse index and fetches the document.
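
To make the idea concrete, here is a minimal sketch of the lookup pattern, not Pyserini code. Everything in it is an assumption for illustration: the `DENSE_TO_SPARSE` mapping, the `RawTextLookup` class, the index names, and the loader callable (a plain dict stands in for the sparse index so the sketch runs on its own).

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping from a prebuilt dense index name to the sparse index
# holding the text of the same documents. These names are placeholders,
# not real prebuilt index identifiers.
DENSE_TO_SPARSE: Dict[str, str] = {
    "example-dense-index": "example-sparse-index",
}


class RawTextLookup:
    """Lazily loads the sparse index paired with a prebuilt dense index."""

    def __init__(self, dense_name: str,
                 load_sparse: Callable[[str], Dict[str, str]]):
        self.dense_name = dense_name
        self._load_sparse = load_sparse  # stand-in for opening a sparse index
        self._sparse: Optional[Dict[str, str]] = None

    def raw(self, docid: str) -> Optional[str]:
        sparse_name = DENSE_TO_SPARSE.get(self.dense_name)
        if sparse_name is None:
            return None  # not a known prebuilt index, so no paired text
        if self._sparse is None:
            # Load the sparse index only on first use.
            self._sparse = self._load_sparse(sparse_name)
        return self._sparse.get(docid)


def fake_loader(name: str) -> Dict[str, str]:
    # Toy loader: a dict plays the role of the sparse index.
    return {"doc1": "hello world"}


lookup = RawTextLookup("example-dense-index", fake_loader)
print(lookup.raw("doc1"))
```

Loading the sparse index lazily keeps the cost off users who only need document IDs and scores, and returning `None` for an index without a known sparse counterpart avoids guessing at text that isn't there.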

@Yuv-sue1005
Contributor

Yuv-sue1005 commented Jul 29, 2024

For Faiss indexes, this is already solved by the following code:

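from pyserini.search.faiss import FaissSearcher

# Replace the placeholders with a prebuilt Faiss index name and its query encoder.
searcher = FaissSearcher.from_prebuilt_index('insert_faiss_index', 'insert_encoder')
# Look up the document by ID and get its raw text.
doc = searcher.doc('insert_doc_id').raw()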

Further testing should be done for other types of indexes.
