HybridRetriever raises KeyError: -1 if the number of docs is less than 1_000 #29

Open
tshu-w opened this issue Oct 17, 2023 · 1 comment
Labels
bug Something isn't working

tshu-w commented Oct 17, 2023

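The cutoff of msearch for HybridRetriever is hardcoded to 1_000, which makes map_internal_ids_to_original_ids raise KeyError: -1 when the collection contains fewer than 1_000 documents, because the returned doc IDs then include -1 entries that are not keys of id_mapping: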

sparse_results = self.sparse_retriever.search(query, False, 1_000)
dense_results = self.dense_retriever.search(query, False, 1_000)

Thus, map_internal_ids_to_original_ids should be:

def map_internal_ids_to_original_ids(self, doc_ids: Iterable) -> List[str]:
    return [self.id_mapping[doc_id] for doc_id in doc_ids if doc_id != -1]
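
To make the failure and the proposed fix easier to check, here is a minimal, self-contained sketch. It does not use the library itself: `ToyRetriever`, `map_unfiltered`, and the hand-written `padded_ids` list are hypothetical stand-ins, and the sketch assumes that missing results show up as -1 in the ID list, as the KeyError: -1 in the title suggests.

```python
from typing import Dict, Iterable, List


class ToyRetriever:
    """Hypothetical stand-in for illustration, not the library's HybridRetriever."""

    def __init__(self, id_mapping: Dict[int, str]) -> None:
        # Maps internal integer IDs to the original document IDs.
        self.id_mapping = id_mapping

    def map_unfiltered(self, doc_ids: Iterable[int]) -> List[str]:
        # Looks up every ID as-is, so a -1 placeholder raises KeyError.
        return [self.id_mapping[doc_id] for doc_id in doc_ids]

    def map_internal_ids_to_original_ids(self, doc_ids: Iterable[int]) -> List[str]:
        # Proposed fix from this issue: skip the -1 placeholders.
        return [self.id_mapping[doc_id] for doc_id in doc_ids if doc_id != -1]


retriever = ToyRetriever({0: "doc_a", 1: "doc_b", 2: "doc_c"})

# Assumed shape of a result list: cutoff of 5 requested, only 3 documents exist.
padded_ids = [2, 0, 1, -1, -1]

try:
    retriever.map_unfiltered(padded_ids)
    unfiltered_error = None
except KeyError as err:
    unfiltered_error = err

filtered = retriever.map_internal_ids_to_original_ids(padded_ids)
```

With the filter in place the lookup keeps the ranking order of the real hits and simply drops the placeholders. Any scores paired with those IDs would need the same filtering so the two lists stay aligned.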

AmenRa added the bug label on Oct 18, 2023

AmenRa (Owner) commented Oct 18, 2023

Thanks for reporting the bug!
I'll fix it soon.
