Sparse-Dense_Retrieval

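Retrieve the top-𝑘 documents for a given query by maximal inner product search (MIPS) over vectors that have both a dense and a sparse portion. Because the inner product over the full vector is the sum of the inner products over its sparse and dense portions, the problem is solved by breaking it into two smaller MIPS problems: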

Retrieve the top-𝑘' documents from a sparse retrieval system defined over the sparse portion of the vectors
Retrieve the top-𝑘' documents from a dense retrieval system defined over the dense portion of the vectors

The two sets are then merged, and the top-𝑘 documents are retrieved from the combined (much smaller) set. As 𝑘' grows, the final top-𝑘 approaches the exact result, at the cost of much slower retrieval.
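The two-stage procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the notebook: the function names `top_k` and `hybrid_top_k`, the dictionary-based score layout, and the toy scores are all hypothetical.

```python
import heapq


def top_k(scores, k):
    """Return the ids of the k highest-scoring documents, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


def hybrid_top_k(sparse_scores, dense_scores, k, k_prime):
    """Approximate the top-k by total inner product (sparse + dense).

    Both arguments map a document id to that document's partial inner
    product with the query (hypothetical layout, chosen for illustration).
    """
    # Stage 1: take the top-k' candidates from each retrieval system.
    candidates = set(top_k(sparse_scores, k_prime)) | set(top_k(dense_scores, k_prime))
    # Stage 2: re-score only the merged candidate set with the full inner product.
    total = {
        doc: sparse_scores.get(doc, 0.0) + dense_scores.get(doc, 0.0)
        for doc in candidates
    }
    return top_k(total, k)


# Toy scores (made up): d4 has the best total but is first in neither list.
sparse = {"d1": 3.0, "d2": 0.5, "d3": 2.0, "d4": 1.9}
dense = {"d1": 0.1, "d2": 2.0, "d3": 1.5, "d4": 1.9}

print(hybrid_top_k(sparse, dense, k=1, k_prime=1))  # misses d4
print(hybrid_top_k(sparse, dense, k=1, k_prime=2))  # finds d4
```

In the toy data, a document that is only moderately good in both systems has the highest total score, so a small 𝑘' misses it while a larger 𝑘' recovers it. This is the accuracy versus speed trade-off governed by 𝑘'.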

The datasets used are nfcorpus and scifact.

Application Workflow

Download the chosen dataset using BEIR
Pre-process the text of the queries and documents
Compute the sparse representation using either the Elasticsearch implementation of BM25 or the custom implementation
Compute the dense embeddings using Sentence-BERT
Obtain the ground-truth scores and document ranking at k for each query
Obtain the merged ranking from the dense and sparse representations at k'
Compare the merged top-k results against the ground-truth top-k
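The final comparison step can be illustrated with a small sketch. The README does not state which metric the notebook uses, so recall of the exact top-k is an assumption here, and the function names and toy ids are hypothetical.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k documents found in the approximate top-k."""
    if not exact_ids:
        return 0.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


def mean_recall(exact_by_query, approx_by_query):
    """Average recall over all queries that have a ground-truth ranking."""
    recalls = [
        recall_at_k(exact_ids, approx_by_query.get(query, []))
        for query, exact_ids in exact_by_query.items()
    ]
    return sum(recalls) / len(recalls)


# Toy rankings (made up): q1 recovers one of two documents, q2 recovers both.
exact = {"q1": ["d4", "d3"], "q2": ["d7", "d9"]}
merged = {"q1": ["d1", "d3"], "q2": ["d9", "d7"]}

print(mean_recall(exact, merged))
```

Repeating this measurement for increasing values of k' would show how quickly the merged result approaches the ground truth.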

Results

scifact dataset results
nfcorpus dataset results

zuliani99/Sparse-Dense_Retrieval
