YASE is a search engine built on the MS Marco document collection, composed of an indexer and a query processor

federicominniti/YASE

YASE

YASE is a search engine that returns search results for the MS Marco document collection (the dataset is available on GitHub). In particular, YASE has:

  • an indexer that creates an inverted index (with skipping blocks), a lexicon, and a document index;
  • a query processor that lets the user run conjunctive and disjunctive free-text queries, scored with either the TF-IDF or the BM25 scoring function.
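
To illustrate the two scoring functions named above, here is a minimal Java sketch of the textbook per-term formulas. The class name `ScoringSketch`, the logarithmic term-frequency weighting, and the BM25 parameters `K1 = 1.2` and `B = 0.75` are assumptions chosen for this example (common defaults), not values taken from the YASE code.

```java
/** Hypothetical helper, not part of YASE: textbook TF-IDF and BM25 term scores. */
final class ScoringSketch {

    // Assumed BM25 parameters (common defaults, not taken from YASE).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Inverse document frequency: log10(N / df). */
    static double idf(long numDocs, long docFreq) {
        return Math.log10((double) numDocs / docFreq);
    }

    /** TF-IDF with logarithmic term-frequency scaling: (1 + log10(tf)) * idf. */
    static double tfIdf(int tf, long numDocs, long docFreq) {
        if (tf <= 0) return 0.0;
        return (1.0 + Math.log10(tf)) * idf(numDocs, docFreq);
    }

    /** BM25 term score: tf / (K1 * ((1 - B) + B * dl / avgdl) + tf) * idf. */
    static double bm25(int tf, int docLen, double avgDocLen, long numDocs, long docFreq) {
        if (tf <= 0) return 0.0;
        double norm = K1 * ((1.0 - B) + B * docLen / avgDocLen);
        return tf / (norm + tf) * idf(numDocs, docFreq);
    }

    public static void main(String[] args) {
        // A term occurring 3 times in a 120-token document; 10 of 1000 documents contain it.
        System.out.println("TF-IDF: " + tfIdf(3, 1000, 10));
        System.out.println("BM25:   " + bm25(3, 120, 100.0, 1000, 10));
    }
}
```

The score of a document for a query is the sum of these per-term scores over the query terms it contains. Unlike TF-IDF, BM25 normalizes by document length, so the same term frequency counts for less in a longer document.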

Furthermore, YASE employs dynamic pruning techniques such as MaxScore to speed up disjunctive queries. The search engine also applies simple compression techniques (variable-byte and unary encoding) to reduce the size of the inverted index, keeping memory usage low while maintaining fast query response times.
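
The two compression schemes mentioned above can be sketched as follows. This is an illustrative Java example under stated assumptions, not the YASE implementation: the class name `CompressionSketch` is hypothetical, the variable-byte layout (7 payload bits per byte, least-significant group first, high bit meaning "more bytes follow") is one of several common conventions, and the unary code here writes a value n >= 1 as n - 1 one-bits followed by a zero-bit.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of YASE: variable-byte and unary integer codes. */
final class CompressionSketch {

    /** Variable-byte encoding of non-negative integers (assumed layout, see lead-in). */
    static byte[] vbEncode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("negative value: " + v);
            while (v >= 128) {
                out.write((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
                v >>>= 7;
            }
            out.write(v); // last byte of this value, continuation bit clear
        }
        return out.toByteArray();
    }

    static int[] vbDecode(byte[] bytes) {
        List<Integer> result = new ArrayList<>();
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7;            // more bytes follow for this value
            } else {
                result.add(value);     // value complete
                value = 0;
                shift = 0;
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Unary encoding of integers >= 1, packed most-significant bit first. */
    static byte[] unaryEncode(int[] values) {
        int totalBits = 0;
        for (int v : values) {
            if (v < 1) throw new IllegalArgumentException("unary needs values >= 1");
            totalBits += v;
        }
        byte[] out = new byte[(totalBits + 7) / 8];
        int pos = 0;
        for (int v : values) {
            for (int i = 0; i < v - 1; i++, pos++) {
                out[pos / 8] |= 0x80 >>> (pos % 8); // write a 1 bit
            }
            pos++; // terminating 0 bit (array is zero-initialized)
        }
        return out;
    }

    /** Decodes exactly {@code count} values; the count is needed because of zero padding. */
    static int[] unaryDecode(byte[] bytes, int count) {
        int[] result = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int v = 1;
            while ((bytes[pos / 8] & (0x80 >>> (pos % 8))) != 0) {
                v++;
                pos++;
            }
            pos++; // skip the terminating 0 bit
            result[i] = v;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] vb = vbEncode(new int[] {5, 300});
        System.out.println(vb.length + " bytes for 2 integers");
        System.out.println(java.util.Arrays.toString(vbDecode(vb)));
    }
}
```

Unary coding is only compact for very small values, which is why it is typically paired with term frequencies, while variable-byte coding suits larger numbers such as document IDs or the gaps between them. Which scheme YASE applies to which field is not stated in this README.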

NOTE: Some files are available on Google Drive (https://drive.google.com/drive/folders/13oxrPoda-8V5NOw3DzhLom-TOfcda7Uu?usp=drive_link):

  • Place results.txt from the drive in the 'trec_eval' folder of the project
  • Place the structures folder and collection.tar.gz from the drive in SearchEngine/src/main/resources/

Alternatively, you can copy only collection.tar.gz and rebuild all the other files.

The project requirements are available here1, and the complete documentation is available here2.

(Repository for the Multimedia Information Retrieval and Computer Vision project)

The main goals are:

  • Develop the code to create the inverted index with skipping blocks
  • Implement conjunctive and disjunctive queries
  • Perform the index compression
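
The first two goals can be illustrated with a short Java sketch of a posting list with skipping blocks and of conjunctive (AND) and disjunctive (OR) traversal over two lists. All names here (`SkippingSketch`, `PostingList`, `nextGEQ`) and the fixed block size are assumptions made for the example. The lists hold uncompressed doc IDs in memory and scoring is left out, so this shows the idea only, not how YASE stores or processes its index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of YASE: posting lists with skip information. */
final class SkippingSketch {

    /** Sorted doc IDs split into fixed-size blocks, with the last doc ID of each block kept as skip info. */
    static final class PostingList {
        final int[] docIds;   // sorted ascending
        final int blockSize;
        final int[] blockMax; // blockMax[b] = largest doc ID in block b
        int pos = 0;          // cursor into docIds

        PostingList(int[] docIds, int blockSize) {
            this.docIds = docIds;
            this.blockSize = blockSize;
            int blocks = (docIds.length + blockSize - 1) / blockSize;
            blockMax = new int[blocks];
            for (int b = 0; b < blocks; b++) {
                int last = Math.min((b + 1) * blockSize, docIds.length) - 1;
                blockMax[b] = docIds[last];
            }
        }

        boolean exhausted() { return pos >= docIds.length; }
        int docId() { return docIds[pos]; }
        void next() { pos++; }

        /** Moves the cursor to the first doc ID >= target, skipping whole blocks where possible. */
        void nextGEQ(int target) {
            if (exhausted()) return;
            int block = pos / blockSize;
            // Skip every block whose largest doc ID is still below the target.
            while (block < blockMax.length && blockMax[block] < target) block++;
            if (block == blockMax.length) { pos = docIds.length; return; }
            pos = Math.max(pos, block * blockSize);
            // The target (or a larger ID) is guaranteed to be inside this block.
            while (docIds[pos] < target) pos++;
        }
    }

    /** Conjunctive query: doc IDs present in both lists. */
    static List<Integer> conjunctive(PostingList a, PostingList b) {
        List<Integer> result = new ArrayList<>();
        while (!a.exhausted() && !b.exhausted()) {
            if (a.docId() == b.docId()) {
                result.add(a.docId());
                a.next();
                b.next();
            } else if (a.docId() < b.docId()) {
                a.nextGEQ(b.docId());
            } else {
                b.nextGEQ(a.docId());
            }
        }
        return result;
    }

    /** Disjunctive query: doc IDs present in at least one list (IDs assumed < Integer.MAX_VALUE). */
    static List<Integer> disjunctive(PostingList a, PostingList b) {
        List<Integer> result = new ArrayList<>();
        while (!a.exhausted() || !b.exhausted()) {
            int da = a.exhausted() ? Integer.MAX_VALUE : a.docId();
            int db = b.exhausted() ? Integer.MAX_VALUE : b.docId();
            int d = Math.min(da, db);
            result.add(d);
            if (da == d) a.next();
            if (db == d) b.next();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] first = {1, 4, 7, 10, 15, 22, 30};
        int[] second = {4, 10, 11, 30, 41};
        System.out.println(conjunctive(new PostingList(first, 3), new PostingList(second, 3)));
        System.out.println(disjunctive(new PostingList(first, 3), new PostingList(second, 3)));
    }
}
```

In a compressed index, the per-block maximum lets the query processor skip a block without decompressing it, which is what makes `nextGEQ` cheap for conjunctive queries.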
