PyGaggle: Neural Ranking Baselines on MS MARCO Passage Retrieval - Entire Dev Set

This page contains instructions for running various neural reranking baselines on the MS MARCO passage ranking task. We will run on the entire dev set. Note that there are also separate guides for the MS MARCO document ranking task and for the MS MARCO passage ranking task on a subset of the dev set.

Prior to running this, we suggest looking at our first-stage BM25 ranking instructions. We rerank the BM25 run files, which contain approximately 1000 passages per query, using both monoBERT and monoT5. Both are pointwise rerankers: each query-passage pair is scored independently, by BERT or T5 respectively, as sketched below for monoBERT.
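To make the pointwise setup concrete, here is a minimal sketch of scoring query-passage pairs with monoBERT via Hugging Face Transformers. The query and passages are hypothetical toy data; the actual PyGaggle pipeline handles batching, truncation, and run-file I/O for you.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load monoBERT (a BERT sequence-classification model fine-tuned on MS MARCO).
tokenizer = AutoTokenizer.from_pretrained("castorini/monobert-large-msmarco")
model = AutoModelForSequenceClassification.from_pretrained("castorini/monobert-large-msmarco")
model.eval()

query = "what is the daily value of sodium"              # hypothetical query
passages = ["The daily value of sodium is 2,300 mg.",    # hypothetical candidates
            "Potassium is an essential dietary mineral."]

# Pointwise scoring: each passage is scored independently against the query.
scores = []
with torch.no_grad():
    for passage in passages:
        inputs = tokenizer(query, passage, return_tensors="pt", truncation=True, max_length=512)
        logits = model(**inputs).logits
        scores.append(logits[0, 1].item())  # logit of the "relevant" class

reranked = sorted(zip(passages, scores), key=lambda pair: -pair[1])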

Since it can take days to run these models on all 6,980 queries from the MS MARCO dev set, we will use Compute Canada for replication.

Registration and Virtual Environments

Please follow the Compute Canada getting-started guide to create an account. After that, follow the guide on virtual environments so that you can easily install Python packages. Note: don't forget to update pip and setuptools.

When running experiments for the first time, submit jobs interactively so that you can debug and confirm your code is bug-free. After that, submit batch scripts so that experiments can run unattended; this is essential here, since you won't want to babysit jobs that take days. A sketch of a batch script follows.
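Compute Canada clusters are scheduled with Slurm: an interactive GPU session can be requested with, for example, salloc --gres=gpu:1 --mem=32G --time=3:00:00, and unattended runs are submitted with sbatch. The script below is only a sketch; the account name, virtual environment path, and resource requests are placeholders you will need to adapt.

#!/bin/bash
#SBATCH --account=def-youraccount   # placeholder: your Compute Canada allocation
#SBATCH --gres=gpu:v100:1           # request one V100 GPU
#SBATCH --mem=32G
#SBATCH --time=72:00:00             # these runs take days; see the timings below

module load java
source ~/venv/bin/activate          # placeholder: your virtual environment

# Full re-ranking command as given in the monoBERT/monoT5 sections below.
python -um pygaggle.run.evaluate_passage_ranker ...

Submit the script with sbatch job.sh and monitor it with squeue -u $USER.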

Installation

After you enter the compute node, let's install PyGaggle under the ~/scratch directory.

Note 1: Run the following instructions at the root of this repo.
Note 2: Make sure that you have access to a GPU.
Note 3: PyGaggle must be installed from source, and the anserini-eval submodule must be pulled. To do this, first clone the repository recursively:

git clone --recursive https://github.com/castorini/pygaggle.git

Then load Java module:

module load java

Then install PyTorch:

pip install torch

Then install PyGaggle's requirements with the following command:

pip install -r requirements.txt

Note: On Compute Canada, you may have to install TensorFlow separately with the following command:

pip install tensorflow_gpu 
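As a quick sanity check that PyTorch installed correctly and can see the GPU, you can run:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

This should print the installed version followed by True; if it prints False, you are likely not on a GPU node.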

Models

Data Prep

We're first going to download the queries, qrels, and run files corresponding to the entire MS MARCO dev set. The run file is generated by following the BM25 ranking instructions. We'll store all these files in the data/msmarco_ans_entire directory.

You can download these three files from this repository.

queries.dev.small.tsv: 6,980 queries from the MS MARCO dev set.
qrels.dev.small.tsv: 7,437 (query id, relevant passage id) pairs from the MS MARCO dev set.
run.bm25.dev.small.tsv: Approximately 6,980,000 pairs of dev set queries and passages retrieved using BM25.

Note: Please rename run.bm25.dev.small.tsv to run.dev.small.tsv.
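Assuming the three files were downloaded into the current directory, the following commands create the data directory, move the files into place, and perform the rename. Each line of the run file is a tab-separated (query id, passage id, rank) triple.

mkdir -p data/msmarco_ans_entire
mv queries.dev.small.tsv qrels.dev.small.tsv data/msmarco_ans_entire/
mv run.bm25.dev.small.tsv data/msmarco_ans_entire/run.dev.small.tsv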

As a sanity check, we can evaluate the first-stage retrieved documents using the official MS MARCO evaluation script.

python tools/scripts/msmarco/msmarco_passage_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv data/msmarco_ans_entire/run.dev.small.tsv

The output should be:

#####################
MRR @10: 0.18736452221767383
QueriesRanked: 6980
#####################

Let's download and extract the pre-built MS MARCO index into indexes:

wget https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/index-msmarco-passage-20191117-0ed488.tar.gz -P indexes
tar xvfz indexes/index-msmarco-passage-20191117-0ed488.tar.gz -C indexes
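Since scratch space on Compute Canada is quota-limited, you may optionally remove the tarball once it has been extracted:

rm indexes/index-msmarco-passage-20191117-0ed488.tar.gz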

Now we can begin re-ranking the set.

Re-Ranking with monoBERT

First, let's evaluate using monoBERT!

python -um pygaggle.run.evaluate_passage_ranker --split dev \
                                                --method seq_class_transformer \
                                                --model castorini/monobert-large-msmarco \
                                                --dataset data/msmarco_ans_entire/ \
                                                --index-dir indexes/index-msmarco-passage-20191117-0ed488 \
                                                --task msmarco \
                                                --output-file runs/run.monobert.ans_entire.dev.tsv

Upon completion, the following output will be visible:

precision@1     0.2533
recall@3        0.45093
recall@50       0.80609
recall@1000     0.86289
mrr             0.38789
mrr@10          0.37922

It takes approximately 57 hours to re-rank the entire dev set on MS MARCO using a single V100; the type of GPU will directly influence your inference time. The default batch size (96) may cause a GPU out-of-memory (OOM) error; in that case, assign a smaller batch size with the --batch-size option, as in the example below.
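For example, to re-run with a smaller batch size, append the flag to the monoBERT command above (48 here is just an illustration; pick whatever fits your GPU memory):

python -um pygaggle.run.evaluate_passage_ranker ... --batch-size 48   # "..." stands for the same arguments as above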

The re-ranked run file run.monobert.ans_entire.dev.tsv will also be available in the runs directory upon completion.

We can use the official MS MARCO evaluation script to verify the MRR@10:

python tools/scripts/msmarco/msmarco_passage_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv runs/run.monobert.ans_entire.dev.tsv

You should see the same result. Great, let's move on to monoT5!

Re-Ranking with monoT5

We use the monoT5-base variant as it is the easiest to run without access to larger GPUs/TPUs. monoT5 casts reranking as sequence-to-sequence scoring: each query-passage pair is formatted as a text prompt, and the relevance score is derived from the probability that the model generates the token "true" rather than "false".
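The following is a minimal sketch of this scoring scheme using Hugging Face Transformers, with hypothetical inputs; the actual PyGaggle implementation handles batching and run files for you.

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("castorini/monot5-base-msmarco")
model = T5ForConditionalGeneration.from_pretrained("castorini/monot5-base-msmarco")
model.eval()

query = "what is the daily value of sodium"         # hypothetical query
passage = "The daily value of sodium is 2,300 mg."  # hypothetical passage

# monoT5 input template: the model was fine-tuned to generate "true" or "false".
text = f"Query: {query} Document: {passage} Relevant:"
inputs = tokenizer(text, return_tensors="pt")

true_id = tokenizer.encode("true")[0]    # first token id of "true"
false_id = tokenizer.encode("false")[0]  # first token id of "false"

with torch.no_grad():
    # One decoder step from the start token; the score compares P("true") vs. P("false").
    start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=start).logits
    score = torch.log_softmax(logits[0, 0, [false_id, true_id]], dim=0)[1].item()

With that picture in mind, let us now re-rank the set: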

python -um pygaggle.run.evaluate_passage_ranker --split dev \
                                                --method t5 \
                                                --model castorini/monot5-base-msmarco \
                                                --dataset data/msmarco_ans_entire \
                                                --model-type t5-base \
                                                --task msmarco \
                                                --index-dir indexes/index-msmarco-passage-20191117-0ed488 \
                                                --batch-size 32 \
                                                --output-file runs/run.monot5.ans_entire.dev.tsv

The following output will be visible after it has finished:

precision@1     0.25129
recall@3        0.45362
recall@50       0.80709
recall@1000     0.86289
mrr             0.38839
mrr@10          0.37986

It takes approximately 26 hours to re-rank the entire dev set on MS MARCO using a V100. It is worth noting again that you might need to modify the batch size to best fit the GPU at hand.

Upon completion, the re-ranked run file run.monot5.ans_entire.dev.tsv will be available in the runs directory.

We can use the official MS MARCO evaluation script to verify the MRR@10:

python tools/scripts/msmarco/msmarco_passage_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv runs/run.monot5.ans_entire.dev.tsv

You should see the same result.

If you were able to replicate these results, please submit a PR adding to the replication log, and please mention any differences you find!

Replication Log

  • Results replicated by @qguo96 on 2020-10-08 (commit 3d4b7c0) (Tesla V100 on Compute Canada)
  • Results replicated by @stephaniewhoo on 2020-10-25 (commit e815051) (Tesla V100 on Compute Canada)
  • Results replicated by @rayyang29 on 2020-11-16 (commit d840b0c) (Tesla V100 on Compute Canada)
  • Results replicated by @Dahlia-Chehata on 2021-01-10 (commit 623285a) (Tesla V100 on Compute Canada)
  • Results replicated by @KaiSun314 on 2021-01-16 (commit 1414e32) (Tesla V100 on Compute Canada)