Add run fusion to anserini #2355

Open: wants to merge 2 commits into base: master

cadurosar
Collaborator

Basic script for fusing runs in Anserini. It takes two runs, applies min-max normalization to each, sums the normalized scores (following ranx https://amenra.github.io/ranx/normalization/#min-max-norm), and writes the fused run to a file. It probably needs tests and better names for functions and variables, as my Java has been contaminated by Python.
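For readers unfamiliar with the technique, here is a minimal Python sketch of min-max normalization followed by sum fusion. It is not the code in `FuseRuns.java`: the function names, the in-memory run layout (`{query_id: {doc_id: score}}`), and the handling of a query whose scores are all equal are assumptions made for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one query's scores to [0, 1].

    Assumption: an empty or constant score list maps to zeros; the PR and
    ranx may treat this edge case differently.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / span for doc, s in scores.items()}


def fuse_sum(run_a, run_b):
    """Normalize each run per query, then add the scores of each document.

    A document missing from one run simply contributes nothing from that run.
    """
    fused = {}
    for qid in run_a.keys() | run_b.keys():
        totals = defaultdict(float)
        for run in (run_a, run_b):
            for doc, s in min_max_normalize(run.get(qid, {})).items():
                totals[doc] += s
        # Sort by descending fused score, breaking ties by document id.
        ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
        fused[qid] = dict(ranked)
    return fused
```

Normalizing per query before summing keeps a run with a larger raw score range (for example BM25) from dominating one whose scores fall in a narrower range.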

Tested by using the Pyserini 2CR to generate trec-covid runs for contriever, bm25-flat, and contriever-msmarco, then generating fused runs:

Fuse BM25 and Contriever

./target/appassembler/bin/FuseRuns -filename_a ../run.beir.bm25-flat.trec-covid.txt -filename_b ../run.beir.contriever.trec-covid.txt -filename_output test_fusion.txt

Fuse BM25 and Contriever-Msmarco

./target/appassembler/bin/FuseRuns -filename_a ../run.beir.bm25-flat.trec-covid.txt -filename_b ../run.beir.contriever-msmarco.trec-covid.txt -filename_output test_fusion_2.txt

Validated by comparing against ranx:

from ranx import fuse, compare, Run, Qrels

qrel = Qrels.from_ir_datasets("beir/trec-covid")
run_bm25 = Run.from_file("../run.beir.bm25-flat.trec-covid.txt")
run_bm25.name = "bm25"
run_contriever = Run.from_file("../run.beir.contriever.trec-covid.txt")
run_contriever.name = "contriever"
run_contriever_ms = Run.from_file("../run.beir.contriever-msmarco.trec-covid.txt")
run_contriever_ms.name = "contriever-ms"

fused_a = fuse([run_bm25,run_contriever], method="sum")
fused_a.name = "fuse_a"
fused_b = fuse([run_bm25,run_contriever_ms], method="sum")
fused_b.name = "fuse_b"

fused_a_anserini = Run.from_file("test_fusion.txt")
fused_a_anserini.name="fuse_a_anserini"
fused_b_anserini = Run.from_file("test_fusion_2.txt")
fused_b_anserini.name="fuse_b_anserini"

# "recall@10" is listed here because the result table reports it.
print(compare(
    qrels=qrel,
    runs=[run_bm25, run_contriever, run_contriever_ms, fused_a, fused_b, fused_a_anserini, fused_b_anserini],
    metrics=["ndcg@10", "recall@10", "recall@1000"],
))

For which the result is:

#    Model            NDCG@10     Recall@10    Recall@1000
---  ---------------  ----------  -----------  -------------
a    bm25             0.595ᵇ      0.016ᵇ       0.396ᵇᶜᵈᶠ
b    contriever       0.273       0.006        0.168
c    contriever-ms    0.596ᵇ      0.016ᵇ       0.335ᵇ
d    fuse_a           0.579ᵇ      0.014ᵇ       0.356ᵇ
e    fuse_b           0.709ᵃᵇᶜᵈᶠ  0.019ᵃᵇᶜᵈᶠ   0.427ᵃᵇᶜᵈᶠ
f    fuse_a_anserini  0.580ᵇ      0.014ᵇ       0.356ᵇ
g    fuse_b_anserini  0.709ᵃᵇᶜᵈᶠ  0.019ᵃᵇᶜᵈᶠ   0.427ᵃᵇᶜᵈᶠ

There is a small difference in the third decimal place of NDCG@10 between fuse_a and fuse_a_anserini (0.579 vs. 0.580). It may come from trec-covid being small enough that precision differences between Python and Java show up in the metric.
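One way to check the precision hypothesis is to compare the two fused runs directly rather than through the metric. The sketch below assumes both runs are available in the standard six-column TREC run format (`qid Q0 docid rank score tag`); the helper names are hypothetical and nothing here is part of the PR.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Parse TREC run lines into {qid: {docid: score}}; malformed lines are skipped."""
    run = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue
        qid, _, docid, _, score, _ = parts
        run[qid][docid] = float(score)
    return dict(run)


def max_score_gap(run_x, run_y):
    """Largest absolute score difference over documents present in both runs."""
    gap = 0.0
    for qid in run_x.keys() & run_y.keys():
        for doc in run_x[qid].keys() & run_y[qid].keys():
            gap = max(gap, abs(run_x[qid][doc] - run_y[qid][doc]))
    return gap


def top_k_mismatches(run_x, run_y, k=10):
    """Query ids whose top-k document order differs between the two runs."""
    def top(scores):
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [doc for doc, _ in ranked[:k]]

    shared = run_x.keys() & run_y.keys()
    return sorted(q for q in shared if top(run_x[q]) != top(run_y[q]))
```

If `max_score_gap` is tiny but `top_k_mismatches` returns a few queries, the 0.001 gap would be consistent with near-tied documents swapping places around rank 10. That reading is a guess, not something verified in this thread. The lines can come from `open("test_fusion.txt")` and from the ranx-fused run exported to the same format.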


codecov bot commented Jan 26, 2024

Codecov Report

Attention: 122 lines in your changes are missing coverage. Please review.

Comparison is base (f2e2ac3) 65.63% compared to head (fe7e529) 64.95%.

Files Patch % Lines
src/main/java/io/anserini/search/FuseRuns.java 0.00% 122 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2355      +/-   ##
============================================
- Coverage     65.63%   64.95%   -0.69%     
  Complexity     1397     1397              
============================================
  Files           207      208       +1     
  Lines         11612    11734     +122     
  Branches       1470     1486      +16     
============================================
  Hits           7622     7622              
- Misses         3481     3603     +122     
  Partials        509      509              
