Incorporate Retrieval Scores into RM3 #453

mam10eks · 2024-08-12T14:41:15Z

This pull request (currently in draft) aims to address issue #407.


cmacdonald · 2024-08-13T08:30:53Z

pyterrier/rewrite.py

            occurrences = [0] * len(docids)

        elif "docno" in topics_and_res.columns:
            docnos = topics_and_res[topics_and_res["qid"] == qid]["docno"].values
            docids = []
            scores = []
+            docnos_to_scores = {i:0.0 for i in docnos}
+            if self.requires_scores:
+                docnos_to_scores = {i['docno']: i['score'] for _, i in topics_and_res[topics_and_res["qid"] == qid].iterrows()}


Do we need to make a dictionary here? Why not extract a list, as was done above?

A bit of an aside, but the topics_and_res[topics_and_res["qid"] == qid] selector is also performed a few lines above, and it's pretty inefficient. Can we cache the result to avoid doing it twice?

Indeed, we could cache the results.

I also see now that we do not need a dict. I misread the skipped part: I thought the if condition for skipped documents had a continue statement, but now I see that this is not the case. I will modify the code.

I modified the code accordingly.
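
A minimal sketch of the change discussed in this thread, assuming a results frame with `qid`, `docno` and `score` columns. The helper name `docnos_and_scores` and its `requires_scores` parameter are hypothetical and only illustrate the idea; this is not the code that was merged. The per-query slice is computed once and reused, and the scores are read as a list parallel to the docnos instead of going through a dictionary.

```python
import pandas as pd


def docnos_and_scores(topics_and_res: pd.DataFrame, qid: str, requires_scores: bool):
    """Hypothetical helper: return parallel lists of docnos and scores for one query."""
    # Filter once and reuse the slice instead of repeating the boolean mask.
    res_for_qid = topics_and_res[topics_and_res["qid"] == qid]
    docnos = res_for_qid["docno"].tolist()
    if requires_scores:
        # Row order is preserved, so scores[i] belongs to docnos[i].
        scores = res_for_qid["score"].astype(float).tolist()
    else:
        # Models that ignore retrieval scores get a constant float default.
        scores = [0.0] * len(docnos)
    return docnos, scores


# Toy results frame (made-up values, for illustration only).
res = pd.DataFrame({
    "qid": ["q1", "q1", "q2"],
    "docno": ["d3", "d7", "d1"],
    "score": [12.5, 9.0, 4.2],
})
docnos, scores = docnos_and_scores(res, "q1", requires_scores=True)
```

Because both lists come from the same slice, no docno-to-score mapping is needed as long as later steps keep the two lists in step.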

cmacdonald · 2024-08-13T08:31:51Z

Great, thanks @mam10eks for the high-quality explanations. I have removed some unit test checks (Bo1 etc. don't need the additional PRF jar) and added a single comment (do we need a dict?).


cmacdonald · 2024-08-13T11:28:37Z

I made some edits to ensure it didn't crash if some docnos could not be resolved. @seanmacavaney are you happy?

I'll merge this forward to the java branch after it's committed to master.
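
A minimal sketch of one way to tolerate unresolved docnos while keeping scores aligned with the documents that remain. The helper `resolve_docnos` is hypothetical, and a plain dictionary stands in for the index lookup (an assumption); the merged commit may handle this differently.

```python
def resolve_docnos(docnos, scores, docno_to_docid):
    """Hypothetical helper: keep only the docnos that resolve to an internal docid."""
    docids, kept_scores, skipped = [], [], 0
    for docno, score in zip(docnos, scores):
        docid = docno_to_docid.get(docno)  # None when the docno is unknown
        if docid is None:
            skipped += 1
            continue  # drop the document together with its score
        docids.append(docid)
        kept_scores.append(score)
    return docids, kept_scores, skipped


lookup = {"d3": 0, "d7": 1}  # stand-in for an index lookup (assumption)
docids, kept_scores, skipped = resolve_docnos(
    ["d3", "dX", "d7"], [12.5, 10.0, 9.0], lookup
)
```

Dropping the score together with the unresolved document keeps `docids` and `kept_scores` the same length, and the `skipped` count can be reported as a warning rather than raising an error.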

seanmacavaney · 2024-08-13T11:46:47Z

LGTM! I think we can remove the test jankiness once this is in the java branch.

mam10eks · 2024-08-13T11:58:01Z

Cool :)


mam10eks added 4 commits August 12, 2024 11:12

Start to prepare unit tests for RM3 to ensure the changes in ticket t…

c541814

…errier-org#407 behave as expected

Add unit tests that show the failure case for terrier-org#407

2dceb1d

Add unit tests for non-rm3 query expansion methods to ensure that fix…

c5e9d03

…ing terrier-org#407 will not change their behaviour

inject retrieval scores for bm25 terrier-org#407

2701e28

mam10eks mentioned this pull request Aug 12, 2024

RM3 does not add additional terms to the query for very small corpora #407

Closed

2 tasks

mam10eks added 3 commits August 13, 2024 09:08

Introduce requires_scores for QueryExpansion transformers like RM3 te…

15e3538

…rrier-org#407

Introduce requires_scores for QueryExpansion transformers like RM3 te…

e2d6f57

…rrier-org#407

Add RM3 unit tests into the github push workflow terrier-org#407

ab74f5b

mam10eks marked this pull request as ready for review August 13, 2024 07:53

mam10eks and others added 2 commits August 13, 2024 09:55

change default scores to fload as before terrier-org#407

7f26de1

Bo1/DFR/KL dont need terrier-prf

56aa519

cmacdonald reviewed Aug 13, 2024


mam10eks and others added 2 commits August 13, 2024 13:17

cache topics_and_res for qid and remove unneded score dictionary terr…

f711f2b

…ier-org#407

handle case where /some/ docnos cannot be resolved

9591f21

cmacdonald changed the title ~~Incorporate Retrieval Scores into RM3 (Pull request in Draft)~~ Incorporate Retrieval Scores into RM3 Aug 13, 2024

cmacdonald merged commit 4dd0752 into terrier-org:master Aug 13, 2024
14 checks passed

cmacdonald mentioned this pull request Aug 13, 2024

Backport #453 to Java branch #454

Merged

seanmacavaney added a commit that referenced this pull request Aug 16, 2024

Merge pull request #454 from terrier-org/java_backport_453

1c46751

Backport #453 to Java branch
