Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds run fusion to anserini #2489

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

DanielKohn1208
Copy link
Contributor

This adds run fusion to anserini to match pyserini's implementation and closes #2355. To fuse 2 runs we use the following command

/bin/run.sh io.anserini.fusion.FuseRuns 
     -method methodname
     -runs runa runb
     -output outputfile

Currently the supported methods are rrf, interpolation, and average

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.anserini.fusion;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra blank line

this.runtag = runtag;
}

private class Document implements Comparable<Document>{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private class Document implements Comparable<Document>{
private class Document implements Comparable<Document> {

String docid;
double score;

public Document(String docid, double score)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

House style, brace on previous line

this.score = score;
}

@Override public int compareTo(Document a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

House style, annotation on separate line.


}

public void writeTopic(String qid, HashMap<String,Double> results) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void writeTopic(String qid, HashMap<String,Double> results) {
public void writeTopic(String qid, Map<String,Double> results) {

No need to specify impl


public void writeTopic(String qid, HashMap<String,Double> results) {
int rank = 1;
ArrayList<Document> documents = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ArrayList<Document> documents = new ArrayList<>();
List<Document> documents = new ArrayList<>();

ditto

}
}

public class FuseRuns {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python, it doesn't violate best practice to have multiple classes in a single file.
In Java it's different... best practice is one class per file. If you need extra classes, organize as inner classes.

So above class should be it's own file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And FusedRunOutputWriter also.


}

public static TreeMap<String, HashMap<String,DocScore>> createRunMap(String filename) throws FileNotFoundException, IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, think about when you really need to specify impl?


TreeMap<String, HashMap<String, Double>> finalRun;

if (fuseArgs.method.equals(FusionMethods.AVERAGE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... see high-level comments.

Copy link
Member

@lintool lintool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... I'm not sure I like the design.

Should there be like an actual RunsFuser class (I don't like the name but we can quibble about it separately) that performs the actual fusion. Right now, you have FusionMethods that's dispatched from a main class.

So, I'm thinking - move the logic of fusion into RunsFuser?

I like the design pattern of a lightweight main class that really just parses args, and dispatches to class that handles the heavy lifting.

E.g., HnswDenseSearcher is the main searcher class https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/HnswDenseSearcher.java

And SearchHnswDenseVectors.java does arg parsing and dispatches:
https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/SearchHnswDenseVectors.java

Thoughts? Open to discussion...

(Updated, fixed links)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants