BR rerank
This page provides the code and data associated with the paper
Learning to Calibrate and Rerank Multi-label Predictions.
Cheng Li, Virgil Pavlu, Javed Aslam, Bingyu Wang, and Kechen Qin.
In ECML-PKDD, 2019.
The BR-Rerank method proposed in the paper is a two-stage multi-label classification algorithm:
- Use BR (binary relevance) to estimate the probability of each label independently, and generate the top-K set prediction candidates with the highest scores
- Extract features from the set candidates that capture label dependencies, and apply a second calibrator model to rescore and rerank the candidates
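The first stage can be illustrated with a minimal Python sketch. It assumes that the BR score of a set is the product of p_l for the included labels and 1 - p_l for the excluded ones. This page only says that candidates come from dynamic programming over the BR marginals. The beam-style recurrence below is one exact way to do that, but it is not necessarily how pyramid implements it, and `top_k_sets` is a hypothetical name.

```python
import math
from typing import List, Tuple


def top_k_sets(marginals: List[float], k: int) -> List[Tuple[float, Tuple[int, ...]]]:
    """Return the k most probable label sets under independent BR marginals.

    Each result is (probability, included label indices), best first.
    Assumes 0 < p < 1 for every marginal so the logarithms are defined.
    """
    # Each beam entry: (log-probability of the decided prefix, included label indices)
    beam: List[Tuple[float, Tuple[int, ...]]] = [(0.0, ())]
    for j, p in enumerate(marginals):
        extended = []
        for logp, labels in beam:
            extended.append((logp + math.log(p), labels + (j,)))  # label j included
            extended.append((logp + math.log(1.0 - p), labels))   # label j excluded
        # The score factorizes over labels, so a prefix outside the current top k
        # can never complete into a set that belongs in the final top k.
        extended.sort(key=lambda item: item[0], reverse=True)
        beam = extended[:k]
    return [(math.exp(logp), labels) for logp, labels in beam]
```

For example, `top_k_sets([0.9, 0.6, 0.3], 3)` ranks the set {0, 1} first. The other two candidates are {0} and {0, 1, 2}. Each step keeps at most K prefixes, so the cost grows as O(L·K log K) instead of enumerating all 2^L subsets.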
Compared to standard BR, BR-rerank provides
- Higher classification accuracy
- Better calibrated prediction confidence scores
The picture above illustrates how BR-rerank makes predictions on the input test image. The "marginal" column shows the individual label probabilities estimated by BR. Note that the label "baseball glove" has a probability below the 0.5 threshold and is therefore not included in BR's prediction. The "set prediction candidates" column shows the top-5 set prediction candidates with the highest BR scores, generated by dynamic programming over the BR marginals. The "set prediction features" column shows, for each set candidate, its BR score, its binary encoding, its cardinality, and its prior probability. The "reranker score" column shows the calibrated BR-rerank confidence score for each set prediction candidate. For this image, BR predicts the incorrect set {"person", "baseball bat"} with confidence 0.58, whereas BR-rerank predicts the correct set {"person", "baseball bat", "baseball glove"} with confidence 0.17.
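The second stage can also be sketched in Python. The sketch builds the four feature groups named in the "set prediction features" column for each candidate, then keeps the candidate with the highest reranker score. Two points are assumptions. First, the prior probability is taken to be the empirical frequency of the exact label set among training examples. Second, `scorer` is a stand-in for the trained GB calibrator. The toy scorer in the usage note only shows the mechanics. All function names are hypothetical and not part of pyramid.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def empirical_prior(train_sets: List[FrozenSet[int]]) -> Dict[FrozenSet[int], float]:
    """Assumed prior: fraction of training examples whose label set equals each set."""
    counts = Counter(train_sets)
    return {s: c / len(train_sets) for s, c in counts.items()}


def set_features(candidate: FrozenSet[int], marginals: Sequence[float],
                 prior: Dict[FrozenSet[int], float]) -> List[float]:
    """[BR score, binary encoding..., cardinality, prior probability] for one set."""
    br_score = 1.0
    for j, p in enumerate(marginals):
        br_score *= p if j in candidate else (1.0 - p)  # independent BR probability
    encoding = [1.0 if j in candidate else 0.0 for j in range(len(marginals))]
    return [br_score, *encoding, float(len(candidate)), prior.get(candidate, 0.0)]


def rerank(candidates: List[FrozenSet[int]], marginals: Sequence[float],
           prior: Dict[FrozenSet[int], float],
           scorer: Callable[[List[float]], float]) -> Tuple[float, FrozenSet[int]]:
    """Rescore every candidate and return (best score, best set)."""
    scored = [(scorer(set_features(c, marginals, prior)), c) for c in candidates]
    return max(scored, key=lambda item: item[0])
```

As an illustration, take marginals `[0.9, 0.6, 0.3]` and a training prior in which {0, 1, 2} is the most frequent set. The toy scorer `lambda f: f[0] + f[-1]` then prefers {0, 1, 2} over the BR-best {0, 1}. This mirrors how features beyond the BR score can change the ranking.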
Pre-compiled code can be downloaded from the pyramid package release page. Datasets and properties files can be downloaded here. After downloading, please unzip all the files.
To reproduce the calibration result on RCV1 dataset (reported in Table 2 in the paper), first edit the first two lines of the calibration_exps/rcv1.properties file. Set the dataPath and outputDir to the proper (absolute) paths on your local computer. Then run pyramid with this properties file:
./pyramid-0.12.9/pyramid calibration_exps/rcv1.properties
(Note that you may need to change the version number if you are using a different version of pyramid.)
This trains BR and the GB calibrator, then reports calibration performance.
You can do the same for other datasets using their properties files. The hyperparameters in each properties file are already set to the values tuned on the validation set.
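This page does not name the calibration measure behind Table 2. Expected calibration error (ECE) is one standard way to quantify how well confidence scores match empirical accuracy, and the Python sketch below computes it for set-prediction confidences. The helper name, the equal-width binning, and the default of 10 bins are assumptions. The sketch is not pyramid's implementation or necessarily the paper's exact metric.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               num_bins: int = 10) -> float:
    """ECE over equal-width confidence bins in [0, 1].

    confidences[i] is the score given to the predicted set for example i;
    correct[i] says whether that set exactly matched the true label set.
    """
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * num_bins), num_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)  # weight by bin size
    return ece
```

A perfectly calibrated scorer gives 0. If half of the predictions made with confidence 0.9 are wrong, those predictions contribute a gap of 0.4 in that bin.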
To reproduce the reranking classification result on RCV1 dataset (reported in Table 4 in the paper), first edit the first two lines of the rerank_exps/rcv1.properties file. Set the dataPath and outputDir to the proper (absolute) paths on your local computer. Then run pyramid with this properties file:
./pyramid-0.12.9/pyramid rerank_exps/rcv1.properties
This trains BR and the GB calibrator, then reports reranking performance.
You can do the same for other datasets as well.
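Pointing each properties file at your local paths and launching pyramid can be scripted. The Python sketch below assumes that the `dataPath` and `outputDir` entries are plain `key=value` lines, as in a standard Java properties file. It assumes the launcher lives at the path used on this page. The function names are hypothetical, and what `dataPath` should point to for each dataset is not specified here, so check it against the unzipped data.

```python
import subprocess
from pathlib import Path
from typing import List

# Launcher path used on this page; change the version number if yours differs.
PYRAMID = "./pyramid-0.12.9/pyramid"


def set_paths(text: str, data_path: str, output_dir: str) -> str:
    """Rewrite the dataPath and outputDir entries (assumed key=value lines)."""
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "dataPath":
            line = f"dataPath={data_path}"
        elif key == "outputDir":
            line = f"outputDir={output_dir}"
        lines.append(line)  # every other line is kept verbatim
    return "\n".join(lines) + "\n"


def build_command(properties_file: str) -> List[str]:
    """Equivalent to running: ./pyramid-0.12.9/pyramid <properties_file>."""
    return [PYRAMID, properties_file]


def configure_and_run(properties_file: str, data_path: str, output_dir: str) -> None:
    """Point one properties file at absolute local paths, then run pyramid on it."""
    props = Path(properties_file)
    text = set_paths(props.read_text(),
                     str(Path(data_path).resolve()),   # the page asks for absolute paths
                     str(Path(output_dir).resolve()))
    props.write_text(text)
    subprocess.run(build_command(str(props)), check=True)  # raise if pyramid fails
```

For example, `configure_and_run("rerank_exps/rcv1.properties", "/path/to/data", "/path/to/output")` reproduces the manual steps above. Both paths are placeholders. Call it once per dataset's properties file.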
To increase memory allocation so that the code can run on larger datasets:
Open pyramid-0.12.9/pyramid in a text editor and change -Xmx10g to a larger value; for instance, change it to -Xmx100g to allocate 100 GB of memory.
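The same edit can be made programmatically. `-Xmx` is the standard JVM option for the maximum heap size. The Python sketch below replaces any `-Xmx<N><unit>` occurrence in the launcher script. It assumes the launcher contains the option in that form (the page mentions `-Xmx10g`), and the helper names are hypothetical.

```python
import re
from pathlib import Path


def set_max_heap(script_text: str, size: str) -> str:
    """Replace every -Xmx<N><unit> JVM option (e.g. -Xmx10g) with -Xmx<size>."""
    return re.sub(r"-Xmx\d+[kKmMgG]?", "-Xmx" + size, script_text)


def update_launcher(path: str, size: str) -> None:
    """Rewrite the launcher in place; writing to an existing file keeps its permissions."""
    launcher = Path(path)
    launcher.write_text(set_max_heap(launcher.read_text(), size))
```

For example, `update_launcher("pyramid-0.12.9/pyramid", "100g")` has the same effect as the manual edit. The new value should not exceed the memory actually available on the machine.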
The source code associated with BR-Rerank can be found here. Note that BR is implemented as a CBM with 1 component.
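This note assumes that CBM refers to the Conditional Bernoulli Mixture model. The reduction to BR follows from the model's form. A CBM mixes K components through input-dependent weights that sum to 1, and each component predicts every label with an independent Bernoulli:

```latex
p(\mathbf{y} \mid \mathbf{x}) = \sum_{k=1}^{K} \pi_k(\mathbf{x}) \prod_{l=1}^{L} p(y_l \mid \mathbf{x}; \beta_l^{k}),
\qquad \sum_{k=1}^{K} \pi_k(\mathbf{x}) = 1

% With K = 1 the only weight is \pi_1(\mathbf{x}) = 1, so the mixture collapses to
p(\mathbf{y} \mid \mathbf{x}) = \prod_{l=1}^{L} p(y_l \mid \mathbf{x}; \beta_l^{1})
```

The second line is exactly the BR model, in which each label is predicted by its own independent binary classifier.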
Feel free to contact me (chengli.email@gmail.com) if you have any questions about the paper or the code.