Cheng Li edited this page Sep 29, 2017 · 21 revisions

Pair-wise Conditional Random Fields for multi-label classification

Pyramid implements the pair-wise Conditional Random Field algorithm for multi-label classification as described in the paper below.

Collective multi-label classification.

Nadia Ghamrawi and Andrew McCallum.

In Proceedings of the 14th ACM International Conference on Information and Knowledge Management. ACM, 2005.

The pair-wise CRF is a log-linear model that captures pair-wise label dependencies. The paper describes three model variants; the one implemented here is called "CML" in the paper. This implementation uses L-BFGS for training and "supported inference" for prediction.
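The following is a minimal Python sketch of the idea, not Pyramid's actual code. It assumes a simplified parameterisation (one weight vector per label, plus one weight per label pair that fires when both labels are present) and assumes that "supported inference" means scoring only the label combinations observed in the training set. The names `score`, `supported_inference`, `unary` and `pairwise` are hypothetical.

```python
import math
from itertools import combinations

def score(x, labels, unary, pairwise):
    """Unnormalised log-score of a label set for feature vector x."""
    s = 0.0
    # Label-feature terms: one weight vector per active label.
    for label in labels:
        s += sum(w * v for w, v in zip(unary[label], x))
    # Label-label terms: one weight per pair of co-occurring labels (assumption).
    for a, b in combinations(sorted(labels), 2):
        s += pairwise.get((a, b), 0.0)
    return s

def supported_inference(x, support, unary, pairwise):
    """Score only the candidate label sets in `support`; return the best one
    and the probabilities normalised over the support."""
    scores = [score(x, c, unary, pairwise) for c in support]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    best = max(range(len(support)), key=probs.__getitem__)
    return support[best], probs

# Toy model with 3 labels and 2 features (all numbers are made up).
unary = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
pairwise = {(0, 2): 1.0, (0, 1): -2.0}

# The support is the set of distinct label combinations seen in training.
train_label_sets = [{0}, {1}, {0, 2}, {0}, {1}]
support = sorted({frozenset(s) for s in train_label_sets}, key=sorted)

best, probs = supported_inference([1.0, 0.2], support, unary, pairwise)
```

Restricting the candidates to observed combinations keeps inference linear in the number of distinct training label sets, rather than exponential in the number of labels.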

Usage

To run the pair-wise CRF algorithm, type

./pyramid config/crf.properties

where the crf.properties file specifies all the algorithm parameters, as explained below.

Program Properties

The properties file (a plain text file in which each line is a key-value pair) specifies all inputs, outputs and hyperparameters required by the program. A sample properties file is shown below. The same file can also be found in the config folder associated with the code release. You can modify this file to set the correct dataset paths on your computer and to experiment with different model parameters.

# path to the input training data
input.trainData=/scratch/wang.bin/trec.data/medical/data_sets/train

# path to the input test data
input.testData=/scratch/wang.bin/trec.data/medical/data_sets/test

# where to save the model and prediction
output.folder=/scratch/li.che/projects/pyramid/archives/crf/medical/1

# train the model on the training set
train=true
# test the model on the test set
test=true

# Gaussian prior variance for model weights; a smaller value means stronger regularization
# it can have a large impact on test performance and usually requires some tuning
train.gaussianVariance=1

# show training performance after every k iterations
train.showProgress.interval=5

# generate reports for prediction
train.generateReports=true

# training stops when the objective value converges or the maximum number of iterations is reached
train.maxIteration=500

# the target measure for which predictions should be optimal:
# subsetAccuracy or instanceFMeasure
predict.target=subsetAccuracy

# only display labels with marginal probabilities above the threshold in reports;
# this value does not affect prediction; it only makes the reports easier to read
report.classProbThreshold=0.4

# the internal Java class name for this application. 
# users do not need to modify this.
pyramid.class=App6
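Since train.gaussianVariance usually requires tuning, one option is to generate several copies of the properties file that differ only in that value. The Python sketch below is a hedged illustration, not part of Pyramid: the helpers `parse_properties` and `write_variants`, the file naming scheme, and the suffix appended to output.folder are all assumptions, and the parser handles only the simple `key=value` lines and `#` comments seen in the sample above.

```python
import tempfile
from pathlib import Path

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def write_variants(base_text, variances, out_dir):
    """Write one properties file per variance value and return the paths."""
    base = parse_properties(base_text)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for v in variances:
        props = dict(base)
        props["train.gaussianVariance"] = str(v)
        # Give each run its own output folder (hypothetical naming scheme).
        props["output.folder"] = f"{base['output.folder']}_var{v}"
        path = out_dir / f"crf_var{v}.properties"
        body = "\n".join(f"{k}={val}" for k, val in props.items())
        path.write_text(body + "\n")
        paths.append(path)
    return paths

# Placeholder paths; replace them with the real dataset locations.
base_text = """
# path to the input training data
input.trainData=/path/to/train
output.folder=/path/to/output
train.gaussianVariance=1
"""

with tempfile.TemporaryDirectory() as tmp:
    paths = write_variants(base_text, [0.1, 1, 10], tmp)
    generated = [parse_properties(p.read_text()) for p in paths]
```

Each generated file can then be passed to ./pyramid in place of config/crf.properties, and the test performance of the runs compared.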

Sample Datasets

Some commonly used multi-label classification datasets can be downloaded here.

Source Code

The source code files related to pair-wise CRF can be found here and here.
