MICO is a Python package that implements a conic optimization based feature selection method with a mutual information (MI) measure [1]. The approach measures the features' relevance and redundancy using MI, then formulates feature selection as a pure-binary quadratic optimization problem, which can be solved heuristically by an efficient randomization algorithm via semidefinite programming [2]. The optimization software Colin [6] is used to solve the underlying conic optimization problems.
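A generic sketch of that formulation, stated as an assumption rather than the exact model of [1]: x in {0,1}^n marks which of the n features are selected, k is the number of features to keep, and Q is a symmetric n-by-n matrix whose diagonal holds MI relevance scores I(f_i; c) against the class c and whose off-diagonal entries hold pairwise MI redundancy (or complementarity) terms. Substituting +/-1 variables z and lifting to a matrix Y gives a semidefinite relaxation of the binary problem:

```latex
\begin{aligned}
&\max_{x}\; x^\top Q x
  \quad \text{s.t.} \quad \mathbf{1}^\top x = k,\; x \in \{0,1\}^n, \\[4pt]
&x = \tfrac{1}{2}(\mathbf{1} + z),\; z \in \{-1,1\}^n,\;
  \hat z = \begin{pmatrix} 1 \\ z \end{pmatrix}
  \;\Longrightarrow\;
  x^\top Q x = \tfrac{1}{4}\,\hat z^\top \hat Q\, \hat z,\quad
  \hat Q = \begin{pmatrix} \mathbf{1}^\top Q \mathbf{1} & \mathbf{1}^\top Q \\ Q\mathbf{1} & Q \end{pmatrix}, \\[4pt]
&\max_{Y}\; \tfrac{1}{4}\langle \hat Q, Y \rangle
  \quad \text{s.t.} \quad Y \succeq 0,\; Y_{ii} = 1 \ (i = 0,\dots,n),\;
  \textstyle\sum_{i=1}^{n} Y_{0i} = 2k - n.
\end{aligned}
```

Every lifted binary point Y = ẑẑᵀ satisfies these constraints, so the relaxation's optimum bounds the binary problem from above. In the random-hyperplane rounding of [2], one factors Y = VᵀV with columns v_0, ..., v_n, draws a Gaussian vector r, and sets z_i = sign(v_0ᵀr) · sign(v_iᵀr); because a rounded z need not contain exactly k selected features, some repair step is assumed here, typically repeated over several draws while keeping the best result.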
This package:
- implements three feature selection methods:
  - MICO: Conic Optimization approach
  - MIFS: Forward Selection approach
  - MIBS: Backward Selection approach
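- supports three different MI measures:
  - JMI: Joint Mutual Information [3]
  - JMIM: Joint Mutual Information Maximisation [4]
  - MRMR: Max-Relevance Min-Redundancy [5]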
- generates feature importance scores for all selected features.
- provides scikit-learn compatible APIs.
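MIFS and MIBS are greedy searches, and the MI measures score how much a candidate feature adds to a subset. The sketch below illustrates both ideas in plain Python under stated assumptions: features are discrete, MI is a plug-in estimate from counts, and the criteria take their commonly cited forms (MRMR subtracts the mean redundancy I(f; s) from the relevance I(f; y) [5], JMI sums the joint relevance I(f, s; y) over the other features s [3], and JMIM takes its minimum [4]). The helper names are hypothetical, and the backward rule (drop the feature that scores lowest against the rest) is an assumption, not MICO's implementation.

```python
from collections import Counter
from math import log


def mutual_info(a, b):
    """Plug-in estimate of I(a; b) in nats for two discrete sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    # Sum of p(x,y) * log(p(x,y) / (p(x) p(y))), with probabilities = counts / n.
    return sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def score(features, y, j, others, method):
    """Score candidate feature j against the feature indices in `others`."""
    f = features[j]
    if not others:
        return mutual_info(f, y)  # relevance only
    if method == "MRMR":
        redundancy = sum(mutual_info(f, features[s]) for s in others) / len(others)
        return mutual_info(f, y) - redundancy
    # Joint relevance I(f, s; y): treat the pair (f, s) as one discrete variable.
    joint = [mutual_info(list(zip(f, features[s])), y) for s in others]
    if method == "JMI":
        return sum(joint)
    if method == "JMIM":
        return min(joint)
    raise ValueError(f"unknown method: {method}")


def forward_select(features, y, k, method="JMI"):
    """Hypothetical forward search: add the best-scoring feature until k are kept."""
    selected, remaining = [], list(range(len(features)))
    while len(selected) < k:
        best = max(remaining, key=lambda j: score(features, y, j, selected, method))
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_select(features, y, k, method="JMI"):
    """Hypothetical backward search: drop the worst-scoring feature until k remain."""
    current = list(range(len(features)))
    while len(current) > k:
        worst = min(current, key=lambda j: score(
            features, y, j, [s for s in current if s != j], method))
        current.remove(worst)
    return current


# Toy data: y encodes two independent bits a and b; feature 1 duplicates a.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [2 * ai + bi for ai, bi in zip(a, b)]
features = [a, list(a), b]

for method in ("MRMR", "JMI", "JMIM"):
    print(method, forward_select(features, y, 2, method),
          backward_select(features, y, 2, method))
```

On this toy data every criterion keeps one copy of `a` together with `b` and discards the redundant duplicate, in both search directions.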
- Download the Colin distribution from http://www.colinopt.org/downloads.php and unpack it into a directory of your choice (<CLNHOME>). Then install the Colin package:
cd <CLNHOME>/python
pip install -r requirements.txt
python setup.py install
- To install the MICO package, run the following from the MICO source directory:
pip install -r requirements.txt
python setup.py install
or
pip install colin-mico
To install the development version, you may use:
pip install --upgrade git+https://github.com/jupiters1117/mico
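After installing, one way to confirm that the package is visible to the current interpreter is to query its distribution metadata with the standard library's `importlib.metadata` (Python 3.8+). This sketch assumes the distribution name `colin-mico` from the `pip install` command above; an install from source or from Git registers under whatever name its `setup.py` declares, and no distribution name is assumed for Colin itself. `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "colin-mico" is the distribution name used in `pip install colin-mico`.
print("colin-mico:", installed_version("colin-mico") or "not installed")
```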
This package provides scikit-learn compatible APIs:
fit(X, y)
transform(X)
fit_transform(X, y)
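To make the fit/transform contract concrete, here is a hypothetical toy selector in plain Python (no scikit-learn import) that uses the same method names. `TopKSelector` and its `score_func` argument are illustrative assumptions, not part of MICO; the selector simply keeps the k highest-scoring columns. As with scikit-learn estimators, `fit` returns `self`, so `fit_transform(X, y)` is equivalent to `fit(X, y).transform(X)`.

```python
class TopKSelector:
    """Hypothetical selector that keeps the k columns with the highest score."""

    def __init__(self, score_func, k):
        self.score_func = score_func  # callable(column, y) -> number
        self.k = k

    def fit(self, X, y):
        # X is a list of rows; score each column against the target y.
        n_features = len(X[0])
        scores = [self.score_func([row[j] for row in X], y) for j in range(n_features)]
        top = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[: self.k]
        # Boolean mask over the input columns.
        self.support_ = [j in top for j in range(n_features)]
        # Scores of the selected columns only, in column order.
        self.feature_importances_ = [s for s, keep in zip(scores, self.support_) if keep]
        return self

    def get_support(self):
        return list(self.support_)

    def transform(self, X):
        # Keep only the columns flagged in the support mask.
        return [[v for v, keep in zip(row, self.support_) if keep] for row in X]

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)


def agreement(column, y):
    """Toy score: number of rows where the column equals the target."""
    return sum(c == t for c, t in zip(column, y))


X = [[0, 1, 5], [1, 1, 5], [1, 0, 5], [0, 0, 5]]
y = [0, 1, 1, 0]
selector = TopKSelector(agreement, k=2)
print(selector.fit_transform(X, y))  # [[0, 1], [1, 1], [1, 0], [0, 0]]
print(selector.get_support())        # [True, True, False]
```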
The following example illustrates the use of the package:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from mico import MutualInformationConicOptimization
# Prepare data.
data = load_breast_cancer()
y = data.target
X = pd.DataFrame(data.data, columns=data.feature_names)
# Perform feature selection.
mico = MutualInformationConicOptimization(verbose=1, categorical=True)
mico.fit(X, y)
# Print which features were selected.
print("Selected features: {}".format(mico.get_support()))
# Print the feature importance scores.
print("Feature importance scores: {}".format(mico.feature_importances_))
# Keep only the selected columns of X.
X_transformed = mico.transform(X)
The user guide, examples, and API reference are available in the project documentation.
[1] T. Naghibi, S. Hoffmann, and B. Pfister, "A semidefinite programming based search strategy for feature selection with mutual information measure", IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), pp. 1529-1541, 2015.
[2] M. Goemans and D. Williamson, "Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming", J. ACM, 42(6), pp. 1115-1145, 1995.
[3] H. Yang and J. Moody, "Data Visualization and Feature Selection: New Algorithms for Nongaussian Data", NIPS 1999.
[4] M. Bennasar, Y. Hicks, and R. Setchi, "Feature selection using Joint Mutual Information Maximisation", Expert Systems with Applications, 42(22), pp. 8520-8532, 2015.
[5] H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp. 1226-1238, 2005.
[6] Colin: Conic-form Linear Optimizer (www.colinopt.org).
- KuoLing Huang, 2019-present
MICO is 3-clause BSD licensed.
MICO is heavily inspired by MIFS, the parallelized mutual information based feature selection module by Daniel Homola.