
3. Sparsity: Logistic Lasso


Compare lasso vs. ridge penalty in the context of logistic regression
[simulation code]

In the first simulation, I compared the L1 and L2 penalties in the context of least-squares regression. Here, I switch to classification with logistic models. However, the story is the same: if we know the true underlying signal is sparse, enforcing sparsity with the L1 penalty tends to yield a more interpretable model.

There are 512 features and 256 training examples, so the system is underdetermined. Across the following 4 experiments, I varied the number of non-zero elements of the underlying feature weights. The sparsity parameter, lambda, for the regularized logistic model is tuned over 100 candidate values with nested 10-fold cross-validation (done by CVGLMNET with the default objective).
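For concreteness, here is a minimal Python sketch of this setup, using scikit-learn's LogisticRegressionCV as a stand-in for CVGLMNET (the actual simulation uses the latter); the dimensions match the description above, but the noise model, coefficient magnitudes, and solver settings are assumptions, not the original simulation code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# dimensions from the text; n_nonzero is varied across the 4 experiments
n_train, n_features, n_nonzero = 256, 512, 32
rng = np.random.default_rng(0)

# sparse ground-truth weights: only the first n_nonzero features carry signal
beta = np.zeros(n_features)
beta[:n_nonzero] = rng.normal(size=n_nonzero)

# simulated design matrix and binary labels drawn from a logistic model
X = rng.normal(size=(n_train, n_features))
p = 1.0 / (1.0 + np.exp(-X @ beta))
y = rng.binomial(1, p)

# L1-penalized (lasso) logistic regression; the penalty strength is tuned
# over 100 values with 10-fold cross-validation, analogous to the lambda
# path in CVGLMNET
lasso = LogisticRegressionCV(Cs=100, cv=10, penalty='l1',
                             solver='liblinear', max_iter=5000)
lasso.fit(X, y)

# L2-penalized (ridge) logistic regression for comparison
ridge = LogisticRegressionCV(Cs=100, cv=10, penalty='l2',
                             solver='lbfgs', max_iter=5000)
ridge.fit(X, y)

print('non-zero lasso weights:', int(np.sum(lasso.coef_ != 0)))
print('non-zero ridge weights:', int(np.sum(ridge.coef_ != 0)))
```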

Note that when there are many signal-carrying features, enforcing sparsity gives us a low hit rate: many non-zero features are assigned zero weight under the lasso penalty. For example, in the last simulation, all 512 features actually carry signal, but the logistic lasso identified only about 100 of them. One reason is that sparsity is achieved by ignoring correlated features.
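As an illustration of how such a hit rate could be computed, here is a short continuation of the sketch above; the variables `beta` and `lasso` are carried over from that sketch and are not part of the original simulation code.

```python
# hit rate: fraction of truly signal-carrying features that the lasso
# assigns a non-zero weight
true_support = beta != 0
lasso_support = lasso.coef_.ravel() != 0

hit_rate = np.mean(lasso_support[true_support])
# false-alarm rate is only defined when some features carry no signal
false_alarm_rate = (np.mean(lasso_support[~true_support])
                    if (~true_support).any() else 0.0)

print('hit rate:', hit_rate)
print('false alarm rate:', false_alarm_rate)
```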

This is a problem. For example, in one of our previous fMRI studies, the goal was to find ALL signal-carrying voxels that support face decoding. In the following section, I will talk about one way of discovering more informative features without introducing any new technique.
