Sergio Ramírez edited this page May 13, 2015 · 3 revisions

Spark version

Here we present all the information about the distributed version of mRMR for Apache Spark.


  • Spark 1.3.x must be installed (it also runs on version 1.2 after adjusting the corresponding parameter in the pom.xml file).
  • Maven must be installed only if we want to compile the package; the JAR file is already provided.


For compilation, we include a POM file that compiles the code and creates a JAR file that can be executed with spark-submit or attached to a spark-shell instance.

$ cd fast-mRMR/spark && mvn package

This generates the JAR file at: fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar


To run the feature selector, use the following command:

$ spark-submit --master local[*] --driver-memory 4g --class examples.FStest ~/git/fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar

It outputs a ranking of features, one per row (in descending order), and transforms the instances in the a2a.libsvm file to the new feature space. To illustrate the process, it also shows the first instance of the dataset projected into this new space.

*** mRMR features ***
Feature Score
73      0,8466
40      0,4479
83      0,1962
63      0,3151
67      0,2435
6       0,2881
80      0,2492
39      0,2128
22      0,1550
82      0,1391
5       0,1153
4       0,1042
76      0,0878
3       0,0999
17      0,0858
15      0,0791
14      0,0744
16      0,0704
18      0,0675
19      0,0619
20      0,0553
51      0,0527
52      0,0458
36      0,0437
75      0,0432
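
Note that the score column in the listing is not strictly decreasing (feature 63 scores higher than feature 83, which precedes it). This is consistent with the usual greedy formulation of mRMR, where each step picks the remaining feature with the best relevance-minus-redundancy value given the features already chosen, so rows would reflect selection order rather than sorted scores. The following is a minimal, Spark-free sketch of that criterion, assuming the common difference form. `MrmrSketch`, `rank`, and the mutual-information values are hypothetical and are not taken from fast-mRMR's code.

```scala
// Illustrative sketch only: greedy mRMR ranking over precomputed
// mutual-information values. Not the fast-mRMR implementation.
object MrmrSketch {

  // relevance(i):     mutual information between feature i and the class
  // redundancy(i)(j): mutual information between features i and j
  // Returns up to k (featureIndex, score) pairs in selection order.
  def rank(relevance: Array[Double],
           redundancy: Array[Array[Double]],
           k: Int): List[(Int, Double)] = {
    var selected = List.empty[(Int, Double)]
    var remaining = relevance.indices.toSet
    while (selected.size < k && remaining.nonEmpty) {
      // Score = relevance minus mean redundancy with the features chosen so far
      val scored = remaining.toList.map { i =>
        val meanRedundancy =
          if (selected.isEmpty) 0.0
          else selected.map { case (j, _) => redundancy(i)(j) }.sum / selected.size
        (i, relevance(i) - meanRedundancy)
      }
      val best = scored.maxBy(_._2)
      selected = selected :+ best
      remaining -= best._1
    }
    selected
  }

  def main(args: Array[String]): Unit = {
    // Made-up values: features 0 and 1 are both relevant but highly redundant
    val relevance = Array(0.9, 0.8, 0.3)
    val redundancy = Array(
      Array(0.0, 0.7, 0.05),
      Array(0.7, 0.0, 0.1),
      Array(0.05, 0.1, 0.0)
    )
    rank(relevance, redundancy, 3).foreach { case (i, s) => println(s"$i\t$s") }
  }
}
```

With these made-up values the selection order is feature 0, then 2, then 1, and the last score is higher than the second one, which mirrors the non-monotonic pattern in the listing.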


The JAR can also be attached to a Spark shell instance with the following command:

$ spark-shell --master local[*] --jars fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar

Now it can be used by importing the main class, MrmrSelector:

scala> import org.apache.spark.mllib.feature.MrmrSelector
import org.apache.spark.mllib.feature.MrmrSelector
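
Once imported, the selector is trained with MrmrSelector.train and applied to each instance with transform, as examples.FStest does. As a rough picture of what such a projection does to a single instance, here is a Spark-free sketch. It assumes 0-based feature indices and that the kept columns stay in ascending index order. `ProjectionSketch` and `project` are hypothetical names, and this is not the library's implementation.

```scala
// Illustrative sketch only: project one dense instance onto a set of
// selected feature indices. Names and values here are hypothetical.
object ProjectionSketch {

  // Keeps only the selected columns, in ascending index order (assumption).
  def project(features: Array[Double], selected: Array[Int]): Array[Double] =
    selected.sorted.map(i => features(i))

  def main(args: Array[String]): Unit = {
    val instance = Array(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)
    val selected = Array(4, 1, 3) // selection order; indices are 0-based here
    println(project(instance, selected).mkString(", "))
  }
}
```

With the sample values in main, the projected instance is 1.5, 3.5, 4.5.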

A complete example can be found in examples.FStest, which is listed below:

package examples
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.linalg.Vectors
import java.util.ArrayList
import org.apache.spark.mllib.feature._

object FStest {
  def main(args: Array[String]): Unit = {
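    val initStartTime = System.nanoTime()
    val conf = new SparkConf().setAppName("FS test")
    val sc = new SparkContext(conf)
    // Load the training data in LIBSVM format
    val data = MLUtils.loadLibSVMFile(sc, "a2a.libsvm")
    // Fit the mRMR selector on the labeled data
    val selector = MrmrSelector.train(data)
    // Project every instance onto the selected features
    val redData = data.map { lp =>
      LabeledPoint(lp.label, selector.transform(lp.features))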
