Sergio Ramírez edited this page May 13, 2015 · 3 revisions

Spark version

This page documents the distributed mRMR version for Apache Spark.

Prerequisites

  • Spark 1.3.x must be installed (the code also runs on the 1.2 branch after adjusting the corresponding Spark version parameter in the pom.xml file).
  • Maven must be installed in order to compile the package; alternatively, a pre-built JAR file is provided.

Compilation

For compilation, we include a POM file that builds the code and creates a JAR file that can be executed directly or attached to a spark-shell/spark-submit instance.

cd fast-mRMR/spark && mvn package

This generates the JAR file at: fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar

Example

To execute the selector, run the following command:

$ spark-submit --master local[*] --driver-memory 4g --class examples.FStest ~/git/fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar

It outputs a ranking of features (one per row, in order of selection) and transforms the instances in the a2a.libsvm file into the new feature space. To illustrate the process, it also prints the first instance of the dataset projected into this new space, as a LabeledPoint whose features are stored in a sparse vector of the form (size, [indices], [values]).

*** mRMR features ***
Feature Score
73      0.8466
40      0.4479
83      0.1962
63      0.3151
67      0.2435
6       0.2881
80      0.2492
39      0.2128
22      0.1550
82      0.1391
5       0.1153
4       0.1042
76      0.0878
3       0.0999
17      0.0858
15      0.0791
14      0.0744
16      0.0704
18      0.0675
19      0.0619
20      0.0553
51      0.0527
52      0.0458
36      0.0437
75      0.0432

(0.0,(25,[0,4,9,13,18,19,20,21,22,24],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]))

The JAR can also be attached to a Spark shell instance with the following command:

$ spark-shell --master local[*] --jars fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar

The selector can then be used by importing its main class, MrmrSelector:

scala> import org.apache.spark.mllib.feature.MrmrSelector
import org.apache.spark.mllib.feature.MrmrSelector
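Once imported, a minimal interactive session might look like the following. This is a sketch assuming the a2a.libsvm file sits in the working directory; it reuses the same MrmrSelector.train and selector.transform calls that appear in the examples.FStest listing below, and requires MLUtils and LabeledPoint to be imported as well:

```scala
scala> import org.apache.spark.mllib.util.MLUtils
scala> import org.apache.spark.mllib.regression.LabeledPoint

// Load the discretized dataset in LibSVM format
// (assumes a2a.libsvm is in the working directory).
scala> val data = MLUtils.loadLibSVMFile(sc, "a2a.libsvm")

// Train the mRMR selector; this prints the feature ranking shown above.
scala> val selector = MrmrSelector.train(data)

// Project every instance onto the selected feature space.
scala> val redData = data.map(lp => LabeledPoint(lp.label, selector.transform(lp.features)))
```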

A complete example can be found in examples.FStest, which is listed below:

package examples

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.MrmrSelector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

object FStest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FS test")
    val sc = new SparkContext(conf)
    // Load the input dataset in LibSVM format.
    val data = MLUtils.loadLibSVMFile(sc, "a2a.libsvm")
    // Train the mRMR selector (prints the feature ranking).
    val selector = MrmrSelector.train(data)
    // Project each instance onto the selected feature space.
    val redData = data.map { lp =>
      LabeledPoint(lp.label, selector.transform(lp.features))
    }
    println(redData.first().toString())
  }
}