Spark version
Here we show all the information about the distributed mRMR version for Apache Spark.
- Spark 1.3.x must be installed (it also runs on the 1.2 version by tuning the corresponding parameter in the pom.xml file).
- Maven must be installed to compile the package; in any case, a precompiled JAR file is provided.
For compilation purposes, we include a POM file that compiles the code and creates a JAR file that can be executed or attached to a spark-shell/spark-submit instance.
cd fast-mRMR/spark && mvn package
This generates the JAR file in: fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar
To execute the selector method, type the following command:
$ spark-submit --master local[*] --driver-memory 4g --class examples.FStest ~/git/fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar
It outputs a ranking of features (one per row, in order of selection) and transforms the instances in the a2a.libsvm file into the new feature space. To illustrate the process, it also prints the first instance in the dataset projected onto this new space.
*** mRMR features ***
Feature Score
73 0,8466
40 0,4479
83 0,1962
63 0,3151
67 0,2435
6 0,2881
80 0,2492
39 0,2128
22 0,1550
82 0,1391
5 0,1153
4 0,1042
76 0,0878
3 0,0999
17 0,0858
15 0,0791
14 0,0744
16 0,0704
18 0,0675
19 0,0619
20 0,0553
51 0,0527
52 0,0458
36 0,0437
75 0,0432
(0.0,(25,[0,4,9,13,18,19,20,21,22,24],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]))
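The ranking above comes from the greedy mRMR criterion: at each step, pick the feature that maximizes relevance to the class label minus redundancy with the features already selected, both measured by mutual information. The following is a minimal single-machine sketch of that criterion on discrete data (a hypothetical simplification for illustration; the Spark version computes these mutual-information quantities distributively over the RDD):

```scala
// Minimal single-machine sketch of the greedy mRMR criterion on discrete
// data. Illustrative only: the distributed version computes the same
// quantities over RDD partitions.
object MrmrSketch {
  // Mutual information I(X;Y) between two discrete columns (natural log).
  def mutualInfo(x: Seq[Int], y: Seq[Int]): Double = {
    val n = x.length.toDouble
    val pxy = x.zip(y).groupBy(identity).map { case (k, v) => k -> v.length / n }
    val px  = x.groupBy(identity).map { case (k, v) => k -> v.length / n }
    val py  = y.groupBy(identity).map { case (k, v) => k -> v.length / n }
    pxy.map { case ((a, b), p) => p * math.log(p / (px(a) * py(b))) }.sum
  }

  // Greedily pick k features: maximize relevance to the label minus the
  // mean redundancy with the already-selected features.
  def select(features: Seq[Seq[Int]], label: Seq[Int], k: Int): Seq[Int] = {
    val relevance = features.map(f => mutualInfo(f, label))
    var selected = Vector.empty[Int]
    while (selected.size < k) {
      val next = features.indices
        .filterNot(selected.contains)
        .maxBy { i =>
          val redundancy =
            if (selected.isEmpty) 0.0
            else selected.map(j => mutualInfo(features(i), features(j))).sum / selected.size
          relevance(i) - redundancy
        }
      selected :+= next
    }
    selected
  }
}
```

Note that the selection order is not necessarily the order of the raw scores, which is why the score column above is not monotonically decreasing.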
The JAR can also be attached to a Spark instance with the following command:
$ spark-shell --master local[*] --jars fast-mRMR/spark/target/spark-fast-mrmr-0.1.jar
Now it can be used by importing the main class, MrmrSelector:
scala> import org.apache.spark.mllib.feature.MrmrSelector
import org.apache.spark.mllib.feature.MrmrSelector
A complete example can be found in examples.FStest, which is listed below:
package examples

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.feature._

object FStest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FS test")
    val sc = new SparkContext(conf)
    // Load the input dataset in LibSVM format
    val data = MLUtils.loadLibSVMFile(sc, "a2a.libsvm")
    // Train the mRMR selector on the full dataset
    val selector = MrmrSelector.train(data)
    // Project every instance onto the selected feature space
    val redData = data.map { lp =>
      LabeledPoint(lp.label, selector.transform(lp.features))
    }
    println(redData.first().toString())
  }
}
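In the example, selector.transform maps each instance onto the selected feature space: the selected original indices become positions 0..k-1 of the reduced vectors, as seen in the sample output above. Conceptually, for a sparse instance (parallel arrays of indices and values) this projection can be sketched in plain Scala as follows (a simplification for illustration, not the actual MLlib implementation):

```scala
// Conceptual sketch of projecting a sparse instance onto a ranked list of
// selected features. The selected original indices are remapped to
// positions 0..k-1 of the new, lower-dimensional feature space.
object ProjectSketch {
  def project(indices: Array[Int], values: Array[Double],
              selected: Array[Int]): (Array[Int], Array[Double]) = {
    // original feature index -> position in the reduced space
    val pos = selected.zipWithIndex.toMap
    // keep only the non-zero entries whose index was selected, remapped
    val kept = indices.zip(values).collect {
      case (i, v) if pos.contains(i) => (pos(i), v)
    }.sortBy(_._1)
    (kept.map(_._1), kept.map(_._2))
  }
}
```

Non-zero entries whose features were not selected are simply dropped, so the reduced vectors stay sparse.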