Sergio Ramírez edited this page May 13, 2015 · 4 revisions

Data Reader

fast-mRMR uses a specific binary, columnar format to simplify subsequent processing. To convert CSV datasets to this format, we provide a Data Reader program. It reads a CSV file (with a header) and converts it into a binary file called "data.mrmr" (an example CSV file and an example mRMR file are included in the utils folder). This format is required only by the CPU and GPU versions; the Spark version can read any dataset that Spark can read.
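To illustrate what "binary and columnar" means in general, here is a minimal Python sketch that turns a small CSV (with header) into a column-major byte string. The layout used here (two little-endian 32-bit counts followed by one byte per value) and the function name `csv_to_columnar_bytes` are assumptions made for illustration only; they are not the actual "data.mrmr" layout, which is defined by the Data Reader source code.

```python
import csv
import io
import struct


def csv_to_columnar_bytes(csv_text):
    """Convert CSV text (with header) into a column-major byte string.

    Hypothetical layout, NOT the real data.mrmr format:
    uint32 sample count, uint32 feature count, then every value of
    column 0, every value of column 1, and so on, one byte per value.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    n_samples, n_features = len(samples), len(header)

    out = bytearray(struct.pack("<II", n_samples, n_features))
    # Walk the table column by column so that all values of one feature
    # end up contiguous in the output.
    for col in range(n_features):
        for row in samples:
            out.append(int(row[col]))  # values must fit in 0..255
    return header, bytes(out)


example = "class,f1,f2\n0,1,2\n1,3,4\n0,5,6\n"
header, blob = csv_to_columnar_bytes(example)
```

Storing each feature contiguously lets a program scan one feature at a time without skipping over the other columns, which is the usual motivation for a columnar layout.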

Compilation

To compile the reader, we include an example Makefile in the same folder; it generates a binary file called mrmr-reader (we also include an example binary file).

cd fast-mRMR/utils/data-reader && make && chmod +x mrmr-reader

Example

The usage is as follows:

$ ./mrmr-reader
Usage: <inputfilename>

When the input file is passed as the argument, the program prints the following:

$ ./mrmr-reader data.csv 
...0 / 48
Readed Samples: 48
Last 15 samples ignored.