Skip to content

GARBO, a novel multi-island adaptive Genetic AlgoRithm for biomarker selection in high-dimensional Omics data.

Notifications You must be signed in to change notification settings

Greco-Lab/GARBO

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

GARBO

In clinical applications, feature selection is a crucial step for the selection of low dimensional biomarkers from high dimensional data generated by Omics technologies. Here we introduce GARBO, a novel multi-island adaptive genetic algorithm to simultaneously optimize accuracy and set size in omics-driven biomarker discovery problems.

The GARBO algorithm is currentely implemented in python.

Installation

Install phython 2.7 and install the following python modules within your virtual environment sklearn, multiprocessing, matplotlib, operator, bisect, deap, numpy, random, skfuzzy, cPickle, pandas, scipy, sys, getopt source activate 'your_virtual_env'

Examples

To run GARBO-islands in parallel on a computer desktop

export NGEN=100 # Indicate the number of GA-iterationsexport 
NPOP=50 # Indicate the number of the chromosomes to be generated for each nicheexport 
MINL=30 # Indicate the initial minimum length of the generated chromosomesexport 
MAXL=50 # Indicate the initial maixmum length of the generated chromosomesexport 
NN=2 # Specify the number of nichesexport 
RN=1 # Indicate if an initial rank of the features must be compiled (RN=1) otherwise it starts with no ranking information (RN=0).
export INPUT_FILE="data_ccle_erl_ge.csv" # Omics dataset (samples as columns and rows as features< The last feature must named 'class' and it correpsonds to the target label)
export OUTPUT_DIR="MRNA_run_1"    # Folder that will contain a file for each nicheserialized python-obejcts. Each file contains the
mkdir $OUTPUT_DIR
nohup python runGARBO.py -g $NGEN -p $NPOP -s $MINL -l $MAXL -n $NN -r $RN -i $INPUT_FILE -o $OUTPUT_DIR > output_mrna.log &

To run multiple times GARBO for cross-validations (please note that the sinlge islands will run in parallel).

First, we create a bash script to launch one GARBO-task (e.g. 'garbo_one_task.sh').

#!/bin/bash
#SBATCH --mail-type=END
#SBATCH --mail-user=##################
source activate garbo
python runGARBO.py -g $NGEN -p $NPOP -s $MINL -l $MAXL -n $NN -r $RN -i $INPUT_FILE -o $OUTPUT_DIR
conda deactivate

Then, we create a second bash script indicting the setting for a run in parallel. For instance, with the following setting, we run GARBO 10 times (e.g. 10-fold cross validation) and each run utilizes 10 islands for a total of 100 jobs.

#!/bin/bash
#SBATCH --partition parallel
#SBATCH --time 2-00:00:00       
#SBATCH --mem-per-cpu=5000
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --array=1-10
export NGEN=500
export NPOP=500
export MINL=30
export MAXL=50
export NN=10
export RN=1
export INPUT_FILE="train_ge_$SLURM_ARRAY_TASK_ID.csv"
export OUTPUT_DIR="GE_view_$SLURM_ARRAY_TASK_ID"
mkdir $OUTPUT_DIR
srun -o ccle_erl_ge$SLURM_ARRAY_TASK_ID.out -e ccle_ge_$SLURM_ARRAY_TASK_ID.err garbo_one_task.sh

To read the output of GARBO, the user needs to load a Python object structure from a pickle file created by GARBO (.pkl). This Python object structure consists of four data structures:

  • the last population of chormosomes (biomarker sets);
  • a list of populations of chromosomes that are saved during the GA-iterations.
  • a logbook indictating evalaution metrics of the GA-iteration;
  • the weights to build the final feature ranking.
def readDataResult(pathname, nn = 10):
    ga_out_all = []
    list_all_chr = []
    for i in range(nn):
        f = open(pathname + str(i) + '.pkl', 'rb')
        ga_out = []
        for i in range(4):
            ga_out.append(pickle.load(f))
        ga_out_all.append(ga_out[:4])
        list_all_chr = list_all_chr + ga_out[1]
        f.close()
    return ga_out_all

Contact Information

Vittorio Fortino vittorio.fortino@uef.fi Dario Greco dario.greco@utune.fi

About

GARBO, a novel multi-island adaptive Genetic AlgoRithm for biomarker selection in high-dimensional Omics data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages