FeatureSelectionBenchMarks

This repository contains the code for benchmarking feature (gene) selection methods on scRNA-seq and spatial transcriptomics data.

Software features

After a simple configuration, you can run the whole benchmark (data loading, quality control, feature selection, and cell clustering/domain detection) in a single line of code:

from benchmark.run_benchmark import run_bench


# configure the dataset information
data_cfg = {
    'your_data_name': {
        'adata_path': 'path/to/h5ad/file',
        'annot_key': 'annotation_name',
    }}
# configure feature selection methods and numbers of selected features
fs_cfg = {'feature_selection_method': [1000, 2000]}
# configure clustering methods and numbers of runs
cl_cfg = {'clustering_method': 2}
# run the benchmark in one line of code
run_bench(data_cfg, fs_cfg, cl_cfg, modality='scrna', metrics=['ARI', 'NMI'])

The evaluation results are automatically saved as an XLSX file in the working directory, with a name like this:

2023-02 14_54_32 scrna.xlsx
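To pick up the most recent results file programmatically, something like the following could work. It is a minimal sketch that relies only on what is stated above: the file is written to the working directory, has the `.xlsx` extension, and its name ends with the modality. The helper `latest_result` is hypothetical and not part of the benchmark.

```python
from pathlib import Path
from typing import Optional


def latest_result(directory: str = ".", modality: str = "scrna") -> Optional[Path]:
    """Return the most recently modified '*<modality>.xlsx' file, or None."""
    # Assumption: result files end with '<modality>.xlsx', as in the example name above.
    candidates = [p for p in Path(directory).glob(f"*{modality}.xlsx") if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be opened with any XLSX reader, for example `pandas.read_excel`; the layout of the sheets is described in the tutorial on reading records.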

Other software features are:

Automatically save the results of each step (preprocessed data, selected features, and cluster labels)
Reload cached genes and cluster labels when the same data (identified by its data name) is used again
Support custom feature selection and cell clustering/domain detection methods
Show detailed, nicely formatted log messages based on rich and loguru (see the tutorial for examples)
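The caching behaviour listed above can be pictured as a lookup keyed by the data name. The sketch below is purely illustrative: the directory layout, the JSON format, and the names `cache_path` and `load_or_compute` are assumptions made for this example, not the benchmark's actual implementation.

```python
import json
from pathlib import Path
from typing import Callable, List


def cache_path(cache_dir: Path, data_name: str, method: str, n_features: int) -> Path:
    """Hypothetical layout: <cache_dir>/<data_name>/<method>_<n_features>.json."""
    return cache_dir / data_name / f"{method}_{n_features}.json"


def load_or_compute(cache_dir: Path, data_name: str, method: str, n_features: int,
                    compute: Callable[[], List[str]]) -> List[str]:
    """Return cached gene names if present, otherwise compute and store them."""
    path = cache_path(cache_dir, data_name, method, n_features)
    if path.exists():
        # Cache hit: the expensive selection step is skipped entirely.
        return json.loads(path.read_text())
    genes = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(genes))
    return genes
```

Under a scheme like this, the data name is the only thing that identifies a dataset, so reusing a name for different data would return stale results. Whether the benchmark behaves exactly this way is not confirmed here, but giving each dataset a distinct name is a safe habit.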

Currently supported methods

scRNA-seq

Feature selection

Name	Language	Reference
GeneClust	Python	paper
vst	Python	paper
mvp	Python	paper
triku	Python	paper
GiniClust3	Python	paper
SC3	Python	paper
scran	R	paper
FEAST	R	paper
M3Drop	R	paper
scmap	R	paper
deviance	R	paper
sctransform	R	paper

Cell clustering

Name	Language	Reference
SC3s	Python	paper
Seurat	R	paper
SHARP	R	paper
TSCAN	R	paper
CIDR	R	paper

Spatial transcriptomics

Feature selection

Name	Language	Reference
SpatialDE	Python	paper
SPARK-X	R	paper
Giotto	R	paper

Domain detection

Name	Language	Reference
SpaGCN	Python	paper
stLearn	Python	paper
STAGATE	Python	paper

Requirements

R packages

This benchmark is written in Python and calls R functions through rpy2. To use the methods implemented in R, install the corresponding R packages first.
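Before running R-based methods, it can help to check which R packages rpy2 can actually see. The following is a minimal sketch using `rpy2.robjects.packages.isinstalled`; the helper `missing_r_packages` is hypothetical, and the package names in the usage note are assumed to match the method names in the tables above, which may not hold for every method.

```python
from typing import Callable, Iterable, List, Optional


def missing_r_packages(names: Iterable[str],
                       is_installed: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Return the R packages from `names` that are not installed."""
    if is_installed is None:
        # Imported lazily so this function can be defined without rpy2 or R present.
        from rpy2.robjects.packages import isinstalled
        is_installed = isinstalled
    return [name for name in names if not is_installed(name)]
```

For example, `missing_r_packages(["scran", "M3Drop"])` would list whichever of the two is absent from the R library that rpy2 is linked against.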

Python packages

anndata>=0.8.0
numpy>=1.21.6
setuptools>=59.5.0
anndata2ri>=1.1
sc3s>=0.1.1
scanpy>=1.9.1
loguru>=0.6.0
rpy2>=3.5.6
sklearn>=0.0.post2
scikit-learn>=1.2.0
SpaGCN>=1.2.5
torch>=1.13.1
stlearn>=0.4.11
pandas>=1.5.2
opencv-python>=4.6.0
scipy>=1.9.3
rich>=13.0.0
triku>=2.1.4
statsmodels>=0.13.5
SpatialDE>=1.1.3
STAGATE_pyG>=1.0.0
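A quick way to see which of these packages are present in the current environment is to query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The function `installed_versions` is a hypothetical helper, and it only reports what is installed; it does not compare versions against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(requirements: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each required distribution to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumption: every requirement has the form 'name>=version', as in the list above.
        name = line.split(">=")[0].strip()
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Assuming the list above mirrors the repository's `requirements.txt`, the helper could be called as `installed_versions(open("requirements.txt"))`, and any entry mapped to `None` still needs to be installed.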

Installation

git clone https://github.com/ToryDeng/FeatureSelectionBenchmarks
cd FeatureSelectionBenchmarks/
python3 setup.py install --user

Tutorial

How to run the benchmarks: tutorials/run_benchmarks.ipynb
How to read the saved records: tutorials/read_records.ipynb
