gNOMO2

A comprehensive and modular pipeline for integrated multi-omics analyses of microbiomes

If you use this tool, please cite:

Arikan M, Muth T. (2024) gNOMO2: a comprehensive and modular pipeline for integrated multi-omics analyses of microbiomes. GigaScience, 13, giae038, https://doi.org/10.1093/gigascience/giae038

Overview

gNOMO2 comprises six modules, each tailored to a specific combination of omics data (shown below). Module 1 accepts 16S rRNA gene amplicon sequencing (AS) data as input and generates a protein database suitable for metaproteomics studies, a taxa abundance plot, and a phyloseq object that can be used for downstream analysis in other microbiome tools. Modules 2 to 6 handle different combinations of AS, metagenomics (MG), metatranscriptomics (MT), and metaproteomics (MP) data, creating omics-specific protein databases, abundance tables, plots, differential abundance analysis results, joint visualizations, and pathway-level integration analysis results.

Requirements

To use gNOMO2, ensure you have conda and snakemake installed:

1. Install conda: If conda is not already installed on your system, install it first.

2. Create a Snakemake environment in conda:

conda create -n snakemake bioconda::snakemake=7.15.2 conda-forge::mamba

3. Clone the gNOMO2 repository: If you do not have git installed, install git first.

git clone --recursive https://github.com/muzafferarikan/gNOMO2.git

Note: Once conda and snakemake are set up, gNOMO2 manages the installation of all other tools and dependencies automatically in their respective environments during the first run.

Setup

Data

Copy your raw data to the relevant subfolders within the data directory:

If you have amplicon sequencing data, copy your files to data/AS/raw
If you have metagenomics data, copy your files to data/MG/raw
If you have metatranscriptomics data, copy your files to data/MT/raw
If you have metaproteomics data, copy your files to data/MP/spectra

Important: Please check sample format requirements below:

Data	Library Layout	Sample Name Format
AS	PE	samplename_1.fastq.gz, samplename_2.fastq.gz
AS	SE	samplename_1.fastq.gz
MG	PE	samplename_1.fastq.gz, samplename_2.fastq.gz
MT	PE	samplename_1.fastq.gz, samplename_2.fastq.gz
MT	SE	samplename_1.fastq.gz
MP	DDA	samplename.mgf
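As a quick pre-flight check, the naming rules in the table can be sketched as a small shell function. This is an illustrative helper and not part of gNOMO2: the name check_sample_names is hypothetical, and it only inspects file-name suffixes, not file contents or whether paired-end mates are both present.

```shell
# Hypothetical helper (not part of gNOMO2): print every file in a directory
# whose name does not follow the expected pattern for its data type.
# Usage: check_sample_names <directory> <AS|MG|MT|MP>
check_sample_names() {
    local dir kind f name bad
    dir="$1"
    kind="$2"
    bad=0
    for f in "$dir"/*; do
        [ -e "$f" ] || continue                      # skip if the directory is empty
        name="$(basename "$f")"
        case "$kind" in
            AS|MG|MT)
                case "$name" in
                    ?*_1.fastq.gz|?*_2.fastq.gz) ;;  # accepted read files
                    *) echo "unexpected name: $f"; bad=1 ;;
                esac ;;
            MP)
                case "$name" in
                    ?*.mgf) ;;                       # accepted spectra files
                    *) echo "unexpected name: $f"; bad=1 ;;
                esac ;;
            *)
                echo "unknown data type: $kind" >&2
                return 2 ;;
        esac
    done
    return "$bad"
}
```

For example, check_sample_names data/MG/raw MG prints nothing and returns 0 when every file name conforms, and lists the offending files and returns 1 otherwise.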

Metadata

gNOMO2 requires a metadata file to perform sample group comparisons. Create a tab-delimited metadata file (name it metadata.txt) containing information about your samples and copy it to the resources folder.

Important: The name of the first column in the metadata file must be "SampleID".
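The two requirements above (tab-delimited, first column named SampleID) can be checked with a short shell sketch. The helper name check_metadata_header is hypothetical and not part of gNOMO2, and the Group column and sample names in the commented example are placeholders rather than required values.

```shell
# Hypothetical helper (not part of gNOMO2): verify that a metadata file has a
# tab-delimited header whose first column is named "SampleID".
# Usage: check_metadata_header [file]   (defaults to resources/metadata.txt)
check_metadata_header() {
    local file header first tab cr
    file="${1:-resources/metadata.txt}"
    tab="$(printf '\t')"
    cr="$(printf '\r')"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    header="$(head -n 1 "$file")"
    header="${header%"$cr"}"              # tolerate Windows line endings
    case "$header" in
        *"$tab"*) ;;                      # at least two tab-separated columns
        *) echo "no tab found in header of $file" >&2; return 1 ;;
    esac
    first="${header%%"$tab"*}"            # text before the first tab
    [ "$first" = "SampleID" ] || {
        echo "first column is '$first', expected 'SampleID'" >&2
        return 1
    }
}

# Example with placeholder content (column "Group" and the sample names are assumptions):
#   printf 'SampleID\tGroup\nsampleA\tcontrol\nsampleB\ttreated\n' > resources/metadata.txt
#   check_metadata_header resources/metadata.txt
```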

Config

After copying your data and metadata, run the following script from your main gNOMO2 project folder to generate a config file:

bash workflow/scripts/prepare_config.sh

This script generates a config.yaml file in the config folder based on the contents of the data directory. Review this file and modify the analysis parameters as needed.

Running

Running locally

Once setup is complete, follow these steps to run gNOMO2:

1. Activate your snakemake environment in conda:

conda activate snakemake

2. Run gNOMO2:
Execute the following command from your project folder:

snakemake -s workflow/Snakefile --cores 2 --use-conda

Note: Adjust the --cores value to reflect the number of cores available.
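Before committing to a full run, it can help to preview which jobs Snakemake would schedule. The following is a minimal sketch, assuming it is called from the project folder with the snakemake environment active. The wrapper name gnomo2_dry_run is hypothetical; --dry-run is a standard Snakemake option that lists the planned jobs without executing them.

```shell
# Hypothetical wrapper (not part of gNOMO2): list the jobs Snakemake would run
# without executing them. The first argument is the number of cores and
# defaults to 2, matching the run command above.
gnomo2_dry_run() {
    snakemake -s workflow/Snakefile --cores "${1:-2}" --use-conda --dry-run
}
```

Calling gnomo2_dry_run 8 previews the plan for an 8-core run; if the job list looks right, run the same command without --dry-run.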

Running on a cluster

To run gNOMO2 on a cluster:

1. Configure the cluster settings:
Edit the provided gnomo2_slurm_template.sh file in the main gNOMO2 folder according to your cluster settings.

2. Run gNOMO2:
Execute the following command from your home directory in the cluster environment:

sbatch path/to/gNOMO2/gnomo2_slurm_template.sh

Outputs

When the gNOMO2 pipeline starts, it creates a results folder within your project directory, containing both final and intermediate outputs.

Final outputs

The final folder includes:

Integrated multi-omics analysis results (integrated)
- Differential abundance analysis results for each omics dataset (diff_abun)
- Joint-visualization results (combi)
- Pathway-level integration results (pathview)
- A proteogenomic database (prot_db)

Results for each omics dataset within folders named accordingly (AS, MG, MT, MP)
- Abundance tables
- Taxonomy tables
- Phyloseq objects
- Abundance plots

These files are suitable for further analyses using other microbiome analysis tools.

Intermediate outputs

The intermediate folder contains the outputs of each step executed by the gNOMO2 pipeline.
