DNA Coverage Analysis

Snakemake pipelines for preprocessing, mapping, and coverage charts of bacterial DNA-Seq data

Table of Contents

About The Project
Getting Started
- Prerequisites
- Installation
Usage

About The Project

These pipelines visualize the coverage of DNA-Seq data on one or multiple reference genomes. A pipeline consists of the following steps:

1. Quality control of the raw data with FastQC
2. Preprocessing with fastp
3. Quality control of the preprocessed data with FastQC
4. rRNA filtering with SortMeRNA
5. For each reference:
   1. Mapping with bowtie2
   2. Feature counting with featureCounts
   3. Coverage plots with bedtools and R-Sushi
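
To make the ordering concrete, here is a minimal Python sketch of how the shared steps and the per-reference steps combine for a run. The step labels simply mirror the list above; they are not the Snakemake rule names used in the repository, and the reference names are placeholders.

```python
# Illustrative only: labels mirror the step list above, not actual rule names.
SHARED_STEPS = [
    "FastQC (raw reads)",
    "fastp (preprocessing)",
    "FastQC (preprocessed reads)",
    "SortMeRNA (rRNA filtering)",
]
PER_REFERENCE_STEPS = [
    "bowtie2 (mapping)",
    "featureCounts (feature counting)",
    "bedtools + R-Sushi (coverage plots)",
]


def planned_steps(references):
    """Return the ordered step labels for a run against the given references."""
    steps = list(SHARED_STEPS)
    for ref in references:
        # The last three steps are repeated once per reference genome.
        steps.extend(f"{ref}: {step}" for step in PER_REFERENCE_STEPS)
    return steps


if __name__ == "__main__":
    # "ref_A" and "ref_B" are placeholder reference names.
    for step in planned_steps(["ref_A", "ref_B"]):
        print(step)
```

With two references, the sketch yields the four shared steps followed by three steps for each reference.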

Getting Started

Prerequisites

The only requirements are a working conda or mamba installation and Snakemake version 8 or newer.

- mamba
- snakemake, which can be installed into its own environment with:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

Installation

git clone https://github.com/pblumenkamp/dna_coverage_analysis.git

Required files

- DNA-Seq data in gzipped FASTQ format
- One or multiple reference genomes in (gzipped or uncompressed) FASTA format
- A reference annotation for each genome in uncompressed GFF3 format
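
As a quick pre-flight check, the following Python sketch classifies files by name according to the requirements above. It is not part of the pipeline. The function name and the accepted FASTA and GFF3 extensions (`.fa`, `.fna`, `.gff`) are assumptions; adjust them to your own file names.

```python
from pathlib import Path

# Assumed extensions; the pipeline itself may accept other file names.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna", ".fasta.gz", ".fa.gz", ".fna.gz")
GFF3_SUFFIXES = (".gff", ".gff3")


def classify_input(path):
    """Classify a file by the rules listed above; return None if it fits none."""
    name = Path(path).name.lower()
    if name.endswith("fastq.gz"):
        return "reads"
    if name.endswith(FASTA_SUFFIXES):
        return "genome"
    if name.endswith(GFF3_SUFFIXES):
        return "annotation"
    return None
```

A gzipped annotation such as `ref.gff3.gz` returns `None` here, which flags that the GFF3 file has to be decompressed first.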

Usage

Use the pipeline in paired_end for paired-end data and the pipeline in single_end for single-end data.
```
# e.g.
cd paired_end
```
Adjust the settings in config.yaml. The most important ones are the input directory and the references to use.

Start the Snakemake pipeline locally or on a compute cluster.

# Local
snakemake --configfile config.yaml --use-conda --resources mem_mb=<max_ram_usage_in_mb>
# Compute cluster 
snakemake --configfile config.yaml --use-conda --profile <path_to_your_cluster_profile>/cluster_profile

Config.yaml

The config.yaml currently has four parts.

fastq_input_dir

This defines the directory where the DNA-Seq data is stored. As a naming convention, all single-end DNA-Seq files must end with fastq.gz, and all paired-end files must end with _R1.fastq.gz and _R2.fastq.gz.
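
To check a paired-end input directory against this naming convention before starting a run, a small Python sketch like the following can help. It is not part of the pipeline; the function name `find_read_pairs` is made up for illustration.

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def find_read_pairs(fastq_dir):
    """Map each sample name to its (R1, R2) paths; raise if a mate is missing."""
    fastq_dir = Path(fastq_dir)
    pairs = {}
    for r1 in sorted(fastq_dir.glob(f"*{R1_SUFFIX}")):
        # The sample name is everything before the _R1.fastq.gz suffix.
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if not r2.exists():
            raise FileNotFoundError(f"missing mate for {r1.name}: expected {r2.name}")
        pairs[sample] = (r1, r2)
    return pairs
```

Files that do not match the `_R1.fastq.gz` pattern are ignored by this sketch, so a missing or misnamed R1 file shows up as a missing sample rather than as an error.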

coverage_resolution

Defines the resolution in base pairs (bp) covered by each bar in the final coverage bar plots. Multiple resolutions can be given as a comma-separated list, in which case the coverage plots for each resolution are written to a separate folder.
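
The effect of the resolution can be illustrated with a short Python sketch that groups per-base depths into fixed-width windows. This is not the pipeline's implementation (the plots are produced with bedtools and R-Sushi). Averaging the depth within each window and passing the resolutions as a comma-separated string are both assumptions made for the example.

```python
def parse_resolutions(value):
    """Parse a comma-separated resolution value such as '100,1000' into integers."""
    return [int(part) for part in str(value).split(",") if part.strip()]


def bin_coverage(depths, resolution):
    """Average per-base depths over consecutive windows of `resolution` bp."""
    if resolution <= 0:
        raise ValueError("resolution must be a positive number of base pairs")
    bars = []
    for start in range(0, len(depths), resolution):
        window = depths[start:start + resolution]
        # The last window may be shorter than the resolution.
        bars.append(sum(window) / len(window))
    return bars


if __name__ == "__main__":
    depths = [2, 4, 6, 8, 10]  # made-up per-base depths
    for resolution in parse_resolutions("2,5"):
        print(resolution, bin_coverage(depths, resolution))
```

A smaller resolution gives more, narrower bars; a larger one smooths the profile into fewer bars.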

references

A list of all reference genomes for the coverage analysis. Each reference is analyzed separately. genome must be the path to the reference genome in (gzipped or uncompressed) FASTA format. annotation is the path to the reference annotation in uncompressed GFF3 format. gff_features is a list of GFF feature types, each of which is counted in its own count table. Please verify that every listed feature type actually occurs in the GFF3 file.
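
One way to verify the feature types is to compare them against the third column of the GFF3 file, which holds the feature type. The Python sketch below does this; the function name is hypothetical and the script is not part of the pipeline.

```python
def missing_feature_types(gff_lines, wanted):
    """Return the requested feature types that never occur in column 3 of the GFF3 lines."""
    found = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines follow
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3:
            found.add(columns[2])
    return sorted(set(wanted) - found)
```

For a real file, pass an open handle, for example `missing_feature_types(open("annotation.gff3"), ["gene", "CDS"])`, where the path and feature types are placeholders. An empty list means every requested type is present.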

memory_usage_in_mb

List of pipeline steps with data-dependent memory usage. Please adjust these numbers if you use Snakemake on a compute cluster with memory limits and run into out-of-memory errors. These settings can also be used locally with the option --resources mem_mb=<max_ram_usage_in_mb>.
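
The four parts can be sanity-checked once the file has been parsed into a Python mapping. The sketch below only checks that the documented keys are present. The example values, the nesting of `references`, and the step name under `memory_usage_in_mb` are assumptions for illustration; the config.yaml shipped with the repository is the authoritative layout.

```python
REQUIRED_KEYS = ("fastq_input_dir", "coverage_resolution", "references", "memory_usage_in_mb")
REFERENCE_KEYS = ("genome", "annotation", "gff_features")


def config_problems(config):
    """Return human-readable problems found in an already-parsed config mapping."""
    problems = [f"missing top-level key: {key}" for key in REQUIRED_KEYS if key not in config]
    references = config.get("references", [])
    # Accept either a list of entries or a mapping of name -> entry.
    entries = list(references.values()) if isinstance(references, dict) else list(references)
    for index, reference in enumerate(entries):
        for key in REFERENCE_KEYS:
            if key not in reference:
                problems.append(f"references[{index}] is missing '{key}'")
    return problems


# Hypothetical values; paths and the step name "mapping" are made up.
example = {
    "fastq_input_dir": "data/fastq",
    "coverage_resolution": "100,1000",
    "references": [
        {
            "genome": "refs/strain_a.fasta.gz",
            "annotation": "refs/strain_a.gff3",
            "gff_features": ["gene", "CDS"],
        },
    ],
    "memory_usage_in_mb": {"mapping": 8000},
}

if __name__ == "__main__":
    print(config_problems(example) or "no problems found")
```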

License

Distributed under the MIT License. See LICENSE.txt for more information.
