Duplex Sequencing Pipeline

A snakemake pipeline for obtaining duplex consensus reads from data generated using a duplex sequencing protocol (e.g. NanoSeq). The pipeline is similar to the recommended workflow from IDT for processing sequence data with unique molecular identifiers (UMIs). QC is also performed, both before and after duplex consensus calling.

Installation

The only prerequisite is snakemake. To install snakemake, you will need to install a Conda-based Python3 distribution. For this, Mambaforge is recommended. Once mamba is installed, snakemake can be installed like so:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

Now activate the snakemake environment (you'll have to do this every time you want to run the pipeline):

conda activate snakemake

Now clone the repository:

git clone https://github.com/WEHIGenomicsRnD/duplex-seq-pipeline.git
cd duplex-seq-pipeline

Configuration

The main configuration file is config/config.yaml, and the configuration file for FastQ Screen is config/fastq_screen.conf. Please go through these settings carefully. The main settings to consider are:

read_structure -- ensure that this matches the UMI design of your experiment. Refer to fgbio's ExtractUmisFromBam for details on how to set this parameter.
umis -- if your UMIs are known (non-random), you can add a path to a UMIs text file (one UMI per line). Specifying a file path for this parameter will trigger a UMI correction step.
ref -- ensure you have downloaded and specified the correct reference for your data.
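
Because the umis file is expected to hold one UMI per line, it can be worth sanity-checking it before starting a run. The following is a minimal sketch and not part of the pipeline: the helper name count_bad_umis and the example sequences are hypothetical, and it assumes UMIs are written as plain uppercase A/C/G/T strings.

```shell
# Count lines in a UMI file that are not a single run of A/C/G/T.
# Assumption: UMIs are plain uppercase DNA sequences, one per line.
count_bad_umis() {
    # grep -c prints the count; it exits non-zero when the count is 0, hence "|| true".
    grep -c -v -E '^[ACGT]+$' "$1" || true
}

# Throwaway example file with placeholder sequences (not real UMIs).
umi_file=$(mktemp)
printf 'ACGTACGT\nTTGCAAGC\nGGATCCAA\n' > "$umi_file"

echo "malformed lines: $(count_bad_umis "$umi_file")"
```

To check your own file, pass the path you set for umis in config/config.yaml instead of the temporary file. Blank lines, lowercase sequences and Windows line endings are all counted as malformed by this check.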

Running

Run the pipeline as follows:

conda activate snakemake
snakemake --use-conda --conda-frontend mamba --cores 1

If you want to submit your jobs to the cluster using SLURM, use the following to run the pipeline:

conda activate snakemake
snakemake --use-conda --conda-frontend mamba --profile slurm --jobs 8 --cores 24

The pipeline will generate all results under a results directory. The most relevant directories are:

results/QC/multiQC -- contains the multiQC report for pre-consensus call reads. Also contains FastQC and FastQ Screen metrics on the raw reads.
results/QC/consensus/multiQC -- contains the multiQC report generated on reads after duplex consensus calling.
results/consensus/{sample}__mapped_merged_filtered_clipped.bam -- contains the mapped, filtered and clipped consensus reads. These should be your "final" read alignments for duplex consensus reads.
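
Once a run has finished, the final alignments can be listed per sample by matching the file name pattern above. This is a small sketch under the assumption that it is run from the directory containing results; the function name list_final_bams is hypothetical and not part of the pipeline.

```shell
# Print "<sample><TAB><path>" for each final consensus BAM under a results directory.
list_final_bams() {
    results_dir="${1:-results}"
    suffix="__mapped_merged_filtered_clipped.bam"
    for bam in "$results_dir"/consensus/*"$suffix"; do
        # If the glob matched nothing, the literal pattern is left behind; skip it.
        [ -e "$bam" ] || continue
        # basename strips the directory and the shared suffix, leaving the sample name.
        printf '%s\t%s\n' "$(basename "$bam" "$suffix")" "$bam"
    done
}

list_final_bams results
```

If nothing is printed, either the run has not completed or the results directory is somewhere else.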
