Remove host reads from a microbial pathogen sequence dataset by aligning against a reference genome. Use Kraken2/Bracken to verify that the reads were removed.

BCCDC-PHL/dehost-and-verify-illumina

De-Host And Verify for Illumina

The purpose of this pipeline is to remove host-associated reads from a pathogen sequencing project. This may be done to meet privacy and ethics requirements when analyzing samples that have been collected from human hosts, or as a quality control process to ensure that host-derived reads do not interfere with downstream processes such as genome assembly.

This pipeline performs the following tasks:

  1. Estimates the relative abundances of reads originating from a host organism and a pathogen of interest using kraken2 and bracken
  2. Performs 'de-hosting' by aligning reads against a reference genome using bwa, and extracting unmapped reads using samtools
  3. Verifies that the remaining reads do not contain host-derived sequences by again estimating relative abundances using the same method as in step 1.
  4. Prepares a brief report summarizing the efficacy of the de-hosting process.
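
The de-hosting step (step 2) can be sketched as a bwa-to-samtools pipe. This is a minimal illustration, not the pipeline's actual implementation: the function names, the thread count, and the choice of `samtools fastq -f 12` (keep pairs where both the read and its mate are unmapped) are assumptions made for this example.

```python
import subprocess


def build_dehost_commands(host_reference, reads_1, reads_2, out_1, out_2, threads=4):
    """Return the (bwa, samtools) argument lists for one paired-end sample.

    Hypothetical helper: the real pipeline may use different flags.
    """
    # Align both read files against the bwa-indexed host reference; SAM goes to stdout.
    bwa_cmd = ["bwa", "mem", "-t", str(threads), host_reference, reads_1, reads_2]
    # -f 12 requires flag bits 0x4 (read unmapped) and 0x8 (mate unmapped),
    # so only pairs with no alignment to the host genome are written out.
    samtools_cmd = [
        "samtools", "fastq",
        "-f", "12",
        "-1", out_1,          # de-hosted forward reads
        "-2", out_2,          # de-hosted reverse reads
        "-0", "/dev/null",    # discard reads flagged as neither or both R1/R2
        "-s", "/dev/null",    # discard singletons
        "-",                  # read SAM from stdin
    ]
    return bwa_cmd, samtools_cmd


def run_dehost(host_reference, reads_1, reads_2, out_1, out_2, threads=4):
    """Pipe bwa mem into samtools fastq and fail loudly if either tool fails."""
    bwa_cmd, samtools_cmd = build_dehost_commands(
        host_reference, reads_1, reads_2, out_1, out_2, threads
    )
    bwa = subprocess.Popen(bwa_cmd, stdout=subprocess.PIPE)
    samtools = subprocess.run(samtools_cmd, stdin=bwa.stdout)
    bwa.stdout.close()
    if bwa.wait() != 0 or samtools.returncode != 0:
        raise RuntimeError("de-hosting failed; check the bwa and samtools output")
```

Requiring both mates to be unmapped is the stricter choice: a pair is discarded as soon as either read aligns to the host, which suits the privacy use case described above at the cost of losing some non-host reads.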

workflow.png

Parameters

| Parameter | Default Value | Description |
|---|---|---|
| kraken2_db | | Path to kraken2 database |
| bracken_db | | Path to bracken database |
| host_reference | | Path to host reference genome |
| taxonomy_level | 'S' | Taxonomic level at which to group reads ('S' = Species) |
| read_length | 150 | Input sequence read length. Must match bracken database. |
| host_name | 'Homo sapiens' | Name of host. Must match name in kraken2 database. |
| pathogen_name | 'Severe acute respiratory syndrome-related coronavirus' | Name of pathogen of interest. Must match name in kraken2 database. |
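
To illustrate why host_name and pathogen_name must match the database names exactly, the sketch below looks up both names in a Bracken abundance table. It assumes the standard tab-separated Bracken output with `name` and `fraction_total_reads` columns; the function name and the sample rows are invented for this example and are not pipeline output.

```python
import csv
import io


def abundance_by_name(bracken_report, names):
    """Map each requested taxon name to its fraction_total_reads (0.0 if absent).

    Hypothetical helper; names are compared by exact string match.
    """
    fractions = {name: 0.0 for name in names}
    for row in csv.DictReader(bracken_report, delimiter="\t"):
        if row["name"] in fractions:
            fractions[row["name"]] = float(row["fraction_total_reads"])
    return fractions


# Invented sample rows, for illustration only.
example_report = io.StringIO(
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Homo sapiens\t9606\tS\t900\t100\t1000\t0.10000\n"
    "Severe acute respiratory syndrome-related coronavirus\t694009\tS\t"
    "8000\t1000\t9000\t0.90000\n"
)

host_name = "Homo sapiens"
pathogen_name = "Severe acute respiratory syndrome-related coronavirus"
fractions = abundance_by_name(example_report, [host_name, pathogen_name])
print(fractions[host_name], fractions[pathogen_name])
```

A name that is spelled differently from the database entry (for example 'Human' instead of 'Homo sapiens') would silently come back as 0.0 in this sketch, which is the failure mode the "must match" notes in the table guard against.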

Preparing Reference Datasets

Build a kraken2 database.

kraken2-build --db <path_to_kraken2_db> --standard

Build the bracken database.

bracken-build -d <path_to_kraken2_db> -l <read_length>

Index the host reference genome.

bwa index <path_to_host_reference>
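
Before running the pipeline, it can help to confirm that indexing finished. The check below is a small sketch, assuming the classic `bwa index` output files (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) written next to the reference; the function name is hypothetical.

```python
from pathlib import Path

# Files that `bwa index <ref>` writes alongside the reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(host_reference):
    """Return the expected index files that do not exist for this reference."""
    return [
        host_reference + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(host_reference + suffix).is_file()
    ]
```

An empty list means every expected index file is present; any entries point to an incomplete or missing index.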

Outputs
