Pipeline overview

rulegraph

Prerequisites

To run the whole pipeline you will need the following programs installed:

Decontamination

During the workflow, contamination is removed using a combination of centrifuge and a custom script, remove_contamination.py. The pipeline assumes that a centrifuge database already exists. If you don't have one, you can build or download one by following the instructions here.
The centrifuge database I use internally for this was built on 28/07/2019 as part of a pipeline for another project. The code used to generate this database can be found here.
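As a rough illustration of the idea (not the actual logic of remove_contamination.py, which may differ), the sketch below partitions reads using centrifuge's tabular classification output. It assumes the default column header (readID, seqID, taxID, ...), that unclassified reads carry taxID 0, and that a read is kept if any of its assignments falls in the set of accepted taxon IDs. The function name `partition_reads`, the example reads, and the accepted IDs are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def partition_reads(classification, keep_taxids):
    """Split read IDs into (keep, contaminant, unclassified) sets.

    classification: file-like object with centrifuge's tab-separated output,
    including its header line.
    keep_taxids: set of taxon IDs that are NOT considered contamination.
    """
    # A read can appear on several lines (one per assignment), so collect
    # every taxon ID reported for each read first.
    taxids_by_read = defaultdict(set)
    for row in csv.DictReader(classification, delimiter="\t"):
        taxids_by_read[row["readID"]].add(int(row["taxID"]))

    keep, contaminant, unclassified = set(), set(), set()
    for read_id, taxids in taxids_by_read.items():
        if taxids & keep_taxids:
            # Assumption: one accepted assignment is enough to keep the read.
            keep.add(read_id)
        elif taxids == {0}:
            # Assumption: taxID 0 means centrifuge could not classify the read.
            unclassified.add(read_id)
        else:
            contaminant.add(read_id)
    return keep, contaminant, unclassified


# Hand-written example rows, for illustration only.
example = io.StringIO(
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "r1\tNC_000962.3\t83332\t4225\t0\t80\t80\t1\n"
    "r2\tNC_000001.11\t9606\t3600\t0\t75\t80\t1\n"
    "r3\tunclassified\t0\t0\t0\t0\t80\t1\n"
)
keep, contaminant, unclassified = partition_reads(example, {77643, 1773, 83332})
print(sorted(keep), sorted(contaminant), sorted(unclassified))
```

Whether unclassified reads should be kept or discarded is a separate decision; the sketch only reports them in their own set.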

One of the parameters requested by remove_contamination.py is --taxtree. I have provided one of these files, for the Mycobacterium tuberculosis Complex (NCBI:txid77643), in resources/taxonomy/mtbc.taxonlist. To create your own taxtree, either write a file with one taxon ID per line, listing every taxon you do not consider contamination, or generate one using taxonkit. For example, to create the MTBC one, I ran the following command:

mtbc_taxid=77643
taxonkit list --show-name --show-rank --ids "$mtbc_taxid" > mtbc.taxonlist

For more information about this command, refer to the documentation.
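To show what such a file contains, here is a minimal sketch of reading the taxon IDs back out of it. It assumes that each non-empty line begins with the numeric taxon ID, optionally indented and optionally followed by a bracketed rank and a name, which is the shape `taxonkit list --show-name --show-rank` is expected to produce. The same reader therefore also accepts a hand-made file with one bare ID per line. The function name `load_taxids` and the sample lines are hypothetical; the real script may parse the file differently.

```python
def load_taxids(lines):
    """Return the set of taxon IDs found at the start of each non-empty line."""
    taxids = set()
    for line in lines:
        fields = line.split()  # split() also discards the tree indentation
        if not fields:
            continue  # skip blank lines
        taxids.add(int(fields[0]))
    return taxids


# Hand-written sample in the assumed format, for illustration only.
sample = """\
77643 [species group] Mycobacterium tuberculosis complex
  1773 [species] Mycobacterium tuberculosis
    83332 [strain] Mycobacterium tuberculosis H37Rv
"""
print(sorted(load_taxids(sample.splitlines())))
```

With a real file, the call would be along the lines of `load_taxids(open("mtbc.taxonlist"))`.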

Results

Refer to the assessment notebook for plots.

Ultimately, for the work in this project, we use the PacBio assemblies generated by Flye, without any polishing with Illumina.