Wrapper for short and long read mapping, creation of quality report(s) and estimation of genome size

schellt/backmap

backmap.pl v0.5

Description

Automatic read mapping and genome size estimation from coverage.

Automatic mapping of paired, unpaired, PacBio and Nanopore reads to an assembly with bwa mem or minimap2, execution of qualimap bamqc, and estimation of genome size as the number of mapped nucleotides divided by the mode of the coverage distribution (considering only coverage >0). This method was first published in Schell et al. (2017). To demonstrate the accuracy and reliability of this method throughout the tree of life, Pfenninger et al. (2021) published a study comparing different estimators. Currently, only the estimator Nbm/m (number of back-mapped bases divided by the modal value of the sequencing depth distribution) is implemented in this script.
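
The Nbm/m estimator described above can be illustrated with a few lines of Python. This is a minimal sketch, not the code used by backmap.pl: the function names, the histogram layout (depth mapped to number of positions) and the tie-breaking rule for the mode are assumptions made for illustration, and the numbers are made up.

```python
def modal_depth(coverage_hist):
    """Return the most frequent sequencing depth, ignoring depth 0.

    coverage_hist maps depth -> number of assembly positions at that depth
    (hypothetical input layout, chosen for this sketch).
    """
    positive = {depth: n for depth, n in coverage_hist.items() if depth > 0}
    if not positive:
        raise ValueError("no position has coverage > 0")
    # Ties are resolved towards the lower depth; this is an arbitrary choice.
    return min(positive, key=lambda depth: (-positive[depth], depth))


def estimate_genome_size(mapped_bases, coverage_hist):
    """Nbm / m: back-mapped bases divided by the modal depth (> 0)."""
    return mapped_bases / modal_depth(coverage_hist)


# Toy data: most positions are covered 30x, and 150,000 bases were mapped.
hist = {0: 500, 28: 900, 29: 1200, 30: 1500, 31: 1100, 60: 40}
print(modal_depth(hist))                    # 30
print(estimate_genome_size(150_000, hist))  # 5000.0
```

Excluding depth 0 matters because uncovered positions can outnumber any single positive depth and would otherwise be picked as the mode.
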
The tools samtools and bwa and/or minimap2 need to be in your $PATH. The tools qualimap, multiqc, bedtools and Rscript are optional but are needed to create the mapping quality report, the coverage histogram and genome size estimate, and the plot of the coverage distribution, respectively.
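
Before a run, it can be handy to check which of these tools are visible in $PATH. The following is a small sketch using Python's standard `shutil.which`. The grouping of tools follows the paragraph above (samtools plus at least one mapper, the rest optional), while the function name `check_tools` and the message wording are assumptions and not part of backmap.pl.

```python
import shutil

MANDATORY = ["samtools"]
MAPPERS = ["bwa", "minimap2"]  # at least one of these is needed
OPTIONAL = ["qualimap", "multiqc", "bedtools", "Rscript"]


def check_tools(which=shutil.which):
    """Return (problems, missing_optional) for the tools named in this README.

    The lookup function is a parameter so the check can be tried out
    without touching the real $PATH.
    """
    found = {tool: which(tool) is not None
             for tool in MANDATORY + MAPPERS + OPTIONAL}
    problems = [f"missing mandatory tool: {tool}"
                for tool in MANDATORY if not found[tool]]
    if not any(found[tool] for tool in MAPPERS):
        problems.append("need at least one mapper: bwa or minimap2")
    missing_optional = [tool for tool in OPTIONAL if not found[tool]]
    return problems, missing_optional


problems, missing_optional = check_tools()
for message in problems:
    print(message)
if missing_optional:
    print("optional tools not found:", ", ".join(missing_optional))
```
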

Dependencies

backmap.pl needs the following perl modules and will search for executables in your $PATH:

Mandatory:

Short read mapping:

Long read mapping:

Optional:

Usage

backmap.pl [-a <assembly.fa> {-p <paired_1.fq>,<paired_2.fq> | -u <unpaired.fq> |
            -pb <clr.fq> | -hifi <hifi.fq> | -ont <ont.fq>} | -b <mapping.bam>]

Mandatory:
	-a STR		Assembly in fasta format that reads should be mapped to
	AND AT LEAST ONE OF
	-p STR		Two files with paired Illumina reads, comma separated
	-u STR		Fastq file with unpaired Illumina reads
	-pb STR		Fasta or fastq file with PacBio CLR reads
	-hifi STR	Fasta or fastq file with PacBio HiFi reads
	-ont STR	Fasta or fastq file with Nanopore reads
	OR
	-b STR		Bam file to calculate coverage from
			Skips read mapping
			Overrides -nh
			Technologies will be recognized correctly if filenames end with
			.pb(.sort).bam, .hifi(.sort).bam or .ont(.sort).bam for PacBio CLR,
			PacBio HiFi and Nanopore, respectively. Otherwise they are assumed to
			be from Illumina.
			
	All mandatory options except -a can be specified multiple times

Options: [default]
	-o STR		Output directory [.]
			Will be created if it does not exist
	-t INT		Number of processes executed in parallel [1]
			Affects bwa mem, samtools sort/index/view/stats, qualimap bamqc
	-pre STR	Prefix of output files if -a is used [filename of -a]
	-sort		Sort the bam file(s) (-b) [off]
	-nq		Do not run qualimap bamqc [off]
	-nh		Do not create coverage histogram [off]
			Implies -ne
	-ne		Do not estimate genome size [off]
	-kt		Keep temporary bam files [off]
	-bo STR		Options passed to bwa [-a -c 10000]
	-mo STR		Options passed to minimap2 [CLR: -H -x map-pb; HiFi: -x asm20
			(minimap2 <=2.18) or -x map-hifi (minimap2 >2.18); ONT: -x map-ont]
	-qo STR		Options passed to qualimap [none]
	Pass options with quotes e.g. -bo "<options>"
	-v		Print executed commands to STDERR [off]
	-dry-run	Only print commands to STDERR instead of executing [off]

	-h or -help	Print this help and exit
	-version	Print version number and exit
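
Two rules in the help text above are easy to misread: how the technology is inferred from a BAM filename given with -b, and which minimap2 preset is the default for each technology. The sketch below restates both in Python as one possible reading of the help text. It is not taken from backmap.pl: the function names, the case-sensitive suffix match and the way the version string is parsed are assumptions.

```python
import re

# Suffix rule from the -b description: .pb(.sort).bam, .hifi(.sort).bam, .ont(.sort).bam
_SUFFIX = re.compile(r"\.(pb|hifi|ont)(?:\.sort)?\.bam$")
_NAMES = {"pb": "PacBio CLR", "hifi": "PacBio HiFi", "ont": "Nanopore"}


def technology_from_bam(filename):
    """Infer the technology from a BAM filename; anything else counts as Illumina."""
    match = _SUFFIX.search(filename)
    return _NAMES[match.group(1)] if match else "Illumina"


def default_minimap2_options(technology, minimap2_version):
    """Return the default -mo value listed in the help text for a long read technology."""
    if technology == "PacBio CLR":
        return "-H -x map-pb"
    if technology == "Nanopore":
        return "-x map-ont"
    if technology == "PacBio HiFi":
        # Compare only major.minor, e.g. "2.24-r1122" -> (2, 24) (assumed version format).
        version = re.match(r"(\d+)\.(\d+)", minimap2_version)
        if version is None:
            raise ValueError(f"cannot parse minimap2 version: {minimap2_version}")
        newer = (int(version.group(1)), int(version.group(2))) > (2, 18)
        return "-x map-hifi" if newer else "-x asm20"
    raise ValueError(f"no minimap2 default for technology: {technology}")


print(technology_from_bam("sample.hifi.sort.bam"))            # PacBio HiFi
print(technology_from_bam("sample.sort.bam"))                 # Illumina
print(default_minimap2_options("PacBio HiFi", "2.18-r1015"))  # -x asm20
print(default_minimap2_options("PacBio HiFi", "2.24-r1122"))  # -x map-hifi
```
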

Citation

Pfenninger M, Schönenbeck P & Schell T (2021). ModEst: Accurate estimation of genome size from next generation sequencing data. Molecular Ecology Resources, 00, 1–11, https://doi.org/10.1111/1755-0998.13570

Schell T, Feldmeyer B, Schmidt H, Greshake B, Tills O et al. (2017). An Annotated Draft Genome for Radix auricularia (Gastropoda, Mollusca). Genome Biology and Evolution, 9(3):585–592, https://doi.org/10.1093/gbe/evx032

If you use this tool please cite the dependencies as well:
