
IARCbioinfo/damage-estimator-nf

damage-estimator-nf

Nextflow pipeline to run "Damage Estimator"

Description

This tool estimates DNA damage in DNA sequenced on an Illumina platform in paired-end mode. There are 3 steps (starting from an aligned BAM file):

  • Split the paired end reads into R1 and R2 using split_mapped_reads.pl (universal)
  • Estimate the damage across reads using estimate_damage.pl
  • Plot the result using R.

Cf. https://github.com/Ettwiller/Damage-estimator

Dependencies

  1. This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.

  2. External software:

The tool writes to a temporary folder, so check that yours is set in your .bash_profile (export TMPDIR=/data/tmp/, export TMP=/data/tmp)

You can avoid installing all the external software by only installing Docker. See the IARC-nf repository for more information.
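The temporary-folder check mentioned above can be scripted. The following is a minimal sketch, not part of the pipeline: `ensure_tmpdir` is a hypothetical helper name, and falling back to `/tmp` when nothing is set is an assumption.

```shell
# Sketch: make sure a writable temporary folder is exported before running.
# ensure_tmpdir is a hypothetical helper; /data/tmp/ is the example path
# from this README, and the /tmp fallback is an assumption.
ensure_tmpdir() {
    tmp_candidate="${1:-${TMPDIR:-/tmp}}"
    # Create the folder if needed and refuse to continue if it is not writable.
    mkdir -p "$tmp_candidate" || return 1
    [ -w "$tmp_candidate" ] || return 1
    export TMPDIR="$tmp_candidate" TMP="$tmp_candidate"
}
```

Calling `ensure_tmpdir /data/tmp/` in the shell that launches the pipeline exports both variables for that session only; the `.bash_profile` entries remain the persistent setting.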

Input

Type | Description
bam folder | Folder containing the BAM files on which you want to run "Damage Estimator"

Parameters

  • Mandatory

Name | Example value | Description
--bam_folder | PATH/FOLDER | Folder containing the .bam and .bam.bai files on which to run "Damage Estimator" (BAMs should preferably be generated by bwa mapping of Illumina paired-end sequencing)
--de_path | PATH/DE | Location of the folder containing the Damage Estimator files (.pl and .r)
--ref | PATH/FILE | Reference genome (fasta file)
  • Optional

Name | Default value | Description
--Q | 0 | Phred quality score threshold (Sanger encoding). Only bases with a Q score above this threshold are kept.
--mq | 10 | Mapping quality threshold. Only reads that pass this threshold are kept.
--max_coverage_limit | 100 | If a position has MAX or more reads (R1 or R2), it is not used to calculate the damage. This prevents high-coverage regions of the genome from being the main driver of the damage estimate.
--min_coverage_limit | 1 | If a position has MIN or fewer reads (R1 or R2), it is not used to calculate the damage. This restricts the damage calculation to on-target regions (for enrichment protocols such as exome sequencing).
--qualityscore | 30 | Discard the match or mismatch if the base on a read has a base quality below this minimum. Important parameter: the lower this limit, the less apparent the damage.
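The two coverage limits define an exclusive window on read depth. The sketch below only restates that rule; it is not the pipeline's code. `position_is_used` is a hypothetical name, and it checks a single depth value, since how the R1 and R2 depths are combined is not specified here.

```shell
# Sketch of the coverage rule described in the option table.
# position_is_used is a hypothetical helper, not part of Damage Estimator.
# A position is skipped when depth >= MAX or depth <= MIN, so it is used
# only when MIN < depth < MAX. Defaults mirror the table (MIN=1, MAX=100).
position_is_used() {
    depth=$1
    min_cov=${2:-1}
    max_cov=${3:-100}
    [ "$depth" -gt "$min_cov" ] && [ "$depth" -lt "$max_cov" ]
}
```

With the defaults, a depth of 1 or 100 is excluded, while depths from 2 to 99 are kept.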

For exome bams, we recommend: --Q 20 --mq 20 --max_coverage_limit 300 --min_coverage_limit 30

Usage

nextflow run iarcbioinfo/damage-estimator-nf --bam_folder BAM/ --de_path /path/ --genome_ref ref.fasta
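For exome data, the recommended settings can be appended to the same command. The sketch below only prints the command line rather than running it. `PIPELINE` and `REF_FLAG` are illustrative variables, not pipeline options: the pipeline name is assumed to match the repository name, and the reference flag uses the `--ref` spelling from the parameter table, whereas the usage line spells it `--genome_ref`, so check the pipeline script for the one that applies.

```shell
# Sketch: print (do not run) a command line with the exome settings
# recommended in this README. PIPELINE and REF_FLAG are illustrative
# variables whose default values are assumptions to verify before use.
PIPELINE="${PIPELINE:-iarcbioinfo/damage-estimator-nf}"   # assumed to match the repository name
REF_FLAG="${REF_FLAG:---ref}"                             # spelling from the parameter table

build_exome_cmd() {
    # $1 = BAM folder, $2 = Damage Estimator folder, $3 = reference fasta
    printf '%s %s --bam_folder %s --de_path %s %s %s %s\n' \
        "nextflow run" "$PIPELINE" "$1" "$2" "$REF_FLAG" "$3" \
        "--Q 20 --mq 20 --max_coverage_limit 300 --min_coverage_limit 30"
}

build_exome_cmd BAM/ /path/ ref.fasta
```

Printing the command first makes it easy to review the flags before launching a long run.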

Output

Type | Description
"SMR" file1 and file2 | Intermediate mpileup files generated by samtools ("Split Mapped Reads") containing all the positions in the genome covered by at least one read. The -mpileup1 file corresponds to the first read of each pair and the -mpileup2 file to the second read of each pair.
Table | 6 columns: [1] raw count of variant type [2] variant type (e.g. G_T, G to T) [3] id (from the --id option) [4] frequency of variant [5] family (the variant type and its reverse complement) [6] GIV-score
Graph | Representation of the table, generated by plot_damage.R
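The column layout of the output table makes it easy to pull out a single value. The following is a minimal sketch, assuming the table is whitespace-separated with no header line (neither is stated here); `giv_score` is a hypothetical helper name.

```shell
# Sketch: print the GIV-score (column 6) for one variant type (column 2).
# giv_score is a hypothetical helper. It assumes whitespace-separated
# columns and no header line; adjust if the actual table differs.
giv_score() {
    # $1 = variant type, e.g. G_T; $2 = path to the output table
    awk -v type="$1" '$2 == type { print $6 }' "$2"
}
```

For example, `giv_score G_T sample.damage` would print the GIV-score of the G to T row, where `sample.damage` stands in for whatever name the table file has.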

Contributions

Name | Email | Description
VOEGELE Catherine | voegelec@iarc.fr | Developer
