
IARCbioinfo/PVAmpliconFinder


PVAmpliconFinder

Robitaille, A., Brancaccio, R.N., Dutta, S. et al. PVAmpliconFinder: a workflow for the identification of human papillomaviruses from high-throughput amplicon sequencing. BMC Bioinformatics 21, 233 (2020). https://doi.org/10.1186/s12859-020-03573-8

Description

PVAmpliconFinder is a data analysis workflow designed to rapidly identify and classify known and potentially new Papillomaviridae sequences from amplicon deep sequencing performed with degenerate papillomavirus (PV) primers.

PVAmpliconFinder is based on alignment similarity metrics, but also considers molecular evolution time for improved identification and taxonomic classification of novel PVs. The final output of the tool includes a list of fully characterized putative new Papillomaviridae sequences, as well as graphical representations of the relative abundance of virome sequence diversity in the tested samples.

Prerequisites

The PVAmpliconFinder workflow is designed to analyze reads generated by paired-end sequencing of DNA amplified with degenerate primers that specifically target the L1 gene of papillomaviruses (Chouhy et al., 2010; Forslund et al., 1999; Forslund et al., 2003).

Installation

Python 2.7 or higher and Perl 5.22.1 or higher are required.
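A quick way to confirm these minimum versions before installing anything is sketched below. It is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils) and that the interpreters are invoked as `perl` and `python`; `version_ge` and `check_tool` are hypothetical helper names, not part of PVAmpliconFinder.

```shell
#!/usr/bin/env bash
# Sketch: check the interpreter versions listed above
# (Python >= 2.7, Perl >= 5.22.1). Assumes GNU `sort -V` for version ordering.

# version_ge A B: succeed if version A >= version B.
version_ge() {
  # sort -V orders version strings; if B sorts first (or they are equal), A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME FOUND REQUIRED: print a one-line verdict, fail if too old.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "OK: $1 $2 (>= $3)"
  else
    echo "TOO OLD: $1 $2 (need >= $3)"
    return 1
  fi
}

# Only query interpreters that are actually installed.
if command -v perl >/dev/null 2>&1; then
  # $^V is Perl's version object; %vd prints it as e.g. 5.30.0
  check_tool perl "$(perl -e 'printf "%vd", $^V')" 5.22.1
fi
if command -v python >/dev/null 2>&1; then
  check_tool python "$(python -c 'import platform; print(platform.python_version())')" 2.7
fi
```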

The tool was developed in a UNIX environment, but on macOS, installing clang_osx-64, clangxx_osx-64 and gfortran_osx-64 with conda should provide a functional environment.

Automatic installation

PVAmpliconFinder relies on Bioconda to install the software and its dependencies.

Please install the version of Miniconda corresponding to your Python version.

Add conda channels

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Install conda packages

conda install -y fastqc multiqc trim-galore vsearch blast raxml cap3 krona libxml2 gcc_linux-64 gxx_linux-64 gfortran_linux-64 perl-padwalker perl-xml-libxml perl-libxml-perl perl-bioperl perl-getopt-long perl-math-round perl-statistics-basic perl-list-moreutils perl-module-build perl-bioperl-run perl-text-csv
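The compiler packages gcc_linux-64, gxx_linux-64 and gfortran_linux-64 in the command above are Linux-specific. A minimal sketch for swapping them with the macOS equivalents named in the Installation note is a helper that picks the list from `uname -s`; `compiler_packages` is a hypothetical name, and the rest of the package list is left unchanged in this sketch.

```shell
#!/usr/bin/env bash
# Sketch: choose the conda compiler packages for the current platform.
# Linux names come from the install command above; macOS names from the
# Installation note (clang_osx-64, clangxx_osx-64, gfortran_osx-64).

compiler_packages() {
  case "$1" in
    Linux)  echo "gcc_linux-64 gxx_linux-64 gfortran_linux-64" ;;
    Darwin) echo "clang_osx-64 clangxx_osx-64 gfortran_osx-64" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Example (not executed here): install the compilers for the host platform.
#   conda install -y $(compiler_packages "$(uname -s)")
```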

Add PaPaRa program to PATH

export PATH="PATH_TO_PVAMPLICONFINDER/program:$PATH"

On 32-bit systems, the precompiled PaPaRa binary is not functional, as stated on the tool's webpage. In that case, install PaPaRa manually following its instructions and place the binary in PVAmpliconFinder/program. Note that the binary must be named "papara".
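Whichever binary you end up with, the requirement is the same: an executable named exactly "papara" in PVAmpliconFinder/program, reachable through PATH. The sketch below checks that and prepends the directory only once; `add_papara_to_path` and the `PVAF_DIR` variable (standing for PATH_TO_PVAMPLICONFINDER) are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Sketch: put PVAmpliconFinder/program on PATH and confirm that an
# executable named exactly "papara" is then found.
# PVAF_DIR (hypothetical) holds your PATH_TO_PVAMPLICONFINDER.

add_papara_to_path() {
  local program_dir="$1/program"
  if [ ! -x "$program_dir/papara" ]; then
    echo "missing or non-executable: $program_dir/papara" >&2
    return 1
  fi
  # Prepend only once, so re-running does not keep growing PATH.
  case ":$PATH:" in
    *":$program_dir:"*) ;;
    *) PATH="$program_dir:$PATH"; export PATH ;;
  esac
  # Print where the shell now resolves "papara".
  command -v papara
}

# Usage:
#   add_papara_to_path "$PVAF_DIR"
```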

Manual installation

The tools used by PVAmpliconFinder can also be downloaded and installed manually; the corresponding executables must then be available in the PATH environment variable.

Please note that the PaPaRa binary must be named "papara".
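After a manual installation, a short check can list which executables are still missing from PATH. This is a sketch under an explicit assumption: the executable names in `REQUIRED_TOOLS` are inferred from the conda package list (for example trim-galore providing `trim_galore`, blast providing `blastn`, krona providing `ktImportTaxonomy`) and are not confirmed as the exact names PVAmpliconFinder calls; `missing_tools` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report which required executables are missing from PATH.
# ASSUMPTION: these names are inferred from the conda package list;
# adjust them to the names your installation actually provides.

REQUIRED_TOOLS="fastqc multiqc trim_galore vsearch blastn cap3 raxmlHPC ktImportTaxonomy papara"

# missing_tools NAME...: print each name not found on PATH; fail if any.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Usage ($REQUIRED_TOOLS is left unquoted on purpose so it splits into words):
#   missing_tools $REQUIRED_TOOLS || echo "install the tools listed above"
```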

List of software

Databases

NCBI databases

PVAmpliconFinder needs the NCBI nt and taxdb databases to work properly. These databases are available at the following FTP address: ftp://ftp.ncbi.nlm.nih.gov/blast/db/. Note that the taxonomy database (taxdb) files must be placed where BLAST can find them, e.g. in the same directory as the nt database.

It is advised to use the NCBI script update_blastdb.pl to facilitate the installation of the databases. More information here.

Once the databases are downloaded and installed, please check that the ~/.ncbirc file exists and points to the correct location of the NCBI nt database. More information here.
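The ~/.ncbirc file uses a `[BLAST]` section whose `BLASTDB` key names the database directory. The sketch below writes such a file and checks that the database files are present. The file names it looks for (`nt.nal` as the alias file of the multi-volume nt database, `taxdb.btd` and `taxdb.bti` for taxdb) are assumptions to adapt to what you actually downloaded, the helper names are hypothetical, and `/data/blastdb` is only an example path.

```shell
#!/usr/bin/env bash
# Sketch: point BLAST+ at a local database directory through a .ncbirc file
# and check that the nt and taxdb files are present.
# The databases can first be fetched inside that directory with, e.g.:
#   update_blastdb.pl --decompress nt taxdb

# write_ncbirc FILE DB_DIR: write a minimal [BLAST] section (overwrites FILE).
write_ncbirc() {
  printf '[BLAST]\nBLASTDB=%s\n' "$2" > "$1"
}

# ncbirc_blastdb FILE: print the first BLASTDB value found in FILE.
ncbirc_blastdb() {
  sed -n 's/^BLASTDB=//p' "$1" | head -n 1
}

# check_blastdb DIR: ASSUMED file names for nt (alias) and taxdb.
check_blastdb() {
  local f status=0
  for f in nt.nal taxdb.btd taxdb.bti; do
    if [ ! -e "$1/$f" ]; then
      echo "not found: $1/$f"
      status=1
    fi
  done
  return "$status"
}

# Usage (overwrites an existing ~/.ncbirc; /data/blastdb is an example path):
#   write_ncbirc ~/.ncbirc /data/blastdb
#   check_blastdb "$(ncbirc_blastdb ~/.ncbirc)"
```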

List of other databases

Input

Name Description
-d PATH to the input FASTQ directory

Test files can be found here.
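Since the workflow expects paired-end reads, it can help to confirm that every forward-read file in the input directory has a mate before launching a run. The sketch below assumes mates are distinguished by `_R1`/`_R2` in the file name and end in `.fastq` or `.fastq.gz`; that naming is an assumption for illustration, not a convention stated by PVAmpliconFinder, and `unpaired_reads` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: list R1 FASTQ files in an input directory that have no R2 mate.
# ASSUMPTION: mates are named with "_R1"/"_R2" and end in .fastq or .fastq.gz;
# adapt the patterns to your own file naming.

# unpaired_reads DIR: print each R1 file without a matching R2; fail if any.
unpaired_reads() {
  local r1 r2 base status=0
  for r1 in "$1"/*_R1*.fastq "$1"/*_R1*.fastq.gz; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    base="${r1##*/}"                    # file name without its directory
    r2="${r1%/*}/${base/_R1/_R2}"       # first "_R1" in the name -> "_R2"
    if [ ! -e "$r2" ]; then
      echo "$r1"
      status=1
    fi
  done
  return "$status"
}

# Usage (fastq_dir is a hypothetical input directory):
#   unpaired_reads fastq_dir && echo "all R1 files have an R2 mate"
```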

Parameters

  • Mandatory

Name Example value Description
-s pool Suffix of the FASTQ filenames
-o PV_Amplicon_output PATH to the output directory
  • Optional

Name Default value Description
-f NA Tabular file containing information about the samples; its first line must be "ID primer tissue"
-b nt Name of the local "nt" BLAST database
-i 98 Percentage identity threshold used for de novo centroid-based clustering
-t 2 Number of threads
  • Flags

Flags are special parameters that take no value.

Name Description
-h Display help
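Two of the optional parameters are easy to get subtly wrong: the header of the -f file, and the unit of -i. The sketch below checks that an info file starts with "ID primer tissue" (tolerating tabs or repeated spaces and a trailing carriage return), and converts a percentage such as 98 into the 0-1 fraction that vsearch's `--id` option expects. Whether PVAmpliconFinder performs exactly this conversion internally is an assumption; both helper names and the sample row are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of two hypothetical pre-run helpers for the optional parameters.

# check_info_header FILE: succeed if FILE's first line is "ID primer tissue".
# Tabs or runs of spaces between fields, and a trailing CR, are tolerated.
check_info_header() {
  local header
  header="$(head -n 1 "$1" | tr -d '\r' | tr -s ' \t' ' ' | sed 's/^ //; s/ $//')"
  [ "$header" = "ID primer tissue" ]
}

# pct_to_fraction PCT: convert a percentage such as the -i value (98) into
# the 0-1 fraction used by vsearch's --id option (0.98).
pct_to_fraction() {
  awk -v p="$1" 'BEGIN { printf "%g\n", p / 100 }'
}

# Usage (the sample row is hypothetical):
#   printf 'ID\tprimer\ttissue\nS1\tFAP\tskin\n' > info.tsv
#   check_info_header info.tsv && echo "info file header OK"
#   pct_to_fraction 98
```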

Usage

sh PVAmpliconFinder.sh [-h] [-t threads] [-b "nt" database] [-f info_file] [-i identity threshold] -s fastq_files_suffix -d input_dir -o output_dir
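To see how the options fit together, the sketch below assembles a concrete command line from the example values in the parameter tables (suffix `pool`, output `PV_Amplicon_output`, defaults `-t 2` and `-i 98`) and refuses to build it when one of the mandatory options -s, -d or -o is empty. `build_pvaf_cmd` is a hypothetical helper and `fastq_dir` a hypothetical input directory; the command is printed rather than run.

```shell
#!/usr/bin/env bash
# Sketch: build (but do not run) a PVAmpliconFinder command line, refusing
# to do so when a mandatory option is empty. build_pvaf_cmd is hypothetical;
# defaults mirror the parameter tables (-t 2, -i 98).

build_pvaf_cmd() {
  local suffix="$1" input_dir="$2" output_dir="$3"
  local threads="${4:-2}" identity="${5:-98}"
  if [ -z "$suffix" ] || [ -z "$input_dir" ] || [ -z "$output_dir" ]; then
    echo "usage: build_pvaf_cmd SUFFIX INPUT_DIR OUTPUT_DIR [THREADS] [IDENTITY]" >&2
    return 1
  fi
  local -a cmd
  cmd=(sh PVAmpliconFinder.sh -t "$threads" -i "$identity"
       -s "$suffix" -d "$input_dir" -o "$output_dir")
  # %q quotes each word so paths containing spaces survive copy-paste.
  printf '%q' "${cmd[0]}"
  printf ' %q' "${cmd[@]:1}"
  printf '\n'
}

# Usage:
#   build_pvaf_cmd pool fastq_dir PV_Amplicon_output 4
```

The optional -b and -f options could be appended to the array in the same way.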

Output

Type Description
QC report Report on FastQ file quality, before and after trimming
Diversity by tissue Excel table of taxonomically classified PV species identified in the samples
Table summary Excel table of read metrics
Table putative Known viruses Excel table of putative known viruses identified in the samples
Table putative New viruses Excel table of putative new viruses identified in the samples
Putative Known viruses FASTA files of putative known virus sequences identified in the samples
Putative New viruses FASTA files of putative new virus sequences identified in the samples
KRONA Megablast Directory of KRONA graphical representations of the unnormalized abundance of viruses identified by Megablast in the samples
KRONA BlastN Directory of KRONA graphical representations of the unnormalized abundance of viruses identified by BlastN in the samples
KRONA RaxML Directory of KRONA graphical representations of the unnormalized abundance of viruses identified by RAxML-EPA in the samples
Log file Log of the run

Detailed description of the output

Contributions

Name Email Description
Alexis Robitaille alexis.robitaille@orange.fr Developer to contact for support
Magali Olivier olivierm@iarc.fr
Massimo Tommasino tommasinom@iarc.fr

Versioning

Version 1.0

Authors

License

This project is licensed under GPL-3.0.

Acknowledgments

References

FAQ