Arche: a functional-optimized annotator for microbial (meta)genomes

Installing dependencies

Before you download Arche (13 GB), make sure the following software is working properly on your computer:

p7zip-full
bedtools >= 2.27.0
barrnap
Prodigal
hmmer
blastp
DIAMOND

FASTA36 ---> https://github.com/wrpearson/fasta36.git

faSomeRecords ---> The version which works with Arche can be found here http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

Infernal 1.1.4 ---> Install manually!
tar xvfz infernal-1.1.4.tar.gz
cd infernal-1.1.4/
./configure
make
sudo make install

tRNAscan-SE ---> Install manually!
Download it from http://lowelab.ucsc.edu/tRNAscan-SE/
tar xvfz trnascan-se-2.0.9.tar.gz
cd tRNAscan-SE-2.0
./configure
make
sudo make install

GeneMarkS-2
Download GeneMarkS-2 and key from http://exon.gatech.edu/GeneMark/license_download.cgi
gunzip gm_key_64.gz
tar xvfz gms2_linux_64.tar.gz
cp gm_key_64 ~/.gmhmmp2_key
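
Before moving on, it can help to confirm that each dependency is reachable from your shell. The following is a minimal sketch and not part of Arche: the helper name missing_tools is hypothetical, and the executable names passed to it (7z, hmmscan, cmscan, ssearch36, gms2.pl and so on) are assumptions about how each package names its binary on a typical Linux install, so adjust them to your system.

```shell
#!/bin/sh
# missing_tools NAME...: print every executable that is NOT found on PATH,
# one per line. Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Executable names are assumptions; edit the list to match your installation.
missing_tools 7z bedtools barrnap prodigal hmmscan blastp diamond \
    ssearch36 faSomeRecords cmscan tRNAscan-SE gms2.pl \
    || echo "Install the tools listed above before continuing." >&2
```

An empty output means every listed executable was found.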

Installing Arche

The program with the already formatted databases and mapping files can be downloaded via GUI from Google Drive:

https://drive.google.com/file/d/1galIzwiuxXc3rNKyhoTysnl7DMdh3p4Z/view?usp=sharing

... or via command line using gdown:

pip3 install gdown
gdown --fuzzy https://drive.google.com/file/d/1x9caXGPpYXCHUoodOdnuJI0tCDe9qtGG/view?usp=sharing

Once the download is finished:

tar -xvf arche_1.0.1.tar    # then move the extracted directory to the desired place
cd arche_1.0.1/bin/
chmod +x arche.sh
./arche.sh --install

You should make the script "arche.sh" accessible from your PATH, for example via a symbolic link:

cd /usr/bin
sudo cp -s /home/???/???/arche_1.0.1/bin/arche.sh ./
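
If you would rather not write to /usr/bin, a symbolic link in a user-owned directory should work in a similar way. This is a sketch under assumptions: the helper name link_arche is hypothetical, the path in the example call is a placeholder, and ~/.local/bin is assumed to be on your PATH already (common, but not universal).

```shell
#!/bin/sh
# link_arche SRC DEST_DIR: create a symlink to SRC inside DEST_DIR.
# Fails with a message if SRC does not exist.
link_arche() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir"
    ln -sf "$src" "$dest_dir/$(basename "$src")"
}

# Example call (replace the placeholder path with your own):
# link_arche /path/to/arche_1.0.1/bin/arche.sh "$HOME/.local/bin"
```

No sudo is needed because the link lives in your home directory.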

ATTENTION! If something fails, check that all the dependencies are working and repeat the installation process from the beginning (starting from the arche_X.X.X.tar file).

Running Arche

BlastP annotation of a bacterial genome, using 20 threads and 40 GB of memory:

arche.sh -n ecoli -t 20 -r 40 e_coli.fna

SSEARCH annotation of an archaeal genome, using 1 thread and 2 GB of memory:

arche.sh -n halorubrum -a ssearch -k arch halorubrum_sp_DM2.fa

DIAMOND annotation of a metagenome:

arche.sh -n seawater_metagenome -k meta seawater_metagenome.fna
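
To annotate several genomes in one go, the single-genome command can be wrapped in a loop. The sketch below is an illustration, not something shipped with Arche: the function name annotate_dir, the DRY_RUN variable, the .fna extension filter and the thread and memory values (4 and 8) are all assumptions. It only uses the -n, -o, -t and -r options documented in this README, and by default it prints the commands instead of running them.

```shell
#!/bin/sh
# annotate_dir IN_DIR OUT_DIR: build one arche.sh command per *.fna file.
# With DRY_RUN=1 (the default) the commands are printed; set DRY_RUN=0 to run them.
annotate_dir() {
    in_dir=$1
    out_dir=$2
    for fasta in "$in_dir"/*.fna; do
        [ -e "$fasta" ] || continue          # skip if the glob matched nothing
        name=$(basename "$fasta" .fna)       # run name derived from the file name
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "arche.sh -n $name -o $out_dir -t 4 -r 8 $fasta"
        else
            arche.sh -n "$name" -o "$out_dir" -t 4 -r 8 "$fasta"
        fi
    done
}

# Example call (hypothetical directories):
# annotate_dir /home/user/genomes /home/user/annotations/
```

Review the printed commands first, then rerun with DRY_RUN=0 once they look right.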

Annotation of Escherichia coli K12

Here you can download a sample which includes the annotation of Escherichia coli K12 with several tools including Arche:

https://docs.google.com/spreadsheets/d/17Nd_y7w2axfxsjFJYAvb_NI3AW9HjNx4/edit?usp=sharing&ouid=115908476093915484477&rtpof=true&sd=true

Output Files

File(s)	Description
rRNA.tsv	GFF v3 file containing rRNA annotations.
rRNA.fna	FASTA file of all rRNA features.
tRNA.tsv	Table with tRNA details (coordinates, isotype, anticodon, scores, etc).
[...]_struc_annot.fna	FASTA file of all genomic features (nucleotide).
[...]_struc_annot.faa	FASTA file of translated coding genes (amino acid).
heuristic[...]_out	Output matches of the search instance(s) performed with BLASTp, DIAMOND or SSEARCH36.
heuristic[...]_non_match.faa	FASTA file with the remaining non-matched sequences after the search instance(s) performed with BLASTp, DIAMOND or SSEARCH36.
hmmscan_[...]_out	HMMER3 output table of the search instance(s) performed against a specific HMMDB.
[HMMDB]_non_match.faa	FASTA file with the remaining non-matched sequences after the search instance performed against a specific HMMDB.
[...]_omic_table.tbl	Feature table with fields separated by vertical bars.
[...]_omic_table.tsv	Feature table with tab-separated fields.
arche_report	File which includes the parameters of the run and results.
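
A quick sanity check on a finished run is to count how many proteins were predicted and how many were left unmatched after a search step. The helper below is a minimal sketch; count_fasta is a hypothetical name, and the file names in the example comment are placeholders because the actual prefix depends on the run.

```shell
#!/bin/sh
# count_fasta FILE: print the number of sequences (header lines) in a FASTA file.
count_fasta() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
    grep -c '^>' "$1" || true   # grep exits 1 when the count is 0; still prints 0
}

# Example calls (placeholder file names):
# count_fasta <name>_struc_annot.faa
# count_fasta heuristic<...>_non_match.faa
```

Comparing the two counts gives a rough idea of how many proteins received a match in that search step.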

Command line options

-h, --help           Show this help.

-i, --install        Set up the executable location and install the databases.

-n, --name-files     Name of the files to be created in the output directory,
                     including the directory itself (default 'arche').

-o, --output         Full path to the directory where the output directory
                     will be created, e.g. /home/user/ (default: current
                     directory).

-k, --kingdom        Source of the contigs. Use 'arch' for archaeal genomes or
                     'meta' for metagenomes (default is for bacterial genomes).

-m, --mode           Gives priority to orthology (KO, eggNOG) or Enzyme
                     Commission (E.C.) designed databases during the
                     annotation. Use 'kegg' for KO-->eggNOG-->E.C., 'eggnog'
                     for eggNOG-->KO-->E.C., or 'ec' for E.C.-->KO-->eggNOG
                     (default uses a shorter Swiss-Prot KO·eggNOG·E.C.
                     designed database with no priority).

-a, --alignment      Algorithm to use during the protein alignment step:
                     'diamond' (accelerated blastp) or 'ssearch'
                     (Smith-Waterman) (default 'blastp').

-t, --threads        Number of threads to use (default '1').

-r, --memory         Amount of RAM to use in GB (default '2').

-e, --evalue         Similarity e-value cut-off (default '1e-08').

-q, --query-cov      Minimum coverage on query protein (default '70').

-b, --bypass         Use 'yes' to bypass the RNA gene prediction.

-v, --verbose        Use 'yes' to turn on the verbose mode.

Licence

GPL v3

Author

Daniel Alonso
email: gundizalvus16@hotmail.com
