Before you download Arche (13 GB), make sure the following dependencies are installed and working on your computer:
- p7zip-full
- bedtools >= 2.27.0
- barrnap
- Prodigal
- HMMER
- blastp (NCBI BLAST+)
- DIAMOND
- FASTA36 ---> https://github.com/wrpearson/fasta36.git
- faSomeRecords ---> a version that works with Arche is available at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
- Infernal 1.1.4 ---> install manually:
```bash
tar xvfz infernal-1.1.4.tar.gz
cd infernal-1.1.4/
./configure
make
sudo make install
```
- tRNAscan-SE ---> install manually. Download it from http://lowelab.ucsc.edu/tRNAscan-SE/, then:
```bash
tar xvfz trnascan-se-2.0.9.tar.gz
cd tRNAscan-SE-2.0
./configure
make
sudo make install
```
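Both Infernal and tRNAscan-SE follow the usual `./configure && make && make install` pattern. If you cannot use sudo, a small helper can install them under your home directory instead. This is a sketch, not part of Arche: the function name `install_from_tarball` and the `$HOME/.local` prefix are illustrative choices, and it assumes the tarballs have already been downloaded to the current directory.

```bash
#!/usr/bin/env bash
# Sketch (HYPOTHETICAL helper, not part of Arche): build an autotools-style
# source tarball and install it under PREFIX. With a PREFIX inside $HOME,
# no sudo is needed.
#
# Usage: install_from_tarball TARBALL SRC_DIR PREFIX
#   TARBALL  e.g. infernal-1.1.4.tar.gz
#   SRC_DIR  directory the tarball unpacks to, e.g. infernal-1.1.4
#   PREFIX   install prefix, e.g. "$HOME/.local" (ASSUMPTION: your choice)
install_from_tarball() {
  local tarball=$1 src_dir=$2 prefix=$3
  tar xzf "$tarball" || return 1
  # Build in a subshell so the caller's working directory is unchanged
  (
    cd "$src_dir" || exit 1
    ./configure --prefix="$prefix" || exit 1
    make || exit 1
    make install
  )
}

# Example (Infernal first, then tRNAscan-SE):
#   install_from_tarball infernal-1.1.4.tar.gz infernal-1.1.4 "$HOME/.local"
#   install_from_tarball trnascan-se-2.0.9.tar.gz tRNAscan-SE-2.0 "$HOME/.local"
#   export PATH="$HOME/.local/bin:$PATH"
```

Installing both packages into the same prefix keeps their binaries in one `bin` directory, so a single PATH entry covers them.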
- GeneMarkS-2 ---> download GeneMarkS-2 and its license key from http://exon.gatech.edu/GeneMark/license_download.cgi, then:
```bash
gunzip gm_key_64.gz
tar xvfz gms2_linux_64.tar.gz
# Install the license key where GeneMarkS-2 expects it
cp gm_key_64 ~/.gmhmmp2_key
```
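A quick way to see which of these tools are visible on your PATH is a small check script. This is a sketch: the executable names (for example `7z` for p7zip-full, `hmmscan` for HMMER, `ssearch36` for FASTA36, `cmscan` for Infernal, `gmhmmp2` for GeneMarkS-2) are the usual ones for each package but are assumptions here, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report which Arche dependencies are missing.
# ASSUMPTION: the executable names below are the usual ones shipped by each
# package; adjust them to match your installation.

# Succeeds if a command is found on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Succeeds if version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_arche_deps() {
  local missing=0 tool v
  for tool in 7z bedtools barrnap prodigal hmmscan blastp diamond \
              ssearch36 faSomeRecords cmscan tRNAscan-SE gmhmmp2; do
    if ! have "$tool"; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  if have bedtools; then
    # `bedtools --version` prints e.g. "bedtools v2.30.0"
    v=$(bedtools --version | awk '{print $2}' | sed 's/^v//')
    if ! version_ge "$v" 2.27.0; then
      echo "bedtools $v is older than 2.27.0"
      missing=$((missing + 1))
    fi
  fi
  # GeneMarkS-2 license key copied by the `cp gm_key_64 ~/.gmhmmp2_key` step
  if [ ! -f "$HOME/.gmhmmp2_key" ]; then
    echo "missing: ~/.gmhmmp2_key"
    missing=$((missing + 1))
  fi
  [ "$missing" -eq 0 ]
}

check_arche_deps && echo "All dependencies found."
```

A tool being on PATH does not prove it runs correctly, but it rules out the most common cause of a failed installation.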
Arche, together with its preformatted databases and mapping files, can be downloaded through a web browser from Google Drive:
https://drive.google.com/file/d/1galIzwiuxXc3rNKyhoTysnl7DMdh3p4Z/view?usp=sharing
... or from the command line with gdown:
```bash
pip3 install gdown
gdown --fuzzy https://drive.google.com/file/d/1x9caXGPpYXCHUoodOdnuJI0tCDe9qtGG/view?usp=sharing
```
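The archive is about 13 GB and unpacking it needs additional space on top of that, so it may be worth checking free disk space and the archive itself before extracting. A minimal sketch using standard `df` and `tar`; the helper names `free_gb` and `tar_ok` are hypothetical. Note that `tar -tf` only walks the archive headers, so it catches many damaged downloads but is not a full checksum.

```bash
#!/usr/bin/env bash
# Sketch (HYPOTHETICAL helpers): pre-extraction checks for the downloaded archive.

# Print the free space, in whole GiB (rounded down), of the filesystem
# that holds path $1. `df -P` guarantees one line per filesystem; `-k`
# reports 1024-byte blocks, and column 4 is the available space.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeeds if tar can list the archive $1 without errors.
tar_ok() {
  tar -tf "$1" >/dev/null 2>&1
}

# Example:
#   echo "$(free_gb .) GiB free in $(pwd)"
#   tar_ok arche_1.0.1.tar || echo "archive looks damaged; download it again"
```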
Once the download has finished, extract the archive, move the resulting arche_1.0.1 directory to where you want Arche to live, and run the installer from there:
```bash
tar -xvf arche_1.0.1.tar
cd arche_1.0.1/bin/
chmod +x arche.sh
./arche.sh --install
```
Make arche.sh reachable from your PATH, for example with a symbolic link:
```bash
cd /usr/bin
# cp -s creates a symbolic link instead of copying the file
sudo cp -s /home/???/???/arche_1.0.1/bin/arche.sh ./
```
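If you lack root access, or prefer not to write to /usr/bin, a per-user bin directory works the same way. This sketch assumes `~/.local/bin` as that directory (any directory on your PATH will do); `link_arche` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch (HYPOTHETICAL helper): put arche.sh on PATH without root by
# symlinking it into a per-user bin directory.
# ASSUMPTION: ~/.local/bin is (or will be) listed in PATH.

# link_arche ARCHE_DIR [BIN_DIR]
#   ARCHE_DIR  absolute path of the extracted arche_1.0.1 directory
#   BIN_DIR    where to create the link (default: ~/.local/bin)
link_arche() {
  local arche_dir=$1 bin_dir=${2:-$HOME/.local/bin}
  if [ ! -x "$arche_dir/bin/arche.sh" ]; then
    echo "no executable arche.sh in $arche_dir/bin" >&2
    return 1
  fi
  mkdir -p "$bin_dir" || return 1
  # -f replaces a stale link left by an earlier installation
  ln -sf "$arche_dir/bin/arche.sh" "$bin_dir/arche.sh"
}

# Example (use the directory where you placed arche_1.0.1):
#   link_arche "$HOME/arche_1.0.1"
#   export PATH="$HOME/.local/bin:$PATH"   # add to ~/.bashrc to keep it
```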
ATTENTION! If something fails, check that all the dependencies are working and repeat the installation process from the beginning (starting again from the arche_X.X.X.tar file).
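Usage examples:
```bash
# Bacterial genome (default kingdom), 20 threads, 40 GB of RAM
arche.sh -n ecoli -t 20 -r 40 e_coli.fna
# Archaeal genome, using SSEARCH (Smith-Waterman) for the protein alignment step
arche.sh -n halorubrum -a ssearch -k arch halorubrum_sp_DM2.fa
# Metagenome
arche.sh -n seawater_metagenome -k meta seawater_metagenome.fna
```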
A sample containing the annotation of Escherichia coli K-12 produced by several tools, including Arche, can be downloaded here:
File(s) | Description |
---|---|
rRNA.tsv | GFF v3 file containing rRNA annotations. |
rRNA.fna | FASTA file of all rRNA features. |
tRNA.tsv | Table with tRNA details (coordinates, isotype, anticodon, scores, etc). |
[...]_struc_annot.fna | FASTA file of all genomic features (nucleotide). |
[...]_struc_annot.faa | FASTA file of translated coding genes (amino acid). |
heuristic[...]_out | Matches from the search step(s) run with BLASTp, DIAMOND or SSEARCH36. |
heuristic[...]_non_match.faa | FASTA file of the sequences left unmatched after the search step(s) run with BLASTp, DIAMOND or SSEARCH36. |
hmmscan_[...]_out | HMMER3 output table of the search step(s) run against a specific HMMDB. |
[HMMDB]_non_match.faa | FASTA file of the sequences left unmatched after the search against a specific HMMDB. |
[...]_omic_table.tbl | Feature table with fields separated by vertical bars. |
[...]_omic_table.tsv | Feature table with tab-separated fields. |
arche_report | Report with the parameters and results of the run. |
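For a quick sanity check of a finished run, the FASTA and GFF3 files listed in the table can be summarised with standard tools. This is a sketch: the helper names are hypothetical, and because the `[...]` parts of the file names depend on the run, paths are passed in as arguments.

```bash
#!/usr/bin/env bash
# Sketch (HYPOTHETICAL helpers): count records in Arche output files.

# Number of records in a FASTA file (header lines start with '>').
count_fasta() { grep -c '^>' "$1"; }

# Number of feature lines in a GFF3 file, ignoring '#' comment/directive
# lines and blank lines.
count_gff_features() { grep -v -c -e '^#' -e '^[[:space:]]*$' "$1"; }

# Example (replace OUTDIR and the [...] part with your run's names):
#   count_fasta OUTDIR/[...]_struc_annot.faa          # translated coding genes
#   count_fasta OUTDIR/heuristic[...]_non_match.faa   # proteins left unmatched
#   count_gff_features OUTDIR/rRNA.tsv                # rRNA features
```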
Options:
```
-h, --help          Show this help.
-i, --install       Set up the executable location and install the databases.
-n, --name-files    Name of the files to be created in the output directory,
                    including the directory itself (default 'arche').
-o, --output        Full path of the directory in which the output directory
                    will be created, e.g. /home/user/ (default: current).
-k, --kingdom       Source of the contigs. Use 'arch' for archaeal genomes or
                    'meta' for metagenomes (default: bacterial genomes).
-m, --mode          Give priority to orthology (KO, eggNOG) or Enzyme
                    Commission (E.C.) based databases during the annotation.
                    Use 'kegg' for KO-->eggNOG-->E.C., 'eggnog' for
                    eggNOG-->KO-->E.C., or 'ec' for E.C.-->KO-->eggNOG
                    (default: a smaller Swiss-Prot based KO/eggNOG/E.C.
                    database with no priority).
-a, --alignment     Algorithm for the protein alignment step: 'diamond'
                    (accelerated blastp) or 'ssearch' (Smith-Waterman)
                    (default 'blastp').
-t, --threads       Number of threads to use (default '1').
-r, --memory        Amount of RAM to use, in GB (default '2').
-e, --evalue        Similarity e-value cut-off (default '1e-08').
-q, --query-cov     Minimum coverage on the query protein (default '70').
-b, --bypass        Use 'yes' to bypass RNA gene prediction.
-v, --verbose       Use 'yes' to turn on verbose mode.
```
- Daniel Alonso
- email: gundizalvus16@hotmail.com