Pre‐built databases

Jim Shaw edited this page Sep 5, 2024 · 17 revisions

Pre-sketched databases are available for download below. All databases work with sylph version 0.3.x and later.

  • Use the http://faust.compbio.cs.cmu.edu links if possible. We also provide mirrors on Google Cloud, but downloads from those cost us more money.

Example usage:

wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r220-c200-dbv1.syldb
sylph profile my_sample.sylsp gtdb-r220-c200-dbv1.syldb -t 30 > results.tsv
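
For repeated runs, the download and profiling steps above can be wrapped so the database is only fetched once. This is a minimal sketch, not part of sylph: `fetch_db` and `profile_sample` are hypothetical helper names, the URL is the one from the example above, and the only sylph invocation used is the `sylph profile ... -t` form already shown.

```shell
# Sketch: download a pre-built database once, then profile samples with it.
set -eu

# URL taken from the example above; swap in another database link as needed.
DB_URL="http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r220-c200-dbv1.syldb"
# Derive the local file name from the URL so the two cannot drift apart.
DB_FILE="$(basename "$DB_URL")"

# fetch_db (hypothetical helper): download the database unless a
# non-empty copy is already in the current directory.
fetch_db() {
    if [ -s "$DB_FILE" ]; then
        echo "Using existing $DB_FILE"
    else
        # Download to a temporary name first so an interrupted transfer
        # is never mistaken for a complete database on the next run.
        wget -O "$DB_FILE.part" "$DB_URL"
        mv "$DB_FILE.part" "$DB_FILE"
    fi
}

# profile_sample (hypothetical wrapper around the command shown above).
# $1: sample sketch (.sylsp), $2: output TSV, $3: thread count (default 30)
profile_sample() {
    sylph profile "$1" "$DB_FILE" -t "${3:-30}" > "$2"
}
```

With these definitions loaded, `fetch_db && profile_sample my_sample.sylsp results.tsv` reproduces the example above while skipping the download when the file is already present.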

GTDB Databases

GTDB r220 database (113,104 species representative genomes) - 24th April, 2024

  1. -c 200, more sensitive database (13.1 GB)
  2. -c 1000, more efficient, less sensitive database (2.6 GB)

GTDB r214 database (85,202 species representative genomes) - 28th April, 2023

  1. -c 200, more sensitive database (10 GB)
  2. -c 1000, more efficient, less sensitive database (2 GB)

Other prokaryotic databases

  1. OceanDNA catalogue of 8,466 ocean prokaryotic MAGs, -c 200 (800 MB)
  2. SMAG catalogue of 21,077 soil MAGs, -c 200 (2.5 GB)
  3. UHGG v2.0.1 catalogue of 289,232 gut genomes, -c 200 (26 GB). Not dereplicated; do not use for profiling.

Viral databases

Pre-sketched IMG/VR4.1 database for high-confidence vOTU representatives (2,917,516 viral genomes).

  1. -c 200 (2 GB)

Eukaryotic databases

  1. 595 representative RefSeq fungi genomes (downloaded 2024-07-25), -c 200 (700 MB)

  2. 713 TARA Oceans eukaryotic MAGs/SAGs from Delmont et al., -c 200 (900 MB)

Taxonomy usage:

Some of the databases have associated taxonomies that sylph can utilize. See https://github.com/bluenote-1577/sylph-utils for more information.