Thanks for writing taxor and including useful databases!
What the community (or maybe just me) really wants is a database that covers more of the microbial world, but with the benefit of GTDB for the bacteria and archaea, along with human and synthetic sequence entries.
My dream database is:
Bacteria - GTDB
Archaea - GTDB
Viruses - RefSeq or GenBank
Fungi - GenBank
Protozoa - GenBank
Human - single reference genome or pangenome to catch host DNA
Artificial sequences - adaptors, vectors etc
I understand munging GTDB taxonomy with NCBI is a challenge, but do you think this database would be achievable?
Public health labs around the world would be grateful!
I think this can be done relatively quickly. All Taxor needs is a directory of FASTA files (one per species) and a metadata file containing all the taxonomic information. I can create these files for GTDB and GenBank separately and then merge them into a single file. I can also download the publicly available genomes (with accompanying metadata) using genome_updater. I just need more details about the artificial sequences you want to include in the database. If you can provide that information, I will create an index file for you with all the relevant genomes and taxonomy.
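The "create separately, then merge" step could look roughly like the sketch below. It is only an illustration: the column names in `FIELDS` and the helper functions are hypothetical and are not Taxor's actual metadata format, which may use different columns or ordering. The sketch concatenates per-source tables and refuses duplicate accessions, since each FASTA file should be described exactly once.

```python
import csv
import io

# Hypothetical column layout for illustration only; Taxor's real metadata
# file may use different columns, names, or ordering.
FIELDS = ["accession", "taxid", "species", "lineage", "fasta_path"]


def merge_metadata(*tables):
    """Concatenate metadata tables (lists of dict rows) into one list.

    Raises ValueError if the same accession appears more than once,
    since each FASTA file should be described exactly once.
    """
    merged = []
    seen = set()
    for rows in tables:
        for row in rows:
            acc = row["accession"]
            if acc in seen:
                raise ValueError(f"duplicate accession: {acc}")
            seen.add(acc)
            # Keep only the expected columns, in a fixed order.
            merged.append({field: row[field] for field in FIELDS})
    return merged


def to_tsv(rows):
    """Serialize merged rows as a tab-separated table with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder rows; accessions, taxids, names and paths are made up.
    gtdb_rows = [{"accession": "acc_gtdb_1", "taxid": "1001", "species": "s__Example alpha",
                  "lineage": "d__Bacteria;p__ExamplePhylum", "fasta_path": "genomes/acc_gtdb_1.fna"}]
    genbank_rows = [{"accession": "acc_gb_1", "taxid": "2002", "species": "Example fungus",
                     "lineage": "Eukaryota;Fungi", "fasta_path": "genomes/acc_gb_1.fna"}]
    print(to_tsv(merge_metadata(gtdb_rows, genbank_rows)), end="")
```

Failing loudly on duplicate accessions is a design choice for the sketch: when the same assembly appears in both the GTDB and GenBank downloads, it surfaces the conflict instead of silently keeping one lineage.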
OK, it turns out it's not as easy as I initially thought. GTDB includes many distinct species that share the same NCBI species taxid. This breaks Taxor's internal data model, which relies on a single unique taxid per species, so the code needs some refactoring.
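The collision can be detected by grouping GTDB species by their NCBI species taxid, as in the sketch below. The second helper shows one possible workaround (giving every GTDB species its own internal integer ID, decoupled from NCBI taxids); this is an assumption for illustration, not how Taxor is or will be refactored. Function names and the example mapping are hypothetical.

```python
from collections import defaultdict


def find_taxid_collisions(gtdb_to_ncbi):
    """Return {ncbi_taxid: [gtdb_species, ...]} for taxids shared by more than one GTDB species.

    gtdb_to_ncbi: dict mapping GTDB species name -> NCBI species taxid (int).
    """
    by_taxid = defaultdict(set)
    for gtdb_species, taxid in gtdb_to_ncbi.items():
        by_taxid[taxid].add(gtdb_species)
    return {taxid: sorted(names) for taxid, names in by_taxid.items() if len(names) > 1}


def assign_internal_ids(gtdb_species, start=1):
    """Give every GTDB species its own integer ID, independent of NCBI taxids.

    Sorting makes the assignment deterministic for the same set of names.
    """
    return {name: i for i, name in enumerate(sorted(gtdb_species), start=start)}


if __name__ == "__main__":
    # Hypothetical mapping: two GTDB species that NCBI lumps under one taxid.
    mapping = {"s__Genus species": 111, "s__Genus species_A": 111, "s__Other species": 222}
    print(find_taxid_collisions(mapping))
    print(assign_internal_ids(mapping))
```

With internal IDs like these, the NCBI taxid becomes an annotation on each species rather than its key, so two GTDB species sharing a taxid no longer collide.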
I'm also a bit skeptical whether the resolution of k-mer selection schemes such as minimizers and syncmers is high enough to distinguish between species so close to each other that they share the same species taxid. Do you know how similar those species can be to each other?
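One way to get a feel for the resolution question is to measure how much two minimizer sets overlap as sequence identity drops. The sketch below implements generic (w,k)-minimizers (forward strand only, CRC32 as a stand-in hash) and compares a random "genome" with point-mutated copies via Jaccard similarity. It covers minimizers only, not syncmers, and the `k`/`w` values are arbitrary choices, not Taxor's parameters.

```python
import random
import zlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 32-bit hash (CRC32) so results are reproducible across runs."""
    return zlib.crc32(kmer.encode("ascii"))


def minimizers(seq: str, k: int, w: int) -> set:
    """Return the (w,k)-minimizers of seq (forward strand only).

    In every window of w consecutive k-mers, the k-mer with the smallest
    hash is selected; min() breaks ties by taking the leftmost one.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        selected.add(min(kmers[start:start + w], key=kmer_hash))
    return selected


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two minimizer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base with a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(20_000))
    ref = minimizers(genome, k=21, w=11)
    for rate in (0.001, 0.01, 0.05):
        variant = mutate(genome, rate, rng)
        print(rate, round(jaccard(ref, minimizers(variant, k=21, w=11)), 3))
```

The substitution rate roughly plays the role of 1 minus sequence identity, so running this with realistic `k`/`w` values gives a rough sense of how quickly shared minimizers vanish between closely related genomes. Real genomes also differ by indels and rearrangements, which this toy model ignores.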