[Feature] Add support for Sourmash #114

Midnighter · 2023-07-02T14:01:10Z

I think sourmash is an interesting tool, as it is so fast in scanning vast libraries of genomes. We should add support for its output.

chrisgulvik · 2024-06-14T14:02:56Z

@Midnighter Any progress on this? I'm also interested in having it for evaluation.

According to the docs, sourmash tax is now the recommended approach (sourmash lca no longer is). I don't have example commands to use, but there's a fairly recent Nextflow workflow that might be helpful if the commands themselves are what's slowing you down here. It looks like the main steps are:

1. sketch the input to form a sig (sourmash sketch)
2. search the sig against a db (sourmash gather)
3. summarize results by lineage (sourmash tax metagenome)
4. annotate results (sourmash tax annotate)

where steps 3 and 4 could occur in parallel.
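The four steps above can be sketched as command lines. This is only a rough sketch: the subcommands are the ones named in the list, but the flags (`-p`, `-o`, `--gather-csv`, `--taxonomy`), the sketch parameters, and all file names are assumptions on my part and should be checked against `sourmash <command> --help` for the installed version. The helper `sourmash_commands` is hypothetical and only builds the argument lists; it does not run anything.

```python
import shlex


def sourmash_commands(reads, database, taxonomy, prefix, ksize=31, scaled=1000):
    """Build argv lists for the four steps; nothing is executed here.

    Flags and parameter values are assumptions, not verified defaults.
    """
    sig = f"{prefix}.sig"
    gather_csv = f"{prefix}.gather.csv"
    # Step 1: sketch the input reads into a signature file.
    sketch = ["sourmash", "sketch", "dna", "-p",
              f"k={ksize},scaled={scaled},abund", reads, "-o", sig]
    # Step 2: search the signature against a database.
    gather = ["sourmash", "gather", sig, database, "-o", gather_csv]
    # Steps 3 and 4 both read the gather CSV, so they can run in parallel.
    tax_inputs = ["--gather-csv", gather_csv, "--taxonomy", taxonomy]
    metagenome = ["sourmash", "tax", "metagenome", *tax_inputs]
    annotate = ["sourmash", "tax", "annotate", *tax_inputs]
    return [sketch, gather, metagenome, annotate]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in sourmash_commands("reads.fq.gz", "db.zip", "lineages.csv", "sample"):
        print(shlex.join(cmd))
```

Steps 3 and 4 depend only on the gather CSV from step 2, which is why they are independent of each other.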

The bioconda package is up to date, the databases are well described, and the software itself has been very well maintained by @ctb et al. for almost a decade now. I'm also including him to give him an opportunity to suggest alternative commands for generalized classification, in case the above steps are less than ideal.

ctb · 2024-06-14T14:04:36Z

I STAND READY

😆

ctb · 2024-06-14T14:07:21Z

can anyone give an example of one or two use cases so I can read the docs a bit with that in mind? would the standardize command be a good place to start?

might be fun to add sylph support as well, since people are liking that a lot (I'm not a maintainer - that would be @bluenote-1577)

Midnighter · 2024-06-14T14:41:06Z

Thank you for your interest @chrisgulvik 🙂. As this is taxpasta and not the taxprofiler pipeline, the exact commands actually don't matter in this context. The only things required from a technical perspective are a few example profiles created with sourmash and maybe a clear understanding of what variation in column output is possible/desirable/supportable.
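To get a first impression of that column variation, one could compare the header rows of a handful of example profiles. The following is a minimal sketch under stated assumptions: `compare_profile_columns` is a hypothetical helper (not part of taxpasta), the profiles are assumed to be comma-separated with a header row, and the column names in the usage example are made up for illustration rather than taken from real sourmash output.

```python
import csv
import io


def compare_profile_columns(profiles):
    """Return (shared, optional) column names across profiles.

    `profiles` maps a label to the CSV text of one profile. Columns present
    in every profile are "shared"; columns present in only some are "optional".
    """
    header_sets = []
    for text in profiles.values():
        reader = csv.reader(io.StringIO(text))
        # The first row is assumed to be the header; empty input gives no columns.
        header_sets.append(set(next(reader, [])))
    if not header_sets:
        return [], []
    shared = set.intersection(*header_sets)
    optional = set.union(*header_sets) - shared
    return sorted(shared), sorted(optional)


if __name__ == "__main__":
    # Illustrative data only; real profiles would be read from files.
    examples = {
        "run_a": "query_name,rank,fraction,lineage\ns1,species,0.5,d__Bacteria\n",
        "run_b": "query_name,rank,fraction,lineage,extra_column\ns2,genus,0.2,d__Archaea,1\n",
    }
    shared, optional = compare_profile_columns(examples)
    print("shared:", shared)
    print("optional:", optional)
```

Columns that end up in the optional group are the ones where a decision about what is supportable would be needed.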

The major impediment is my time really, as I have moved into a different job, and taxpasta is now essentially a hobby project among (several) others. We have a fairly decent guide for how to add support for new types of profiles (https://taxpasta.readthedocs.io/en/latest/contributing/supporting_new_profiler/), so if you want to give it a shot, I'm happy to provide guidance and review code.

jfy133 · 2024-06-14T18:58:28Z

Agreed! A sourmash subworkflow was actually already started on the taxprofiler repo (it's in a draft state at the moment), but the person taking that on seems not to have been able to finish it. On 'our side' we normally add tools to taxpasta once they're in the pipeline, as then we know exactly what is available etc.

That said, I'm also happy to guide on the taxprofiler/nextflow side of things (I'm still on half-time parental leave until August), if someone wants to take over the half-done subworkflow! We have a profiler contribution guide for that too.

And agreed sylph also looks very interesting 👍

Midnighter self-assigned this Jul 2, 2023

Midnighter added the enhancement label (New feature or request) Jul 2, 2023

Midnighter mentioned this issue Jun 14, 2024

Add support for the sylph profiler nf-core/taxprofiler#497

Open
