[Feature] Add support for Sourmash #114

Open
Midnighter opened this issue Jul 2, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@Midnighter
Contributor

I think sourmash is an interesting tool, as it is so fast in scanning vast libraries of genomes. We should add support for its output.

@Midnighter Midnighter self-assigned this Jul 2, 2023
@Midnighter Midnighter added the enhancement New feature or request label Jul 2, 2023
@chrisgulvik

@Midnighter Any progress on this? I'm also interested in having it for evaluation.

According to the docs, sourmash tax is now the recommended approach (sourmash lca no longer is) link. I don't have example commands to use, but there's a fairly recent nf wf that might be helpful if the cmds themselves are what's slowing you down here. It looks like the main steps are:

  1. sketch the input to form a sig (sourmash sketch) here
  2. search the sig against a db (sourmash gather) here
  3. summarize results by lineage (sourmash tax metagenome) here
  4. annotate results (sourmash tax annotate) here

where steps 3 and 4 could occur in parallel.
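
As a rough illustration of the four steps listed above, the commands could be assembled like this. This is only a sketch: the file names, the `prefix` argument, and the sketch parameters (`k=31,scaled=1000,abund`) are placeholder assumptions, and the flags should be checked against the sourmash documentation for the installed version before use. The helper only builds the argument lists; it does not run sourmash.

```python
import shlex


def sourmash_commands(reads, database, taxonomy, prefix="sample"):
    """Build argv lists for the four steps; nothing is executed here.

    All file names are hypothetical placeholders chosen for illustration.
    """
    sig = f"{prefix}.sig"
    gather_csv = f"{prefix}.gather.csv"
    return [
        # 1. sketch the input reads into a signature (parameters are assumed)
        ["sourmash", "sketch", "dna", "-p", "k=31,scaled=1000,abund",
         reads, "-o", sig],
        # 2. search the signature against a database
        ["sourmash", "gather", sig, database, "-o", gather_csv],
        # 3. summarize the gather results by lineage
        ["sourmash", "tax", "metagenome", "--gather-csv", gather_csv,
         "--taxonomy", taxonomy],
        # 4. annotate the gather results with lineages
        ["sourmash", "tax", "annotate", "--gather-csv", gather_csv,
         "--taxonomy", taxonomy],
    ]


if __name__ == "__main__":
    for argv in sourmash_commands("reads.fq.gz", "database.zip", "lineages.csv"):
        print(shlex.join(argv))
```

Steps 3 and 4 both depend only on the gather CSV, which is why they could run in parallel.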

The bioconda package is up-to-date here, the databases are well-described here, and the software itself has been very well maintained by @ctb et al. for almost a decade now. I'm also including him here to give him an opportunity to suggest alternative cmds for generalized classification, in case the above steps are less than ideal.

@ctb

ctb commented Jun 14, 2024

I STAND READY

😆

@ctb

ctb commented Jun 14, 2024

can anyone give an example of one or two use cases so I can read the docs a bit with that in mind? would the standardize command be a good place to start?

might be fun to add sylph support as well, since people are liking that a lot (I'm not a maintainer - that would be @bluenote-1577)

@Midnighter
Contributor Author

Thank you for your interest @chrisgulvik 🙂. As this is taxpasta and not the taxprofiler pipeline, the exact commands actually don't matter in this context. The only things required from a technical perspective are a few example profiles created with sourmash and maybe a clear understanding of what variation in column output is possible/desirable/supportable.
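
To make the "variation in column output" point concrete, here is a minimal sketch of the kind of column check a reader for a new profile type might start from. The column names in `ASSUMED_COLUMNS` and the sample rows are assumptions made up for illustration; they are not a confirmed description of sourmash output, and this is not taxpasta's actual reader API.

```python
import csv
import io

# Hypothetical set of required columns; the real set would have to be
# derived from actual example profiles.
ASSUMED_COLUMNS = ("query_name", "rank", "fraction", "lineage")


def read_profile(handle, required=ASSUMED_COLUMNS):
    """Parse a CSV profile and keep only the required columns.

    Raises ValueError if any required column is absent from the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"profile is missing columns: {missing}")
    return [{name: row[name] for name in required} for row in reader]


if __name__ == "__main__":
    # Made-up example data, including an extra column that gets ignored.
    example = io.StringIO(
        "query_name,rank,fraction,lineage,extra\n"
        "sample1,genus,0.25,d__Bacteria;g__Example,x\n"
    )
    print(read_profile(example))
```

Extra columns are tolerated and missing ones are rejected, which is one possible policy; which columns are mandatory is exactly the open question here.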

The major impediment is my time really, as I have moved into a different job, and taxpasta is now essentially a hobby project among (several) others. We have a fairly decent guide for how to add support for new types of profiles (https://taxpasta.readthedocs.io/en/latest/contributing/supporting_new_profiler/), so if you want to give it a shot, I'm happy to provide guidance and review code.

@jfy133
Contributor

jfy133 commented Jun 14, 2024

Agreed! A sourmash subworkflow was actually already started on the taxprofiler repo (it's in a draft state at the moment), but the person taking that on seems not to have been able to finish it. On 'our side' we normally add tools to taxpasta once they're in the pipeline, as then we know exactly what is available etc.

That said, I'm also happy to guide on the taxprofiler/nextflow side of things (I'm still on half-time parental leave until August), if someone wants to take over the half-done subworkflow! We have a profiler contribution guide for that too.

And agreed sylph also looks very interesting 👍


4 participants