Add MACSE to enable frame-shift detection for COI #758

Open
hjarnek opened this issue Jul 2, 2024 · 6 comments
Labels
enhancement New feature or request

Comments


hjarnek commented Jul 2, 2024

Description of feature

The current stop-codon detection in ampliseq could be improved and supplemented with frame-shift detection by implementing MACSE. There are existing Nextflow pipelines here. This is a step that is increasingly recommended for protein-coding marker genes.

hjarnek added the enhancement (New feature or request) label on Jul 2, 2024
Collaborator

d4straub commented Jul 3, 2024

Hi, thanks for the reference to MACSE. However, I do not understand how that tool could fit into the pipeline. We do not produce alignments. Could you elaborate?

Author

hjarnek commented Jul 3, 2024

@d4straub
MACSE is an aligner with many applications, one of which is to detect pseudogenes among protein-coding marker genes.

Excerpted from the paper:

The enrichAlignment subprogram can be used to sequentially add new DNA sequences to an existing alignment. Its input parameters allow defining criteria that the additional sequences should fulfil to be actually incorporated into the final alignment. For instance, sequences can be automatically discarded when, once aligned, they would contain a stop codon, too many gaps, or more than a given number of frameshifts. [...] This [...] is especially useful for metabarcoding projects based on markers such as the mitochondrial Cytochrome Oxidase subunit I (cox1) gene. This typically involves enriching a reference alignment containing sequences from databases such as BOLD or MIDORI with thousands of newly generated sequences.

Creating those reference alignments they talk about is already done (for COI, rbcL & matK). They are available for different genetic codes and taxonomic groups from here: https://www.agap-ge2pop.org/barcoding-alignments/

This approach would improve on ampliseq's stop codon filtering by automatically detecting the correct ORF, taking more genetic translation tables into account, and detecting putative nuclear mitochondrial pseudogenes (nuMTs) through frameshift and gap analysis. It would be quite a desirable upgrade for COI analyses.
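To make the difference between a fixed-frame stop-codon filter and a table-aware one concrete, here is a minimal Python sketch. It is not MACSE and does not do any alignment, so it cannot detect frameshifts or gaps. It only shows how scanning all three forward frames under different NCBI translation tables changes which ASVs look like they have an open reading frame. The function names `stop_free_frames` and `filter_asvs` and the example sequences are hypothetical, chosen for illustration.

```python
# Stop codons for a few NCBI translation tables (not an exhaustive list).
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial
    5: {"TAA", "TAG"},                # invertebrate mitochondrial
}


def stop_free_frames(seq, table=5):
    """Return the forward reading frames (0, 1, 2) that contain no stop codon."""
    stops = STOP_CODONS[table]
    seq = seq.upper()
    frames = []
    for frame in range(3):
        # Walk complete codons only; a trailing partial codon is ignored.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in stops for codon in codons):
            frames.append(frame)
    return frames


def filter_asvs(asvs, table=5):
    """Split ASVs into those with at least one stop-free frame and the rest."""
    kept, discarded = {}, {}
    for name, seq in asvs.items():
        target = kept if stop_free_frames(seq, table) else discarded
        target[name] = seq
    return kept, discarded


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than real COI amplicons.
    asvs = {"asv_1": "ATGTGAGCC", "asv_2": "TAAATAAATAAA"}
    kept, discarded = filter_asvs(asvs, table=5)
    print("kept:", sorted(kept))
    print("discarded:", sorted(discarded))
```

In this toy example, frame 0 of `ATGTGAGCC` contains `TGA`, which is a stop codon under the standard code but codes for tryptophan under the invertebrate mitochondrial code, so the choice of table decides whether that frame counts as open. An alignment-based tool such as MACSE would go further by placing each sequence against a reference alignment, which is what enables the frameshift and gap checks described above.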

Collaborator

d4straub commented Jul 3, 2024

Thanks for the reply! Here are 3 more questions:

  • Does that mean MACSE could be used to filter ASV sequences before taxonomic classification for sequences that are likely COI or rbcL? Similar to Barrnap that is used to predict whether ASVs are ribosomal RNA sequences.
  • Would that replace or complement the existing filter via --filter_codons, --orf_start, --orf_end, --stop_codons?
  • Would you be willing to add filtering with MACSE to the pipeline? If yes, and in case you need support, I can give you some starting ideas and documentation.

Author

hjarnek commented Jul 3, 2024

@d4straub

  • Yes, exactly
  • Most likely replace
  • Unfortunately I don't have any experience with Nextflow, and in all honesty not much time either. I use a custom pipeline atm, but would be pleased to transition to ampliseq given that it satisfies all my needs, so this was more of a way of highlighting this need. I would be happy to give feedback, but someone else is probably better suited to implement it.

Collaborator

d4straub commented Jul 4, 2024

Thanks!
Because I am also limited in time and currently have no projects involving protein-coding marker genes, I cannot justify investing time in adding MACSE right now. As I said above, I could give advice and reviews if anyone wants to give it a shot.

Author

hjarnek commented Jul 5, 2024

Alright, I see. Let's keep it hanging for now then.
