Add overlapping genes to standard prodigal-gv ORF detection #4

Open
bhagavadgitadu22 opened this issue Apr 3, 2024 · 1 comment

@bhagavadgitadu22

I am considering looking for overlapping genes in some viral metagenomes I obtained. I am less interested in the overlaps themselves than in genes I may have missed because of large overlaps. Your tool might be a way to complement the genes I detected with prodigal-gv or PHANOTATE, but I do not really know how to interpret the results.

What I would like is a result file listing potential overlapping genes that were not detected previously, with their coordinates and a likelihood that each one is a true gene. Do you know how I could get from OLGenie results to this list?

More generally, do you think incorporating overlapping genes into metagenomic analyses is doable? Metagenomic studies do not usually look for them.

@singing-scientist
Contributor

Greetings! At first thought, the best way to use OLGenie for this is to feed it alignments of ORFs and then pick the ORFs with very low dN/dS estimates as the likeliest candidates, i.e., the ORFs predicted to be under purifying selection (aka constraint). Unfortunately, OLGenie won't annotate ORFs for you, but if you have a list of possible ORFs from another tool (e.g., ORFs above some minimum length you're interested in), you can (1) extract an alignment of each ORF from your sequence data and (2) feed it into OLGenie with the correct frame information.
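To make step (1) concrete, here is a minimal Python sketch of cutting one candidate ORF out of an existing nucleotide alignment. It is only an illustration under stated assumptions: the function names, the dict-of-strings alignment representation, and the 1-based inclusive coordinates are all hypothetical choices, not part of OLGenie, and the frame relationship you then pass to OLGenie still has to be worked out separately from the reference gene.

```python
from typing import Dict

# Complement table; gaps ('-') and ambiguous characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an aligned nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_orf_alignment(
    alignment: Dict[str, str], start: int, end: int, strand: str = "+"
) -> Dict[str, str]:
    """Cut alignment columns start..end (1-based, inclusive; an assumed
    convention) and reverse-complement them for a minus-strand ORF."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    if start < 1 or end < start or end > lengths.pop():
        raise ValueError("coordinates fall outside the alignment")
    if (end - start + 1) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    sub = {name: seq[start - 1:end] for name, seq in alignment.items()}
    if strand == "-":
        sub = {name: reverse_complement(seq) for name, seq in sub.items()}
    return sub


if __name__ == "__main__":
    # Toy alignment of two sequences; the candidate ORF spans columns 3-11.
    toy = {"seq1": "AAATGCCCTAAGG", "seq2": "AAATGCCGTAAGG"}
    for name, seq in extract_orf_alignment(toy, 3, 11).items():
        print(f">{name}\n{seq}")
```

The length check is there because a codon-based method needs whole codons; candidates with coordinates that are not a multiple of three probably point to an off-by-one in the ORF caller's coordinate convention.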

Note that OLGenie, like dN/dS in general, is explicitly a comparative method: it relies on patterns of diversity and therefore needs an alignment of sequences with enough diversity to estimate dN and dS. That probably fits your situation, but it's worth keeping in mind.
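One rough way to check the diversity requirement before running anything is to count the variable columns in each candidate alignment. The sketch below is just that, a pre-filter under assumptions: the function name is hypothetical, it is not something OLGenie provides, and how many variable sites count as "sufficient" is left for you to decide.

```python
from typing import Iterable


def count_variable_sites(sequences: Iterable[str]) -> int:
    """Count alignment columns with more than one distinct unambiguous base.

    Gaps and ambiguous characters (e.g. '-', 'N') are ignored, so a column
    only counts as variable if at least two of A, C, G, T appear in it.
    """
    variable = 0
    for column in zip(*sequences):
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            variable += 1
    return variable


if __name__ == "__main__":
    aligned = ["ATGCCCTAA", "ATGCCGTAA", "ATGACGTAA"]
    print(count_variable_sites(aligned))  # 2 (columns 4 and 6 vary)
```

Alignments with zero or very few variable sites carry little information for dN and dS, so it may be reasonable to set them aside before computing anything.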

I haven't applied OLGenie to metagenomics myself, but in principle it seems possible; the open question is how tractable it is given the amount of data. The nice thing is that, as a counting method, it'll be a lot faster than likelihood-based methods for this sort of high-throughput work.

Let me know if any of that helps. Also consider contacting co-author Zachary Ardern who is doing high-throughput applications to bacterial genomes. (I'll try to get him on here as well!)
