Infer read layout generically #89

Open
uniqueg opened this issue Jul 5, 2022 · 2 comments
Labels
enhancement New feature or request future Will not be worked on for now read layout Infer read architecture

Comments

@uniqueg
Member

uniqueg commented Jul 5, 2022

Is your feature request related to a problem? Please describe.

Currently, only 3' adapters are inferred, and they are identified by scanning reads for a list of known adapter sequences. Devise and implement a solution that generically infers complex read layout patterns, including 5' adapters, UMIs and other linkers, and other adapters of fixed length or fixed sequence.
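For illustration, the known-adapter scan described above could look roughly like the sketch below. It is a minimal sketch and not the project's implementation: the function names, the `min_fraction` threshold and the three adapter prefixes (widely used Illumina, Nextera and small RNA 3' adapter prefixes) are assumptions made for this example, and the project's actual list is not shown here.

```python
from collections import Counter

# Illustrative 3' adapter prefixes (assumption: not the project's actual list).
KNOWN_ADAPTERS = {
    "illumina_universal": "AGATCGGAAGAGC",
    "nextera": "CTGTCTCTTATA",
    "small_rna": "TGGAATTCTCGG",
}


def count_adapter_hits(reads, adapters=KNOWN_ADAPTERS):
    """Count, per adapter, how many reads contain its sequence."""
    hits = Counter({name: 0 for name in adapters})
    for read in reads:
        for name, seq in adapters.items():
            if seq in read:
                hits[name] += 1
    return hits


def infer_adapter(reads, min_fraction=0.1):
    """Return the most frequent adapter name, or None if too few reads match.

    min_fraction is a hypothetical cutoff on the fraction of reads
    that must contain the adapter for it to be reported.
    """
    reads = list(reads)
    if not reads:
        return None
    hits = count_adapter_hits(reads)
    name, count = hits.most_common(1)[0]
    return name if count / len(reads) >= min_fraction else None
```

A scan like this can only ever report adapters that are already in the list, which is the limitation this issue is about.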

Describe the solution you'd like

TBD

Describe alternatives you've considered

TBD

Additional context

N/A

@uniqueg uniqueg added read layout Infer read architecture meta Meta issue (spawns additional issues) labels Jul 5, 2022
@rohank63
Collaborator

rohank63 commented Nov 1, 2022

Can we add an exhaustive list for the 3' adapters?
Regarding the 5' adapters, just a question (maybe it doesn't make sense, as I don't have much background in biology):
Can't we reverse the read and then look for the 5' adapters that are available?

@uniqueg
Member Author

uniqueg commented Nov 1, 2022

Hi @rohank63, I'm afraid it's not that easy. First of all, there is no exhaustive list of adapters because, in principle, nothing stops people from using any adapter they like. In practice, though, most people use kits to prepare libraries, and those kits tend to always use the same 3' adapters. Those are most likely already in our list.

For 5' the situation is probably similar (there are kits, and those kits tend to use a finite set of adapters), but the adapters will likely be completely different. A problem here is that there aren't many kits that use 5' adapters that end up in the sequenced reads at all, so we would first need to explore kits and find libraries that have them.

However, even if we do that, there is an essentially unbounded space of read layouts made up of fixed, variable or random sequences that can occur in fixed, variable or random positions. There are UMIs and all sorts of other linkers that may be ligated to read fragments, at the start or end of reads or anywhere in between. It might be a single A at the start, a set of 3-6 random nucleotides at the end, a fixed sequence followed by a random UMI, and so on.
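To make one slice of this concrete, here is a rough sketch of how fixed-sequence positions at a read end could be flagged without any adapter list, using per-position Shannon entropy of the base composition. This is only an illustration and not the strategy referred to in this thread; the function names and the `max_entropy` cutoff are hypothetical. It also covers only the simplest case: random UMIs and transcript-derived bases both look high-entropy, so this approach alone cannot tell them apart.

```python
import math
from collections import Counter


def positional_entropy(reads, n_positions, from_end=False):
    """Shannon entropy (in bits) of the base composition at each position.

    Positions are counted from the read start, or from the read end
    if from_end is True. Reads shorter than a position are skipped.
    """
    entropies = []
    for i in range(n_positions):
        idx = -(i + 1) if from_end else i
        column = Counter(r[idx] for r in reads if len(r) > i)
        total = sum(column.values())
        if total == 0:
            entropies.append(0.0)
            continue
        h = -sum((c / total) * math.log2(c / total) for c in column.values())
        entropies.append(h)
    return entropies


def candidate_fixed_positions(entropies, max_entropy=0.2):
    """Indices whose entropy is low enough to suggest a fixed base.

    max_entropy is a hypothetical cutoff chosen for illustration.
    """
    return [i for i, h in enumerate(entropies) if h <= max_entropy]
```

For example, reads that all start with the same two bases would yield near-zero entropy at positions 0 and 1 and higher values afterwards, so `candidate_fixed_positions` would return `[0, 1]`.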

So basically, if we want to improve on the read layout side of things, we should come up with a general mechanism for identifying those portions of reads that originate from actual transcripts.

@mzavolan has devised a strategy for how this could potentially be done, and there are a number of closed issues labeled read_layout that would all need to be implemented for this functionality.

@uniqueg uniqueg added future Will not be worked on for now enhancement New feature or request and removed meta Meta issue (spawns additional issues) labels Dec 20, 2023