Multiple alignment manipulation #51

Open
BioTurboNick opened this issue May 18, 2021 · 3 comments


@BioTurboNick

We need the capability to manipulate a multiple sequence alignment (MSA), and this seems like the right place to put it.

I started working on this, but it may need more thought about how it will play nicely with the existing pairwise-alignment-oriented code.

@kescobo
Member

kescobo commented May 19, 2021

I haven't worked much with the AlignedSequence type, but it seems like there could be an AbstractAlignedSequence with a MultiAlignedSequences <: AbstractAlignedSequence subtype. This might be another place where explicitly defining and documenting the expected API, à la BioJulia/BioSequences.jl#140, would be useful.

Then again, I wonder whether an MSA could simply be represented as a Vector or Tuple of AlignedSequence.
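A minimal sketch of the "vector of AlignedSequence" idea, written in Python rather than Julia for brevity. AlignedRow and as_msa are hypothetical names, not BioAlignments.jl API. Each row stores its sequence plus run-length operations against one shared reference (M for match/mismatch, I for insertion, D for deletion, loosely following CIGAR letters). For a plain list of rows to describe a single MSA, every row must consume the same number of reference positions:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AlignedRow:
    """Hypothetical stand-in for one AlignedSequence: a sequence plus
    run-length edit ops against a reference shared by the whole MSA."""
    seq: str
    ops: Tuple[Tuple[str, int], ...]  # ("M", n) match/mismatch, ("I", n) insertion, ("D", n) deletion

    def ref_span(self) -> int:
        # Reference positions consumed: matches and deletions.
        return sum(n for op, n in self.ops if op in ("M", "D"))

    def seq_span(self) -> int:
        # Sequence positions consumed: matches and insertions.
        return sum(n for op, n in self.ops if op in ("M", "I"))


def as_msa(rows: List[AlignedRow]) -> List[AlignedRow]:
    """Check that a plain list of aligned rows forms one MSA over a shared reference."""
    for row in rows:
        if row.seq_span() != len(row.seq):
            raise ValueError(f"ops do not cover sequence {row.seq!r}")
    spans = {row.ref_span() for row in rows}
    if len(spans) > 1:
        raise ValueError(f"rows span different reference lengths: {sorted(spans)}")
    return rows
```

One caveat of this representation: insertions are stored per row relative to the reference, so insertions at the same reference position in two different rows are not aligned to each other.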

@BioTurboNick
Author

Good ideas; it could be. I'm wondering, though, about the strong assumption in AlignedSequence that a sequence is aligned to a single, known reference. That makes a lot of sense for aligning sequencing reads to a reference genome, but not as much when you're aligning orthologs.

Maybe AlignedSequence could be extended with a single-sequence constructor that assumes an implicit reference matching the sequence at every position, with every gap treated as a deletion against it.
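A sketch of that single-sequence constructor idea, again in Python and with a hypothetical function name (implicit_reference_ops is not an existing API). Given a gapped row, it assumes an unobserved reference that matches every residue, so each gap becomes a deletion. The returned run-length ops use CIGAR-like letters (M for match, D for deletion) purely for illustration:

```python
from typing import List, Tuple

GAP = "-"


def implicit_reference_ops(gapped: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Hypothetical single-sequence constructor: treat a gapped row as aligned
    to an implicit reference that matches every residue, so each gap is a
    deletion. Returns the ungapped sequence and run-length ops."""
    ops: List[Tuple[str, int]] = []
    for ch in gapped:
        op = "D" if ch == GAP else "M"
        if ops and ops[-1][0] == op:
            # Extend the current run of the same operation.
            ops[-1] = (op, ops[-1][1] + 1)
        else:
            ops.append((op, 1))
    return gapped.replace(GAP, ""), ops
```

For example, implicit_reference_ops("AC--GT") returns ("ACGT", [("M", 2), ("D", 2), ("M", 2)]).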

@kescobo
Member

kescobo commented May 19, 2021

Well, something has to be the reference, right? It could just be a consensus sequence that's never directly observed, but short of storing every sequence in full, you need something to define the edits against.

Thinking about it some more, I wonder if you could do something like this:

  1. When you first create an MSA, you either define a reference explicitly or have one generated as a consensus sequence.
  2. If the MSA is mutable, you can add more sequences, which are stored as edits against the existing reference.
  3. You can call consensus!(msa) (or something similar) to update the reference to the best consensus and recalculate the edits against it.
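The three steps above can be sketched in Python under simplifying assumptions: every row is already gapped to the same number of columns, an edit is just a column whose character differs from the reference, and the "best consensus" is taken to be the per-column majority (ties go to the character seen first). EditMSA and update_consensus are hypothetical names; update_consensus plays the role of the proposed consensus!(msa):

```python
from collections import Counter
from typing import Dict, List, Optional


def column_consensus(rows: List[str]) -> str:
    # Majority character per column; on ties, max() returns the first key,
    # and Counter keys keep the order in which characters were first seen.
    out = []
    for column in zip(*rows):
        counts = Counter(column)
        out.append(max(counts, key=counts.__getitem__))
    return "".join(out)


class EditMSA:
    """Hypothetical MSA that stores each row only as {column: char} edits against a reference."""

    def __init__(self, rows: List[str], reference: Optional[str] = None):
        if not rows:
            raise ValueError("need at least one row")
        self.width = len(rows[0])
        if any(len(r) != self.width for r in rows):
            raise ValueError("all rows must have the same aligned length")
        # Step 1: use an explicit reference, or build one as a consensus of the rows.
        self.reference = reference if reference is not None else column_consensus(rows)
        if len(self.reference) != self.width:
            raise ValueError("reference must span every column")
        self.edits: List[Dict[int, str]] = [self._diff(r) for r in rows]

    def _diff(self, row: str) -> Dict[int, str]:
        return {i: c for i, c in enumerate(row) if c != self.reference[i]}

    def row(self, k: int) -> str:
        # Rebuild row k by applying its edits to the reference.
        chars = list(self.reference)
        for i, c in self.edits[k].items():
            chars[i] = c
        return "".join(chars)

    def add(self, row: str) -> None:
        # Step 2: a new, already-gapped row is stored as edits against the current reference.
        if len(row) != self.width:
            raise ValueError("row must already be aligned to the MSA columns")
        self.edits.append(self._diff(row))

    def update_consensus(self) -> None:
        # Step 3: replace the reference with the consensus, then re-diff every row.
        rows = [self.row(k) for k in range(len(self.edits))]
        self.reference = column_consensus(rows)
        self.edits = [self._diff(r) for r in rows]
```

For example, starting from a deliberately poor explicit reference, EditMSA(["ACGT", "ACGA", "TCGT"], reference="TTTT") stores 9 edits; after update_consensus() the reference becomes "ACGT" and only 2 edits remain. Adding a sequence that is not already gapped to the MSA columns would first require aligning it to the reference, which this sketch does not attempt.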
