choose alt when possible, even if ref more frequent #3298

Merged: 2 commits, May 24, 2021

glennhickey (Contributor)

Changelog Entry

To be copied to the draft changelog by merger:

  • vg deconstruct no longer prints sites without alts

Description

Every now and then, vg deconstruct prints a site where only reference alleles are called (AC=0), which doesn't make much sense. I think @ekg noticed this in his pggb output too.

I checked, and the cause is the heuristic used to choose a single allele for a haplotype that (due to loops) runs through multiple alleles in the snarl. It chooses the most frequent allele and, in the case of a tie, takes the alt. So if a haplotype goes through an alt once and the ref twice, the heuristic chooses the ref. If no sample ends up being assigned the alt, the site is printed with only reference alleles.

This PR changes the heuristic to always choose the alt when a haplotype runs through both the ref and an alt. I think this is more intuitive, since the alt is present in the haplotype.
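To make the before/after behaviour concrete, here is a minimal Python sketch of the two rules. It is not vg's actual C++ implementation. It assumes that one haplotype's passage through the snarl is given as a list of allele indices, with 0 as the reference and 1, 2, ... as alts. The function names and the tie-break among several alts are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

REF = 0  # assumption: allele 0 is the reference, alts are 1, 2, ...


def choose_allele_old(traversed: Sequence[int]) -> int:
    """Old heuristic (hypothetical name): most frequent allele, ties go to an alt."""
    counts = Counter(traversed)
    # Rank by visit count first, then prefer any alt over the reference.
    return max(counts, key=lambda a: (counts[a], a != REF))


def choose_allele_new(traversed: Sequence[int]) -> int:
    """New rule (hypothetical name): if the haplotype visits any alt, report an alt."""
    counts = Counter(traversed)
    alts = [a for a in counts if a != REF]
    if not alts:
        return REF
    # Hypothetical tie-break among several alts: most visits, then lowest index.
    return max(alts, key=lambda a: (counts[a], -a))


def allele_count(calls: List[int]) -> int:
    """AC: number of haplotypes called with a non-reference allele."""
    return sum(1 for c in calls if c != REF)


# Two haplotypes: one loops through alt 1 once and the ref twice,
# the other only ever visits the ref.
haplotypes = [[1, 0, 0], [0, 0]]
old_calls = [choose_allele_old(h) for h in haplotypes]
new_calls = [choose_allele_new(h) for h in haplotypes]
print(old_calls, allele_count(old_calls))  # [0, 0] 0 -> ref-only site
print(new_calls, allele_count(new_calls))  # [1, 0] 1
```

Under the old rule, both haplotypes are called as ref, so the site would be emitted with AC=0. Under the new rule, the first haplotype is called as the alt because it visits the alt at least once.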

glennhickey merged commit e0b25dc into master on May 24, 2021.