choose alt when possible, even if ref more frequent #3298
Merged
Changelog Entry
To be copied to the draft changelog by merger:
vg deconstruct no longer prints sites without alts
Description
Every now and then vg deconstruct can print a site with only reference alleles (and AC=0), which doesn't make much sense. I think @ekg noticed this on his pggb output too.
I checked, and what's going on is related to the heuristic used to choose a single allele for a haplotype that (due to loops) runs through multiple alleles in the snarl: it chooses the most frequent allele and, in the case of a tie, takes the alt. So if a haplotype goes through an alt once and the ref twice, it will choose ref, and it's possible that no sample takes the alt, leading to one of these sites.
This PR changes it to always choose the alt in the event of a conflict, which I think is probably more intuitive as the alt is present in the haplotype.
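To make the difference concrete, here is a minimal Python sketch of the two selection rules as described above. It is not the vg implementation: the function names, the use of allele index 0 for ref, and the rule for picking among several alts (most frequent, then lowest index) are assumptions made for illustration only.

```python
from collections import Counter

REF = 0  # assumption: allele index 0 is the reference, as in VCF genotypes


def choose_allele_old(traversed):
    """Old rule: take the most frequent allele; on a tie, prefer an alt."""
    counts = Counter(traversed)
    best = max(counts.values())
    tied = [allele for allele, n in counts.items() if n == best]
    alts = [allele for allele in tied if allele != REF]
    # Lowest alt index is a hypothetical tie-break, chosen for determinism.
    return min(alts) if alts else REF


def choose_allele_new(traversed):
    """New rule: if the haplotype touches any alt, report an alt."""
    alt_counts = Counter(a for a in traversed if a != REF)
    if not alt_counts:
        return REF
    best = max(alt_counts.values())
    # Among alts: most frequent, then lowest index (both hypothetical).
    return min(a for a, n in alt_counts.items() if n == best)


# A looping haplotype that crosses ref, then alt 1, then ref again.
looping_haplotype = [0, 1, 0]
print(choose_allele_old(looping_haplotype), choose_allele_new(looping_haplotype))
```

Under the old rule the looping haplotype is reported as ref, so a site where every sample looks like this ends up with AC=0; under the new rule the same haplotype is reported as the alt it actually contains.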