Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determinism when resolving atom alternative locations #568

Open
nscorley opened this issue May 22, 2024 · 3 comments
Open

Determinism when resolving atom alternative locations #568

nscorley opened this issue May 22, 2024 · 3 comments
Assignees
Labels

Comments

@nscorley
Copy link

nscorley commented May 22, 2024

Problem description:

In many mmCIF files, atoms exist equally in two locations, each with an occupancy of 0.5. For example, this screenshot from PDB ID 1adl:
image

Currently, when calling get_structure() with altloc=occupancy, there's some instability present in which altloc is chosen for a given residue in such situations. Sometimes the A conformation is chosen; sometimes the B conformation. From what I can tell, the lack of determinism may arise from the use of a set to store altlocs within the filter_highest_occupancy_altloc function here.

Although either choice is theoretically correct, such instability makes results much more challenging to debug.

Proposed solution:

Change the implementation of filter_highest_occupancy_altloc to be deterministic when two altlocs have equal occupancy; e.g., use a sorted list() rather than a set. Given the length of typical altloc sets, there will be minimal to no performance impact.

@padix-key padix-key added the bug label May 23, 2024
@padix-key
Copy link
Member

Hi, thanks for pinpointing the problem. I think your analysis is correct. If I remember correctly, the set ensures that each altloc ID only appears once, but there is no reason to not enveloping the set with sorted().

Would you like to create a fix?

@nscorley
Copy link
Author

Hi,

Yes, we definitely need the set() since we're storing one altloc ID per atom, and thus have duplicates within a residue. Enveloping with sorted() should be a quick fix. I'll make a PR shortly! Thanks for your quick response!

@padix-key
Copy link
Member

Do you still plan to create a PR? Otherwise I would prepare the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants