Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing bonds if use_author_fields=False when two same CCD together in wwPDB .cif file #553

Open
0ut0fcontrol opened this issue Apr 15, 2024 · 3 comments
Labels

Comments

@0ut0fcontrol
Copy link
Contributor

5hu8 use_author_fields=True use_author_fields=False
/tmp/5hu8.cif /tmp/chain_j_author.mol /tmp/chain_j_label.mol
# biotite=0.40.0
import biotite.structure.io.pdbx as pdbx
from biotite.database import rcsb
import biotite.structure as struc
import biotite.structure.io as strucio

try:
    cif_file = pdbx.CIFFile.read(f"/tmp/5hu8.cif")
except:
    rcsb.fetch("5hu8", "cif", target_path="/tmp")
    cif_file = pdbx.CIFFile.read("/tmp/5hu8.cif")

# label_asym_id = author_asym_id = J
# NAG-NAG-FUC
structue_author = pdbx.get_structure(cif_file, model=1, use_author_fields=True, include_bonds=True)
chain_J_author = structue_author[structue_author.chain_id == "J"]
residue_num = struc.get_residue_count(chain_J_author)
print("author:", residue_num)
# output author: 3
strucio.save_structure("/tmp/chain_j_author.mol", chain_J_author)


structue = pdbx.get_structure(cif_file, model=1, use_author_fields=False, include_bonds=True)
chain_J = structue[structue.chain_id == "J"]
residue_num = struc.get_residue_count(chain_J)
print("mmcif label:", residue_num)
# mmcif label: 2
strucio.save_structure("/tmp/chain_j_label.mol", chain_J)
image

How to get bonds right when use_author_fields=False?

@0ut0fcontrol
Copy link
Contributor Author

0ut0fcontrol commented Apr 15, 2024

I want to use label_seq_id and bond order from mmcif.

replacing res_id with auth_seq_id in get_structure(), make bonds right when set use_author_fields=False.
image
It is ugly 😂.

Better way to handle this?

@padix-key
Copy link
Member

padix-key commented Apr 17, 2024

Hi, thanks for the report. I dived a bit into this problem and my conclusion is that if use_author_fields=False both NAGs are considered as the same residue: The start of a new residue is automatically detected when one of chain_id, res_id, ins_code or res_name changes. As you see in the CIF nothing of these points are true in this edge case. Hence, both NAGS are considered the same residue. As a consequence, only the first occurrence of each atom is connected to its corresponding bond partner.

I would consider this a bug in get_residue_starts(). However, this one is rather cursed to fix: We could add an additional condition, which checks if the same atom name appears again in a residue. However, this check would be rather computationally expensive, so I assume it would slow down a number of functionalities in Biotite, such as adding bonds to structures.

For now a solution for your concrete problem would be to use author_fields=True, and add the label_* fields of interest to extra_fields. If you want you can later overwrite the annotation arrays with the label_* ones, e.g.

atoms.chain_id = atoms.label_asym_id

@padix-key padix-key added the bug label Apr 17, 2024
@0ut0fcontrol
Copy link
Contributor Author

Your solution appears to be cleaner (with no biotite source code alterations). Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants