pdb = read_pdb(pdb_file="AF-Q05655-F1-model_v2.pdb", category_names=['_atom_site'])  # We use '_atom_site' here to mirror the mmCIF format, and it is the default
atoms_df = pdb['_atom_site']
# Get values for residue_name
list(atoms_df.residue_name.unique())
Yes, the extra whitespace is annoying when filtering on residue_name.
I implemented it that way purely for performance reasons: skipping the trim means we don't have to strip the extra space from every residue name while reading lines. Filtering with the PDBDataFrame class instead of the raw DataFrame is more convenient and less confusing. The API is documented here: https://moldf.readthedocs.io/en/latest/api.html#moldf.pdb_dataframe.PDBDataFrame.residue_names
To avoid the confusion, we have two options. We can sacrifice a little reading performance by adding a keyword to the read_pdb function so that the residue_name column holds the compact (trimmed) version by default, or we can return a PDBDataFrame object directly instead of the base class. The latter needs more polishing of the code before we can use it confidently.
I have encountered an issue when reading in a protein PDB file: whitespace in the residue_name values is not removed.
Source: Q05655
Using the following code:
This yields:
This whitespace should be trimmed so that filtering can take place properly.
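Until then, a possible workaround is to strip the column once after reading, using plain pandas. The sketch below is only an illustration: the small DataFrame is a hypothetical stand-in for the '_atom_site' table, and its padded values are made up for the example rather than copied from actual read_pdb output.

```python
import pandas as pd

# Hypothetical stand-in for pdb['_atom_site']; the trailing spaces are
# illustrative and not taken from real read_pdb output.
atoms_df = pd.DataFrame(
    {
        "residue_name": ["MET ", "ALA ", "MET ", "GLY "],
        "atom_name": ["N", "CA", "C", "O"],
    }
)

# Strip leading/trailing whitespace once, so later filters can use exact matches.
atoms_df["residue_name"] = atoms_df["residue_name"].str.strip()

# Filtering on the compact name now behaves as expected.
met_atoms = atoms_df[atoms_df["residue_name"] == "MET"]
residue_names = list(atoms_df["residue_name"].unique())
```

This costs one extra pass over the column after loading, instead of a trim on every line while parsing.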
Happy to submit a PR for this.