Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PandasPdb.gyradius #118

Merged
merged 4 commits into from
Apr 1, 2023
Merged

Add PandasPdb.gyradius #118

merged 4 commits into from
Apr 1, 2023

Conversation

goniochromatic
Copy link

Code of Conduct

Description

Added a new method gyradius in PandasPdb that computes the radius of gyration of a molecule, along with some tests for it

Related issues or pull requests

Fixes #9

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./biopandas/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under biopandas/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./biopandas -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./biopandas

Copy link
Contributor

@a-r-j a-r-j left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Looks really great. Only a couple minor points from me :)

@@ -864,3 +864,29 @@ def to_pdb_stream(self, records: tuple[str] = ("ATOM", "HETATM")) -> StringIO:
output.write("\n")
output.seek(0)
return output

@staticmethod
def gyradius(df: pd.DataFrame) -> float:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this default to self.df["ATOM"]?

ppdb.gyradius(ppdb.df["ATOM"] seems a little clunky

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g.

if df is None: df = self.df["ATOM"]

Copy link
Author

@goniochromatic goniochromatic Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought that it might be better, I just imitated the usage of rmsd and made it a staticmethod as suggested in issue #9, but yes I can fix that

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just remove the df parameter altogether and set df = self.df['ATOM']? Because with anything else other than 'ATOM' it won't work. 'ANISOU' and 'OTHERS' don't have coordinates, and 'HETATM' has atoms that are most likely not present in atomic_masses, and I don't know if anyone would even want to use the function for 'HETATM'. It could be made as a parameter that defaults to 'ATOM', but again, anything else doesn't really make sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the way I would do it is to implement a standalone function and then have a method in the class that wraps that and calls it on self.df["ATOM"]. Having said that, I can imagine an edge case scenario where someone wants to compute gyradius on HETATMs (they're not always strictly non-protein atoms in pactice!)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I eventually added a records parameter so now you can specify if you want the rg for the "ATOM" record or "HETATM" or both. Not sure if atoms in HETATM would be relevant, but if they are for someone, more atom masses can be added to atomic_masses. There's also a decimals parameter that controls the rounding which is a a bit useless since anyone could probably use round(), but the final value had 14 decimals, so I thought that 4 is a good default for rounding and gives the option to change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, final suggestion then is to move the atomic masses into constants.py (see #119 ). That way it's easy to accomodate hetatms like you suggest.

I.e.

import biopandas

biopandas.ATOMIC_MASSES = {"Si": 100, "Mg: 200"}

# Compute gyraidus...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I add them to constats.py considering that you've added the file in your recent pull request? Do I have to clone your main fork and then make a PR, and once merged the changes will be included in your current PR to biopandas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just make the file on this branch I think it should merge automatically. Otherwise I'll resolve the conflict on my branch

center_of_mass = (masses[:, None] * coords).sum(axis=0) / total_mass
distances = np.linalg.norm(coords - center_of_mass, axis=1)
rg = np.sqrt((distances ** 2 * masses).sum() / total_mass)
return round(rg, 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we make a presumption about the accuracy people may want this at.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I could add an extra argument that defaults at let's say 3 or 4? Again, looked at rmsd which rounds to 4, I guess I could add the extra argument for it too.

@@ -4,6 +4,17 @@ The CHANGELOG for the current development version is available at
[https://github.com/rasbt/biopandas/blob/main/docs/sources/CHANGELOG.md](https://github.com/rasbt/biopandas/blob/main/docs/sources/CHANGELOG.md).


### 0.5.2 (27-03-2023)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.4.1 is still the latest release so these recent changes should actually go under 0.5.0dev

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah mb I'll change that

@a-r-j a-r-j merged commit 927b6ea into BioPandas:main Apr 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a function for computing the radius of gyration
2 participants