-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PandasPdb.gyradius #118
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Looks really great. Only a couple minor points from me :)
biopandas/pdb/pandas_pdb.py
Outdated
@@ -864,3 +864,29 @@ def to_pdb_stream(self, records: tuple[str] = ("ATOM", "HETATM")) -> StringIO: | |||
output.write("\n") | |||
output.seek(0) | |||
return output | |||
|
|||
@staticmethod | |||
def gyradius(df: pd.DataFrame) -> float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this default to self.df["ATOM"]
?
ppdb.gyradius(ppdb.df["ATOM"]
seems a little clunky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g.
if df is None: df = self.df["ATOM"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought that it might be better, I just imitated the usage of rmsd
and made it a staticmethod as suggested in issue #9, but yes I can fix that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I just remove the df
parameter altogether and set df = self.df['ATOM']
? Because with anything else other than 'ATOM' it won't work. 'ANISOU' and 'OTHERS' don't have coordinates, and 'HETATM' has atoms that are most likely not present in atomic_masses
, and I don't know if anyone would even want to use the function for 'HETATM'. It could be made as a parameter that defaults to 'ATOM', but again, anything else doesn't really make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the way I would do it is to implement a standalone function and then have a method in the class that wraps that and calls it on self.df["ATOM"]. Having said that, I can imagine an edge case scenario where someone wants to compute gyradius on HETATMs (they're not always strictly non-protein atoms in pactice!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I eventually added a records
parameter so now you can specify if you want the rg for the "ATOM" record or "HETATM" or both. Not sure if atoms in HETATM would be relevant, but if they are for someone, more atom masses can be added to atomic_masses
. There's also a decimals
parameter that controls the rounding which is a a bit useless since anyone could probably use round()
, but the final value had 14 decimals, so I thought that 4 is a good default for rounding and gives the option to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, final suggestion then is to move the atomic masses into constants.py
(see #119 ). That way it's easy to accomodate hetatms like you suggest.
I.e.
import biopandas
biopandas.ATOMIC_MASSES = {"Si": 100, "Mg: 200"}
# Compute gyraidus...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I add them to constats.py
considering that you've added the file in your recent pull request? Do I have to clone your main fork and then make a PR, and once merged the changes will be included in your current PR to biopandas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you just make the file on this branch I think it should merge automatically. Otherwise I'll resolve the conflict on my branch
biopandas/pdb/pandas_pdb.py
Outdated
center_of_mass = (masses[:, None] * coords).sum(axis=0) / total_mass | ||
distances = np.linalg.norm(coords - center_of_mass, axis=1) | ||
rg = np.sqrt((distances ** 2 * masses).sum() / total_mass) | ||
return round(rg, 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we make a presumption about the accuracy people may want this at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I could add an extra argument that defaults at let's say 3 or 4? Again, looked at rmsd
which rounds to 4, I guess I could add the extra argument for it too.
docs/CHANGELOG.md
Outdated
@@ -4,6 +4,17 @@ The CHANGELOG for the current development version is available at | |||
[https://github.com/rasbt/biopandas/blob/main/docs/sources/CHANGELOG.md](https://github.com/rasbt/biopandas/blob/main/docs/sources/CHANGELOG.md). | |||
|
|||
|
|||
### 0.5.2 (27-03-2023) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.4.1 is still the latest release so these recent changes should actually go under 0.5.0dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yeah mb I'll change that
Code of Conduct
Description
Added a new method
gyradius
inPandasPdb
that computes the radius of gyration of a molecule, along with some tests for itRelated issues or pull requests
Fixes #9
Pull Request Checklist
./docs/sources/CHANGELOG.md
file (if applicable)./biopandas/*/tests
directories (if applicable)biopandas/docs/sources/
(if applicable)PYTHONPATH='.' pytest ./biopandas -sv
and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv
)flake8 ./biopandas