[Feature Request] RMSD for comparing small molecules #43

Open
ConorFWild opened this issue Jul 22, 2020 · 11 comments
Comments

@ConorFWild

Having a function to calculate the RMSD between small molecules whose atoms are not necessarily listed in the same order seems like it would be potentially very useful, although it would need some graph matching and symmetry shenanigans!
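One common way to write down the quantity being requested, as a sketch only: treat the molecule as a graph $G$ with $N$ atoms, let $\mathrm{Aut}(G)$ be its set of element- and bond-preserving relabellings, and take the smallest RMSD over those relabellings. The symbols $\mathbf{a}_i$, $\mathbf{b}_i$ (atom positions in the two copies) and $\pi$ are notation introduced here, not taken from gemmi.

```latex
\mathrm{RMSD}_{\mathrm{sym}}(A, B) =
  \min_{\pi \in \mathrm{Aut}(G)}
  \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{a}_i - \mathbf{b}_{\pi(i)} \right\rVert^{2}}
```

If the two copies are not already in the same frame, the minimum would presumably also be taken over a rigid-body superposition of $B$ onto $A$.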

@wojdyr
Member

wojdyr commented Oct 30, 2020

I added calculation of RMSD and the superposition for residue spans:
https://gemmi.readthedocs.io/en/latest/analysis.html#superposition
It does a sequence alignment to find matching residues, and then assumes that atoms with the same names correspond to each other (alternatively, only Cα atoms are used for the superposition).
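The name-based pairing described above can be illustrated with a minimal standard-library sketch. This is not gemmi's implementation or API: `rmsd_by_name` is a hypothetical helper, the coordinates are made up, and no superposition is performed (the two sets are assumed to share a frame already).

```python
import math

def rmsd_by_name(ref, other):
    """RMSD over atoms present in both dicts, paired by atom name.

    Hypothetical helper for illustration; no superposition is done.
    """
    common = sorted(set(ref) & set(other))
    if not common:
        raise ValueError("no atom names in common")
    total = sum(math.dist(ref[n], other[n]) ** 2 for n in common)
    return math.sqrt(total / len(common))

# Made-up coordinates in angstroms; the second listing uses a different
# atom order and is shifted by 0.3 along z.
a = {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0)}
b = {"C": (2.0, 1.4, 0.3), "N": (0.0, 0.0, 0.3), "CA": (1.5, 0.0, 0.3)}

print(rmsd_by_name(a, b))
```

Pairing by name makes the atom order irrelevant, but it cannot account for chemically equivalent atoms that carry different names.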

When you wrote about small molecules - did you mean small molecules represented by SmallStructure or Residue?

@ConorFWild
Author

I was thinking SmallStructure ^^

@wojdyr
Member

wojdyr commented Nov 3, 2020

What pair of molecules could be used as an example?

@wojdyr
Member

wojdyr commented Jan 23, 2021

I was about to add a function that superposes two Residues, but after thinking about it, it would indeed be good to account for swapping of equivalent atoms, such as CG1 and CG2 in valine. That would require iterating over graph automorphisms.
So I started searching for a C++ library that could do it and that could be integrated with gemmi. Apparently, there are two separate classes of algorithms we could use:

  • for graph isomorphism: codes such as nauty and traces, saucy, bliss, and conauto. The first two are distributed together under the Apache license and are the only ones in this list that would be license-compatible with gemmi, but integrating them with gemmi would take significant effort. Perhaps we could work with pynauty? I haven't investigated it. I suppose there are also codes I haven't come across. I also see a GraphSymmetryFinder class (based on the 2008 paper from the saucy developers) in a bigger package.

  • for subgraph isomorphism: codes such as vf2/vf3 and RI. The latter is under the MIT license and is a small, header-only C++ library, so it could be a good fit. But using a subgraph isomorphism algorithm to find automorphisms is probably far from optimal, although I haven't found any benchmarks. The RI paper contains benchmarks of small-molecule matching, but only for subgraph matching.

I'm writing it all down so as not to forget it. Perhaps I'll get back to it at some point.
One question is how efficient a subgraph isomorphism algorithm is for finding automorphisms. Another is how hard it would be to integrate one with gemmi, and whether it's worth the effort.
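To make the idea of iterating over graph automorphisms concrete, here is a minimal brute-force sketch in standard-library Python. It is a plain backtracking search, not nauty, VF2 or RI, and it would scale poorly beyond small molecules. The function names, the valine heavy-atom graph as written here, and the coordinates are all illustrative assumptions; no superposition is performed.

```python
import math

def automorphisms(elements, bonds):
    """Yield each element- and bond-preserving relabelling as a dict atom -> atom."""
    atoms = list(elements)
    adj = {a: set() for a in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    def extend(mapping):
        if len(mapping) == len(atoms):
            yield dict(mapping)
            return
        a = atoms[len(mapping)]
        used = set(mapping.values())
        for cand in atoms:
            if cand in used or elements[cand] != elements[a]:
                continue
            if len(adj[cand]) != len(adj[a]):
                continue
            # Bonds (and non-bonds) to already-mapped atoms must be preserved.
            if all((p in adj[a]) == (mapping[p] in adj[cand]) for p in mapping):
                mapping[a] = cand
                yield from extend(mapping)
                del mapping[a]

    yield from extend({})

def symmetric_rmsd(elements, bonds, ref, other):
    """Smallest RMSD over all automorphisms (no superposition is done)."""
    best = math.inf
    for perm in automorphisms(elements, bonds):
        sq = sum(math.dist(ref[a], other[perm[a]]) ** 2 for a in elements)
        best = min(best, math.sqrt(sq / len(elements)))
    return best

# Heavy atoms of valine (assumed graph, hydrogens omitted).
VAL_ELEMENTS = {"N": "N", "CA": "C", "C": "C", "O": "O",
                "CB": "C", "CG1": "C", "CG2": "C"}
VAL_BONDS = [("N", "CA"), ("CA", "C"), ("C", "O"),
             ("CA", "CB"), ("CB", "CG1"), ("CB", "CG2")]

# Made-up coordinates; `swapped` is the same geometry with CG1/CG2 labels exchanged.
ref = {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
       "O": (3.2, 1.6, 0.0), "CB": (2.0, -0.8, 1.2),
       "CG1": (3.5, -0.9, 1.3), "CG2": (1.4, -2.2, 1.3)}
swapped = dict(ref)
swapped["CG1"], swapped["CG2"] = ref["CG2"], ref["CG1"]

perms = list(automorphisms(VAL_ELEMENTS, VAL_BONDS))
print(len(perms), symmetric_rmsd(VAL_ELEMENTS, VAL_BONDS, ref, swapped))
```

For this graph the search finds only the identity and the CG1/CG2 exchange, so the symmetry-aware RMSD between `ref` and `swapped` vanishes while a purely name-based comparison would not.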

@CV-GPhL
Contributor

CV-GPhL commented Jan 25, 2021 via email

@wojdyr
Member

wojdyr commented Jan 25, 2021

> > but after thinking about it, it would indeed be good to account for swapping of equivalent atoms, such as CG1 and CG2 in valine.
>
> I would highly recommend not to allow swapping of these atoms - at least not by default: the resulting structure is different, i.e. the atom-name based conformational restraints will get broken. What you can do is allow for a 180-degree rotation for (a) the truly symmetrical side-chains (PHE, TYR),

I don't get it. How is the rotation different from swapping atoms in such a case?
Are there any restraints that differentiate between CG1 and CG2 in valine?
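For the planar, symmetric ring case mentioned in the quote (PHE, TYR), the question can be checked numerically. The sketch below uses an idealized regular hexagon with an assumed 1.39 Å radius, not real coordinates. It compares a 180-degree rotation about the CG–CZ axis with a CD1/CD2 and CE1/CE2 relabelling. `rotate_180` is a hypothetical helper, and nothing here addresses valine or restraints.

```python
import math

def rotate_180(point, origin, axis):
    """Rotate `point` by 180 degrees about the line through `origin` along `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    u = [c / norm for c in axis]
    d = [p - o for p, o in zip(point, origin)]
    along = sum(di * ui for di, ui in zip(d, u))
    # A half-turn keeps the component along the axis and negates the rest.
    return tuple(o + 2 * along * ui - di for o, ui, di in zip(origin, u, d))

R = 1.39  # assumed idealized ring radius in angstroms
names = ["CG", "CD1", "CE1", "CZ", "CE2", "CD2"]
ring = {n: (R * math.cos(math.radians(60 * i)),
            R * math.sin(math.radians(60 * i)),
            0.0)
        for i, n in enumerate(names)}

# Flip the ring about the axis running from CG to CZ.
axis = tuple(z - g for z, g in zip(ring["CZ"], ring["CG"]))
flipped = {n: rotate_180(xyz, ring["CG"], axis) for n, xyz in ring.items()}

# Relabelling that exchanges the two sides of the ring.
swap = {"CG": "CG", "CZ": "CZ",
        "CD1": "CD2", "CD2": "CD1", "CE1": "CE2", "CE2": "CE1"}

# Largest distance between a rotated atom and the atom it is relabelled to.
max_dev = max(math.dist(flipped[n], ring[swap[n]]) for n in names)
print(max_dev)
```

In this idealized case the rotated coordinates coincide with the relabelled ones to within rounding error, so for RMSD purposes the two operations give the same result.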

> Have you had a look at BALL yet? Maybe it doesn't do exactly what you are looking for, but I really like it and have used it in multiple projects so far. Yes, documentation is a bit scattered, but it has some examples e.g. here: https://github.com/BALL-Project/ball/wiki/RMSD

I don't think I can connect the two libraries, but I could borrow some ideas, such as having a class similar to AtomBijection in BALL.

@CV-GPhL
Contributor

CV-GPhL commented Jan 25, 2021 via email

@wojdyr
Member

wojdyr commented Jan 25, 2021

Do you know an algorithm that can find all rotations for any small molecule?

@CV-GPhL
Contributor

CV-GPhL commented Jan 25, 2021 via email

@wojdyr
Member

wojdyr commented Jan 25, 2021

> do you mean all internal (torsion) rotations? Wouldn't you need a dictionary for that?

I meant: how would one use rotations instead of atom swapping in the general case?

@chmnk

chmnk commented Oct 25, 2021

> * for **subgraph isomorphism**: codes such as vf2/[vf3](https://github.com/MiviaLab/vf3lib) and [RI](https://github.com/InfOmics/RI)

Hi Conor and Marcin,
since the issue is still open, I would like to add that symmetry-adapted RMSD computation for small molecules is nicely done in RDKit (CalcRMS, and GetBestRMS with alignment), based on the VF2 algorithm.
Some benchmarking was also done a while ago ("Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2"); there may be more recent papers on it.
