Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R/S chirality #15

Open
pckroon opened this issue Nov 23, 2020 · 2 comments
Open

R/S chirality #15

pckroon opened this issue Nov 23, 2020 · 2 comments

Comments

@pckroon
Copy link
Owner

pckroon commented Nov 23, 2020

RIght, so in openSMILES chirality is very elaborate [1]. I suggest supporting only tetrahedral chiralities, i.e. @ and @@ specifications, and later on E/Z chirality. This issue will be just about the tetrahedral R/S chirality.

The default case is fairly easy to cover since pysmiles guarantees that the node indices in the final molecule are in the same order as the atom entries in the input SMILES. openSMILES says to look along the axis from the "first" atom to the chiral center, and see whether the other atoms go clockwise (@@) or counter clockwise (@).
The complication arises from rings: the order of the bonds around the central atom are what matters according to openSMILES. In this case the node indices in the resulting graph do not necessarily correspond to the chiral order. Consider the following cases, where x is the chiral center with 4 different substituents (a, b, c, d), and .. means "any number of different atoms in between".

a[x@](b)(c)d  # Default case 0
b1..a[x@]1(c)d  # case 1
a[x@]1(c)(d)..b1  # case 2
a[x@]21(d)..c1..b2  # case 3
[x@]12cd..a1..b2  # case 4

Finally, implicit hydrogens are considered the "first" atom, but we can shove this under the carpet for now.

This results in the following

  • chiral atoms which are not involved with ring bonds do not need special treatment, since the node indices in the resulting graph are enough (case 0).
  • chiral atoms which are involved with ring bonds may need special treatment (cases 2, 3 and 4), but we can only do this once all atoms bonded to the chiral atom have been parsed (and have been given a node index).
  • The order of bonds around an atom will always be: implicit hydrogens, preceding atoms, ring bonds, other atoms.

I suggest lumping cases 1-4 together for sake of simplicity. This means when parsing we'll need to keep track of all atoms which are involved with ring bonds so we can post-process them. In addition, we'll need to track in which order (chiral) atoms have ring bonds, and once these edges are added to the graph, which node indices they correspond to (because ring indices can (and should) be reused).

So, concretely:

  • When "opening" a ring bond (first occurrence of a ring index): find the parent atom, store the ring indices in a list (this is effectively the inverse of ring_nums [2]), as well as the number of edges it already has (needed to distinguish cases 3 and 4).
  • When "closing" a ring bond, translate the previously stored ring index to an atom index.
  • After parsing the entire molecule, for every atom involved in ring bonds, see if it's chiral, and if so, construct the order in which nodes were bonded to it. Once done, set the atom's stereo attribute correctly (see below).

An open question is still how to represent the chirality at the graph level. I suggest sticking to the opensmiles @ @@ idea, which the exception that the order of the node inidices of the neighbours is determining (i.e. the neighbour with the lowest node index is the first, the one with highest last). We can then provide helper functions to translate that to R/S (and the other way). I think this is the most natural way of exposing it:

  • If you want to invert the chirality, switch @ for @@ or vice-versa.
  • If you want to change a CH3 group for a halogen, this could change your chirality even when you don't want to change the spatial orientation. Following our definition, this substitution is as simple as changing the 'element' attribute. If we store R/S you'd also have to recalculate the desired chirality.
  • This requires a little bit of attention when adding/removing explicit hydrogens, but we can cover this in the corresponding helper functions.

Open question is also how strict we want to be here: is it an error to provide a chiral specifier on a non-tetrahedral atom? What about symmetric substituents? In other words, what if I provide a chiral specifier for a non-chiral atom?

Lastly, careful thought is required for the SMILES writer, but I think we can push that back a little bit.

@Eljee thoughts?

[1] http://opensmiles.org/opensmiles.html#chirality
[2] https://github.com/pckroon/pysmiles/blob/master/pysmiles/read_smiles.py#L128

@craldaz
Copy link

craldaz commented Sep 1, 2021

I am wondering if you have made any progress on this. The ability to write_smiles with chirality would be very useful.

Represing chirality is straightforward at the graph level as node and edge attributes. For example, as is done in openforcefield.

$ from openforcefield.topology import Molecule
$ mol = Molecule.from_smiles('C/C=C/[C@@H](C)F')
$ nxgraph = mol.to_networkx()
$ nx.get_node_attributes(nxgraph,'stereochemistry')
{0: None, 1: None, 2: None, 3: 'R', 4: None, 5: None, 6: None, 7: None, 8: None, 9: None, 10: None, 11: None, 12: None, 13: None, 14: None}
$ [nxgraph.get_edge_data(*edge) for edge in list(nxgraph.edges())]
[{'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 2, 'is_aromatic': False, 'stereochemistry': 'E'}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}, {'bond_order': 1, 'is_aromatic': False, 'stereochemistry': None}]

I have no idea how write_smiles works tho and how it could be modified to add chiral atoms.

Interestingly in openforcefield documentation they also desire a from_graph function.

https://open-forcefield-toolkit.readthedocs.io/en/topology/api/generated/openforcefield.topology.Molecule.html#openforcefield.topology.Molecule.to_networkx

@pckroon
Copy link
Owner Author

pckroon commented Sep 14, 2021

I haven't had time to look into this further unfortunately :(

Adding an attribute to the atoms/bonds is easy, but that's only part of the story. We also need to define the meaning/semantics of this attribute, and make sure we can read and write this correctly in all possible edge cases.

The writeup above tried to identify the pros/cons of using @/@@ vs R/S, as well as edge cases. I do need more input on whether these are complete, and on whether @/@@ or R/S would be more convenient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants