write_smiles can create invalid SMILES when provided with chemically invalid graphs #17

Open
craldaz opened this issue May 27, 2021 · 2 comments

Comments

@craldaz

craldaz commented May 27, 2021

For some reason, write_smiles returns a SMILES string for my graph in which the aromatic groups use aromatic bond symbols, e.g. NC:1:N:N:C:[N]1N.

RDKit does not recognize these symbols and it removes all the aromaticity to produce NC1NNCN1N, and openbabel produces the same result.

Some have speculated that it's a SMARTS string

https://mattermodeling.stackexchange.com/questions/4981/how-to-canonicalize-smiles-written-with-aromatic-bond-symbols

others just say it's wrong.

openbabel/openbabel#2368

Do you know what is going on?

Thanks for your help!

@pckroon
Owner

pckroon commented May 28, 2021

It's complicated :)
See also section 3.5 from http://opensmiles.org/opensmiles.html, and let me highlight the following sentence:

The aromatic-bond symbol ':' can be used between aromatic atoms, but it is never necessary; a bond between two aromatic atoms is assumed to be aromatic unless it is explicitly represented as a single bond '-'. However, a single bond (nonaromatic bond) between two aromatic atoms must be explicitly represented.
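The quoted rule can be sketched as a small lookup. This is only an illustration of the OpenSMILES wording, not code from pysmiles: the function name `implied_bond_order` and the use of 1.5 as a marker for an aromatic bond are assumptions made for this example, and the stereo symbols `/` and `\` and the no-bond symbol `.` are left out.

```python
# Bond orders for the explicit bond symbols covered by this sketch.
# 1.5 is used here purely as a marker for "aromatic" (an assumption).
EXPLICIT_ORDERS = {"-": 1, "=": 2, "#": 3, "$": 4, ":": 1.5}


def implied_bond_order(symbol, first_aromatic, second_aromatic):
    """Return the bond order implied by `symbol` between two atoms.

    `symbol` is the empty string when the bond symbol was omitted.
    """
    if symbol in EXPLICIT_ORDERS:
        # An explicit symbol always wins, including '-' between
        # two aromatic atoms.
        return EXPLICIT_ORDERS[symbol]
    if symbol == "":
        # An omitted bond is aromatic only if both atoms are aromatic;
        # otherwise it is a single bond.
        return 1.5 if first_aromatic and second_aromatic else 1
    raise ValueError(f"unsupported bond symbol: {symbol!r}")
```

With this reading, `c1ccccc1` and `c:1:c:c:c:c:c1` describe the same bonds, which is why the ':' symbol is never necessary between lowercase atoms.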

How did you generate your graph? The write_smiles function does minimal chemical interpretation of your graph to avoid guessing wrong. All it does is remove explicit hydrogens (where possible).
To mark aromatic regions in your molecule, represent them with lowercase element symbols (e.g. Nc1nncn1N). pysmiles does provide a helper function for this (correct_aromatic_rings), but deciding what is or is not aromatic is a surprisingly nontrivial problem, in particular once extracyclic atoms need to be taken into account. There's a fairly detailed description of what this function does in the readme.
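To tell the two notations apart before passing a string to another toolkit, a minimal check could scan for ':' outside bracket atoms (inside brackets, ':' introduces an atom class instead of a bond). The helper name `has_explicit_aromatic_bond` is hypothetical and is not part of pysmiles; the sketch assumes the input has balanced brackets.

```python
def has_explicit_aromatic_bond(smiles):
    """Return True if `smiles` contains ':' used as a bond symbol.

    A ':' inside a bracket atom such as [CH3:1] is an atom class,
    not a bond, so it is ignored.
    """
    in_bracket = False
    for char in smiles:
        if char == "[":
            in_bracket = True
        elif char == "]":
            in_bracket = False
        elif char == ":" and not in_bracket:
            return True
    return False


# The string from this issue uses explicit aromatic bonds;
# the lowercase form does not.
print(has_explicit_aromatic_bond("NC:1:N:N:C:[N]1N"))
print(has_explicit_aromatic_bond("Nc1nncn1N"))
```

A positive result only says that explicit aromatic bond symbols are present. It does not decide whether the atoms they connect should be aromatic.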

I hope this helps, or at least provides you with a workaround...

PS. thanks for the SE link, it's an interesting discussion.

edit: PPS: I agree with the assessment that pysmiles produced an invalid SMILES in this case. I'm debating whether I'll fix this, or whether it's better to leave write_smiles as a dumb serializer. Your graph is also chemically invalid, so I'm kind of ok with the resulting SMILES also being chemically invalid. The roundtrip graph -> write_smiles -> read_smiles -> graph should always produce the exact same graph, whether it makes chemical sense or not.

@pckroon pckroon changed the title from "Aromatic bond symbols" to "write_smiles can create invalid SMILES when provided with chemically invalid graphs" Jun 8, 2021
pckroon added a commit that referenced this issue Jul 30, 2021
Reference issue #17 in readme to warn about invalid smiles
@fgrunewald
Collaborator

@pckroon this should be fixed now. In fact, there was a bug in the writer, but I sneakily fixed that with the aromatic overhaul.
