Atom types

Graph theory is nice, but we are, of course, interested in chemistry. While graph theory has its limitations, we can do a lot of interesting things with just the vertex-edge formalism, particularly if we combine it with the concept of atom types.

An atom type describes certain properties of an atom. For example, force fields use atom types to describe geometrical and interaction properties of the atoms in a molecule. Within such a formalism, an sp3 carbon is a carbon with four neighbors organized in a tetrahedral coordination, as depicted in Figure methane.

![](images/methane.png)
The CDK atom type model

The various algorithms in the CDK need each atom type to give a complete description of the following atomic properties:

  • element
  • formal charge
  • number of bonded neighbors
  • hybridization (sp3, sp2, sp, etc)
  • number of lone pairs
  • number of π bonds

For example, for the carbon in methane, we can list these properties with this code:

CDKAtomTypeProperties

We will see the carbon has these properties:

CDKAtomTypeProperties

For a carbon in benzene (C.sp2), it would list:

Csp2CarbonProperties

And for the oxygen in hydroxide (O.minus), it would give:

HydroxideOxygenProperties
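
To make the six properties concrete, the three examples above can be written out as plain data. This is a minimal sketch in plain Java and does not use any CDK classes: `AtomTypeSketch` and its field names are hypothetical, and the values are filled in by hand from the chemistry of methane, benzene, and hydroxide, not taken from CDK output.

```java
import java.util.List;

// Hypothetical value class for illustration only; not a CDK class.
final class AtomTypeSketch {
    final String name;          // atom type name, e.g. "C.sp3"
    final String element;       // element symbol
    final int formalCharge;
    final int neighbors;        // number of bonded neighbors
    final String hybridization;
    final int lonePairs;
    final int piBonds;

    AtomTypeSketch(String name, String element, int formalCharge, int neighbors,
                   String hybridization, int lonePairs, int piBonds) {
        this.name = name;
        this.element = element;
        this.formalCharge = formalCharge;
        this.neighbors = neighbors;
        this.hybridization = hybridization;
        this.lonePairs = lonePairs;
        this.piBonds = piBonds;
    }

    // The three examples from the text, written out by hand (an assumption,
    // not CDK output): methane carbon, benzene carbon, hydroxide oxygen.
    static List<AtomTypeSketch> examples() {
        return List.of(
            new AtomTypeSketch("C.sp3", "C", 0, 4, "SP3", 0, 0),
            new AtomTypeSketch("C.sp2", "C", 0, 3, "SP2", 0, 1),
            new AtomTypeSketch("O.minus", "O", -1, 1, "SP3", 3, 0));
    }

    @Override
    public String toString() {
        return name + ": element=" + element
            + ", formal charge=" + formalCharge
            + ", neighbors=" + neighbors
            + ", hybridization=" + hybridization
            + ", lone pairs=" + lonePairs
            + ", pi bonds=" + piBonds;
    }

    public static void main(String[] args) {
        for (AtomTypeSketch type : examples()) {
            System.out.println(type);
        }
    }
}
```

Note how the two carbon types share element and formal charge and differ only in neighbor count, hybridization, and π bond count; this is why the element alone is not enough to identify an atom type.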

A full list of CDK atom types is given in a table in Appendix atomtypeapp.

Hybridization Types

The CDK knows about various hybridization types. Hybridizations are linear combinations of atomic orbitals and are typically used to explain the orientation of atoms attached to the central atom. For example, Figure methane showed one possible hybridization, sp3.

The supported hybridization types can be listed with:

HybridizationTypes

listing these types:

HybridizationTypes

Atom type perception

Because so many cheminformatics algorithms depend on atom type information, determining the atom types of the atoms in a molecule is typically one of the very first steps after a molecule has been created. When the CDK is not able to recognize (perceive) the atom type, the output of those algorithms is almost certainly undefined. The following two sections describe how atom types can be perceived. They also show what happens when the atom type cannot be recognized.

Single atoms

Instead of perceiving atom types for all atoms in the molecule, one may also perceive the type of a single atom. Perceiving all atoms at once is more efficient when every atom needs a type, but when only part of the molecule has changed, it can be worthwhile to perceive atom types for only the affected atoms:

AtomTypePerception

This reports the perceived atom type for the carbon:

AtomTypePerception

Full molecules

Atom type perception requires the notion of ring systems. When each atom type is perceived individually, as in the approach above, ring detection must be repeated for every atom. In theory this information could be cached, but the CDK currently has no suitable solution for that. Perceiving atom types for all atoms in a molecule is therefore done more efficiently with the following code:

AtomTypePerceptionMolecule
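
The cost argument above can be illustrated with a toy counter. The sketch below is plain Java, not CDK code: `detectRings`, `perceiveSingle`, and `perceiveAll` are hypothetical stand-ins, and the "ring detection" does no real work. It only counts how often the molecule-wide step runs in each approach.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not CDK code: counts how often an expensive molecule-wide
// step (standing in for ring detection) is executed.
final class PerceptionCostSketch {
    static int ringDetections = 0;

    // Stand-in for ring detection over the whole molecule.
    static boolean[] detectRings(int atomCount) {
        ringDetections++;
        return new boolean[atomCount]; // pretend no atom is in a ring
    }

    // Stand-in for typing one atom once ring membership is known.
    static String typeOf(int atomIndex, boolean[] inRing) {
        return inRing[atomIndex] ? "ring atom" : "chain atom";
    }

    // Per-atom perception: ring detection is repeated on every call.
    static String perceiveSingle(int atomCount, int atomIndex) {
        return typeOf(atomIndex, detectRings(atomCount));
    }

    // Whole-molecule perception: ring detection runs once and is shared.
    static List<String> perceiveAll(int atomCount) {
        boolean[] inRing = detectRings(atomCount);
        List<String> types = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            types.add(typeOf(i, inRing));
        }
        return types;
    }

    public static void main(String[] args) {
        int atoms = 6;

        ringDetections = 0;
        for (int i = 0; i < atoms; i++) {
            perceiveSingle(atoms, i);
        }
        System.out.println("per-atom calls, ring detections: " + ringDetections);

        ringDetections = 0;
        perceiveAll(atoms);
        System.out.println("whole-molecule call, ring detections: " + ringDetections);
    }
}
```

For a molecule with six atoms, the per-atom loop runs the shared step six times, while the whole-molecule call runs it once. The gap grows with the size of the molecule.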

Configuring the Atom

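We saw earlier how the AtomTypeManipulator class was used to configure an atom with the configure(IAtom, IAtomType) method. The related AtomContainerManipulator class has a convenience method, percieveAtomTypesAndConfigureAtoms(IAtomContainer) (note the spelling of the method name), which perceives and configures all atoms in a molecule with one call.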

No atom type perceived?!

When the findMatchingAtomType method does not find a suitable atom type, it returns a generic 'X' atom type:

AtomTypeX

This code example shows that it does not recognize an atom with the element symbol "G":

AtomTypeX

There are several reasons why atom types cannot be perceived, including:

  • the input is wrong, and
  • the CDK is wrong.

The CDK library has knowledge of many atom types (see Appendix atomtypesapp), but there are still gaps. It might be that the CDK simply does not know about an atom type that is present in your input. This can particularly be expected when using elements other than the typical 'organic chemistry' elements like carbon, nitrogen, and oxygen. Sulfur and phosphorus are already tricky, and the library knows about only a few metals.

However, another reason why the method can return X is that the input passed is incorrect. In fact, this is one primary application of the CDK atom type perception: to identify errors in the input. An example of erroneous input is the uncharged NH4 below. If we attempt to perceive an atom type for this nitrogen, the findMatchingAtomType method will in fact return X, as intended:

UnchargedNitrogenPerception

This is visible from the output it gives:

UnchargedNitrogenPerception

Now, if we know the input indeed has errors like this, we can correct for them programmatically. It is important to realize that your algorithm to do that may make mistakes too, and it is advised to make the detection of the known errors in the input as explicit and detailed as possible. That may slow down your code a bit, but will greatly reduce the chance of introducing errors.

The following code is very general and may easily make mistakes. For each atom for which no atom type was perceived, it increases the charge of the atom and tries to perceive the atom type again. This will certainly address the aforementioned nitrogen problem:

CorrectedNitrogenPerception

After this programmatic correction, we now find the proper atom type for the nitrogen:

CorrectedNitrogenPerception
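
The retry strategy described above can be sketched without the CDK. In this plain Java toy model, a small lookup table stands in for the atom type matcher: `KNOWN_TYPES`, `perceive`, and `correctCharge` are hypothetical names, and the two table entries are chosen only to mirror the ammonia and ammonium cases. The real matcher considers far more than element, charge, and neighbor count.

```java
import java.util.Map;

// Toy model, not the CDK API: a lookup table stands in for the matcher.
final class ChargeRetrySketch {
    // Hypothetical mini table keyed by "element/formal charge/neighbor count".
    static final Map<String, String> KNOWN_TYPES = Map.of(
        "N/0/3", "N.sp3",   // e.g. the nitrogen in ammonia
        "N/1/4", "N.plus"   // e.g. the nitrogen in ammonium
    );

    // Returns the atom type name, or "X" when nothing matches.
    static String perceive(String element, int charge, int neighbors) {
        String key = element + "/" + charge + "/" + neighbors;
        return KNOWN_TYPES.getOrDefault(key, "X");
    }

    // If no type is found, raise the formal charge by one and try once more.
    // Returns the charge that gave a match, or the original charge if the
    // retry did not help, so unrelated errors are left untouched.
    static int correctCharge(String element, int charge, int neighbors) {
        if (!"X".equals(perceive(element, charge, neighbors))) {
            return charge;
        }
        int raised = charge + 1;
        return "X".equals(perceive(element, raised, neighbors)) ? charge : raised;
    }

    public static void main(String[] args) {
        int neighbors = 4; // nitrogen with four hydrogens
        System.out.println("before: " + perceive("N", 0, neighbors));
        int charge = correctCharge("N", 0, neighbors);
        System.out.println("after:  " + perceive("N", charge, neighbors)
            + " (formal charge " + charge + ")");
    }
}
```

The sketch keeps the original charge when the retry also fails. That is one way to follow the advice above: a general fix should only be applied where it demonstrably resolves the problem.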

Sybyl atom types

The Sybyl atom type list is well-known for its application in the mol2 file format (see the Mol2Format class) and is used in force fields [Q61463648]. Sybyl atom types can be perceived with the SybylAtomTypeMatcher class, which perceives CDK atom types and then translates these into Sybyl atom types:

SybylAtomTypePerception

This will give you the Sybyl atom type for carbon in methane:

SybylAtomTypePerception
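
The two-stage design, perceive a CDK atom type first and then translate it, can be pictured as a simple name mapping. The sketch below is plain Java and not the SybylAtomTypeMatcher implementation: `CDK_TO_SYBYL` and `translate` are hypothetical, the handful of entries is illustrative rather than the CDK's actual mapping table, and returning "X" for an unmapped name is a choice made for this toy only.

```java
import java.util.Map;

// Toy model of the translation step only; not the CDK implementation.
final class SybylMappingSketch {
    // Illustrative subset of a CDK-to-Sybyl name mapping (an assumption).
    static final Map<String, String> CDK_TO_SYBYL = Map.of(
        "C.sp3", "C.3",
        "C.sp2", "C.2",
        "C.sp", "C.1",
        "N.sp3", "N.3",
        "O.sp3", "O.3"
    );

    // Translates an already perceived CDK atom type name to a Sybyl name.
    // Unmapped names give "X" in this sketch.
    static String translate(String cdkTypeName) {
        return CDK_TO_SYBYL.getOrDefault(cdkTypeName, "X");
    }

    public static void main(String[] args) {
        // Stage 1 (CDK perception) is assumed to have produced "C.sp3"
        // for the carbon in methane; stage 2 is the lookup.
        System.out.println("C.sp3 -> " + translate("C.sp3"));
    }
}
```

One consequence of this design is that an atom the CDK matcher cannot type has nothing to translate, so it cannot receive a Sybyl type either.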

A full list of Sybyl atom types is given in a table in Appendix sybyltypes.

References