Molecular Formula and Molecular Weight #16

Open
liquidcarbon opened this issue Apr 6, 2021 · 4 comments

@liquidcarbon

Hello! I found your library very helpful in parsing SMILES.

Would you be interested in adding the molecular formula (MF) and molecular weight (MW) as additional attributes?

Something along these lines:

from collections import defaultdict
from pysmiles import read_smiles

# Standard atomic weights (g/mol)
AW = {
    'C': 12.0107,
    'H': 1.00794,
    # etc.
}

class MolecularFormula:
    def __init__(self, smiles: str):
        self.smiles = smiles
        self.mf = defaultdict(int)  # element symbol -> atom count
        self.mw = 0

        try:
            mol = read_smiles(
                smiles,
                explicit_hydrogen=False,
                reinterpret_aromatic=False,
            )
            # With explicit_hydrogen=False, hydrogens are not separate nodes;
            # they are stored as an 'hcount' attribute on the heavy atoms
            for _, data in mol.nodes(data=True):
                self.mf[data['element']] += 1
                self.mf['H'] += data.get('hcount', 0)

            # A KeyError here means an element is missing from AW
            self.mw = round(sum(AW[k] * v for k, v in self.mf.items()), 2)
        except Exception as e:
            # log or raise
            self.mw = 0

    def __repr__(self):
        # Skip zero counts (e.g. no hydrogens) so the formula doesn't show "H0"
        return ''.join(f'{k}{v}' for k, v in self.mf.items() if v)

@pckroon
Owner

pckroon commented Apr 6, 2021

Happy to hear you find the library helpful :)

I'm not quite sure whether MolecularFormula is worth adding to the library. If anything, I'd make a Molecule class (a subclass of nx.Graph) and give it a molecular_formula attribute/property, but I'm not sure it's worth the hassle/complication. Generating the MF should be pretty straightforward anyway: collections.Counter(nx.get_node_attributes(mol, 'element').values()) plus sum(nx.get_node_attributes(mol, 'hcount').values()) will get you 90% of the way.
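A minimal sketch of that two-expression approach, assuming a graph whose nodes carry 'element' and 'hcount' attributes, as pysmiles produces with explicit_hydrogen=False. The graph is built by hand here (ethanol's heavy atoms) so the example runs without pysmiles; formula_from_graph is a hypothetical helper name.

```python
from collections import Counter

import networkx as nx


def formula_from_graph(mol: nx.Graph) -> Counter:
    """Count heavy atoms by element, then add the implicit hydrogens as 'H'."""
    # First pass over the nodes: element symbol of every atom
    counts = Counter(nx.get_node_attributes(mol, 'element').values())
    # Second pass: hydrogens stored per heavy atom as 'hcount'
    hydrogens = sum(nx.get_node_attributes(mol, 'hcount').values())
    if hydrogens:
        counts['H'] += hydrogens
    return counts


# Ethanol (CCO), shaped like pysmiles output with explicit_hydrogen=False
mol = nx.Graph()
mol.add_node(0, element='C', hcount=3)
mol.add_node(1, element='C', hcount=2)
mol.add_node(2, element='O', hcount=1)
mol.add_edges_from([(0, 1), (1, 2)])

print(formula_from_graph(mol))  # Counter({'H': 6, 'C': 2, 'O': 1})
```

Note that this walks the node attributes twice, once per get_node_attributes call.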

Adding a function that calculates the molecular weight would not be too much work, and may be valuable to many people. However, I'd first have to find a periodic table library that's easy to install. No point in maintaining that as well...
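Such a function could look roughly like the sketch below, assuming the formula is already available as an element-to-count mapping and the atomic weights come from a plain dict. The name molecular_weight and the O value (15.9994) are assumptions for the example. Unlike the class above, it raises on an unknown element instead of silently returning 0.

```python
from typing import Mapping

# Assumed standard atomic weights (g/mol); C and H as in the issue, O added
AW = {'C': 12.0107, 'H': 1.00794, 'O': 15.9994}


def molecular_weight(
    formula: Mapping[str, int],
    atomic_weights: Mapping[str, float] = AW,
    ndigits: int = 2,
) -> float:
    """Sum count * atomic weight over the formula, rounded to ndigits."""
    missing = [el for el in formula if el not in atomic_weights]
    if missing:
        # Fail loudly rather than report a misleading weight
        raise KeyError(f"no atomic weight for: {', '.join(missing)}")
    total = sum(atomic_weights[el] * n for el, n in formula.items())
    return round(total, ndigits)


print(molecular_weight({'C': 2, 'H': 6, 'O': 1}))  # 46.07 (ethanol)
```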

@liquidcarbon
Author

liquidcarbon commented Apr 6, 2021

Thanks for the tip on nx.get_node_attributes! My implementation appears to be about 25% faster than going through nx.
For the periodic table you only need a dictionary of atomic weights (if you ignore isotopes, which I would). You can get one like so:

import pandas as pd

ELEMENTS_URL = (
    'https://github.com/bokeh/bokeh/raw/branch-2.4/'
    'bokeh/sampledata/_data/elements.csv'
)
df = pd.read_csv(ELEMENTS_URL)
# Ignore radioactive elements, whose mass is a bracketed mass number like "[98]"
df = df[~df['atomic mass'].str.contains('[', regex=False)]
AW = df.set_index('symbol')['atomic mass'].astype(float).to_dict()
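If pandas isn't wanted, the same filtering can be sketched with the standard library csv module, for example on a locally saved copy of the file. The column names 'symbol' and 'atomic mass' are taken from the snippet above; the sample rows are illustrative only, not the actual contents of the bokeh file, and atomic_weights_from_csv is a hypothetical name.

```python
import csv
import io
from typing import Dict, TextIO


def atomic_weights_from_csv(fh: TextIO) -> Dict[str, float]:
    """Map symbol -> atomic mass, skipping bracketed mass numbers like "[98]"."""
    weights = {}
    for row in csv.DictReader(fh):
        mass = row['atomic mass'].strip()
        if '[' in mass:
            continue  # no stable isotope; only a mass number is listed
        weights[row['symbol'].strip()] = float(mass)
    return weights


# Illustrative rows only; in practice pass an open file handle instead
sample = io.StringIO(
    'symbol,atomic mass\n'
    'H,1.00794\n'
    'C,12.0107\n'
    'Tc,[98]\n'
)
print(atomic_weights_from_csv(sample))  # {'H': 1.00794, 'C': 12.0107}
```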

@pckroon
Owner

pckroon commented Apr 7, 2021

> My implementation appears to be about 25% faster than through nx.

That's because my suggestion loops over the molecule twice, once for the elements and once for the hcount, rather than getting both at the same time.
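For comparison, a single-pass version that collects the element counts and the hydrogen count in the same loop, as the MolecularFormula class does. It takes per-atom attribute dicts; plain dicts stand in for them here so the example runs on its own, and formula_single_pass is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Mapping


def formula_single_pass(node_attrs: Iterable[Mapping]) -> Counter:
    """One loop over the atoms: count each element and accumulate hcount."""
    counts = Counter()
    hydrogens = 0
    for attrs in node_attrs:
        counts[attrs['element']] += 1
        hydrogens += attrs.get('hcount', 0)  # missing hcount treated as 0
    if hydrogens:
        counts['H'] += hydrogens
    return counts


# Attribute dicts for acetic acid, CC(=O)O, with implicit hydrogens
atoms = [
    {'element': 'C', 'hcount': 3},
    {'element': 'C', 'hcount': 0},
    {'element': 'O', 'hcount': 0},
    {'element': 'O', 'hcount': 1},
]
print(formula_single_pass(atoms))  # Counter({'H': 4, 'C': 2, 'O': 2})
```

With a networkx/pysmiles graph, the input would be formula_single_pass(data for _, data in mol.nodes(data=True)).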

I find pulling data from a network connection rather impolite for a library though, so I'd much rather add a dependency on a lightweight periodic table module.

@liquidcarbon
Author

Of course; I'm not suggesting running it every time someone imports the library. This is just a way to retrieve the data. I hard-coded the resulting dictionary into my module that does the MW calculation.
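One way to do that hard-coding step once, sketched under the assumption that the dictionary has already been retrieved: render it as Python source that can be saved as a module, so nothing touches the network at import time. render_aw_module is a hypothetical helper, not part of pysmiles.

```python
import pprint
from typing import Mapping


def render_aw_module(aw: Mapping[str, float]) -> str:
    """Return Python source that defines AW as a literal dict (keys sorted)."""
    # pformat sorts dict keys by default, so the output is stable between runs
    body = pprint.pformat(dict(aw))
    return (
        '# Atomic weights (g/mol), generated once; regenerate instead of editing\n'
        f'AW = {body}\n'
    )


print(render_aw_module({'H': 1.00794, 'C': 12.0107}))
```

Saved as a module (say atomic_weights.py, a hypothetical name), the table then becomes an ordinary import.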
