Identifiers not being parsed through evaluator #553

Closed
barmoral opened this issue Apr 17, 2024 · 4 comments

barmoral commented Apr 17, 2024

Describe the bug
I'm trying to use evaluator to filter papers with osmotic coefficient values from ThermoML. I've successfully created the property type, filtered out the DOIs with osmotic coefficients, converted them to a pandas DataFrame, and written the DataFrame to a CSV file. However, evaluator does not recognize all of the substances involved in the papers. It recognizes only one component, even though the ThermoML .xml data reports other identifiers (StandardInChI, CommonName).

To Reproduce

Register Custom ThermoML Property:

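from openff.units import unit
from openff.evaluator import properties
from openff.evaluator.datasets import PhysicalProperty, PropertyPhase
from openff.evaluator.datasets.thermoml import ThermoMLDataSet, thermoml_property

@thermoml_property("Osmotic coefficient", supported_phases=PropertyPhase.Liquid | PropertyPhase.Gas)
class OsmoticCoefficient(PhysicalProperty):
    """A class representation of an osmotic coefficient property."""
    @classmethod
    def default_unit(cls):
        return unit.dimensionless
# Expose the new class on the properties module under its own name.
setattr(properties, OsmoticCoefficient.__name__, OsmoticCoefficient)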

Load ThermoML Data Set:
ds = ThermoMLDataSet.from_doi('10.1016/j.fluid.2006.09.025')

Write to csv:

ds_osm = ds.to_pandas()
ds_osm.to_csv("filt_ds_osmcoeff.csv")

Check involved compounds:
ds.substances

filt_ds_osmcoeff.csv

Output
The command "ds.substances" outputs "{<Substance O{solv}{x=1.000000}>}", i.e. a single substance containing only O (water).
Here is a link to the ThermoML report for this example paper, which shows that more compounds are present: https://trc.nist.gov/ThermoML/10.1016/j.fluid.2006.09.025.html
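
A quick way to confirm the symptom from the exported CSV is to tally how many components each row reports. The sketch below is only an illustration: it assumes the CSV contains the "N Components" column that appears in the to_pandas() output later in this thread, the helper name component_counts is hypothetical, and the sample rows are made up rather than taken from filt_ds_osmcoeff.csv.

```python
import csv
import io
from collections import Counter


def component_counts(csv_file, column="N Components"):
    """Count how many rows report each number of components.

    `csv_file` is any open text file (or file-like object) holding the CSV.
    """
    reader = csv.DictReader(csv_file)
    # int(float(...)) tolerates values written as "2" or "2.0".
    return Counter(int(float(row[column])) for row in reader)


# Illustrative, made-up rows; not the contents of the attached CSV.
sample = io.StringIO(
    "Id,N Components,Component 1\n"
    "a,1,O\n"
    "b,1,O\n"
)
print(component_counts(sample))
```

If every row lands in the bucket for a single component, the data set was loaded as pure solvent, which matches the "ds.substances" output above.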

Computing environment (please complete the following information):

  • Operating system: Ubuntu 20.04.6 LTS 64-bit
  • Output of running conda list:
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anyio                     4.2.0              pyhd8ed1ab_0    conda-forge
argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0          py310h2372a71_4    conda-forge
arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.7.11               h0100c56_0    conda-forge
aws-c-cal                 0.6.9                h5d48c4d_2    conda-forge
aws-c-common              0.9.10               hd590300_0    conda-forge
aws-c-compression         0.2.17               h7f92143_7    conda-forge
aws-c-event-stream        0.4.1                h0bcb0bb_1    conda-forge
aws-c-http                0.8.0                hd268abd_1    conda-forge
aws-c-io                  0.13.36              hb3b01f7_3    conda-forge
aws-c-mqtt                0.10.0               hf5d392a_2    conda-forge
aws-c-s3                  0.4.7                hf8c57b3_3    conda-forge
aws-c-sdkutils            0.1.13               h7f92143_0    conda-forge
aws-checksums             0.1.17               h7f92143_6    conda-forge
aws-crt-cpp               0.26.0               h600aa22_5    conda-forge
aws-sdk-cpp               1.11.210             h405b101_9    conda-forge
babel                     2.14.0             pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.12.3             pyha770c72_0    conda-forge
bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
blosc                     1.21.5               h0f2a231_0    conda-forge
bokeh                     3.3.3              pyhd8ed1ab_0    conda-forge
boltons                   23.1.1             pyhd8ed1ab_0    conda-forge
brotli                    1.1.0                hd590300_1    conda-forge
brotli-bin                1.1.0                hd590300_1    conda-forge
brotli-python             1.1.0           py310hc6cd4ac_1    conda-forge
bson                      0.5.9                      py_0    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.25.0               hd590300_0    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.2              pyhd8ed1ab_0    conda-forge
cairo                     1.18.0               h3faef2a_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py310h2fee648_0    conda-forge
cftime                    1.6.3           py310h1f7b6fc_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
click                     8.1.7           unix_pyh707e725_0    conda-forge
cloudpickle               3.0.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
comm                      0.2.1              pyhd8ed1ab_0    conda-forge
contourpy                 1.2.0           py310hd41b1e2_0    conda-forge
cudatoolkit               11.8.0              h4ba93d1_12    conda-forge
curl                      8.5.0                hca28451_0    conda-forge
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
cytoolz                   0.12.2          py310h2372a71_1    conda-forge
dask                      2023.12.1          pyhd8ed1ab_0    conda-forge
dask-core                 2023.12.1          pyhd8ed1ab_0    conda-forge
dask-jobqueue             0.8.2              pyhd8ed1ab_0    conda-forge
debugpy                   1.8.0           py310hc6cd4ac_1    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distributed               2023.12.1          pyhd8ed1ab_0    conda-forge
ele                       0.2.0              pyhd8ed1ab_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
exceptiongroup            1.2.0              pyhd8ed1ab_0    conda-forge
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_1    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.47.0          py310h2372a71_0    conda-forge
forcefield-utilities      0.2.2              pyhd8ed1ab_0    conda-forge
foyer                     0.12.0             pyhd8ed1ab_0    conda-forge
fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
fsspec                    2023.12.2          pyhca7485f_0    conda-forge
future                    0.18.3             pyhd8ed1ab_0    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gf2x                      1.3.0                ha476b99_2    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
gmp                       6.3.0                h59595ed_0    conda-forge
gmpy2                     2.1.2           py310h3ec546c_1    conda-forge
gmso                      0.11.2             pyhd8ed1ab_0    conda-forge
greenlet                  3.0.3           py310hc6cd4ac_0    conda-forge
hdf4                      4.2.15               h9772cbc_5    conda-forge
hdf5                      1.12.1          nompi_h4df4325_104    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
importlib-metadata        7.0.1              pyha770c72_0    conda-forge
importlib_metadata        7.0.1                hd8ed1ab_0    conda-forge
importlib_resources       6.1.1              pyhd8ed1ab_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
ipykernel                 6.28.0             pyhd33586a_0    conda-forge
ipython                   8.20.0             pyh707e725_0    conda-forge
ipywidgets                8.1.1              pyhd8ed1ab_0    conda-forge
isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jpeg                      9e                   h0b41bf4_3    conda-forge
json5                     0.9.14             pyhd8ed1ab_0    conda-forge
jsonpointer               2.4             py310hff52083_3    conda-forge
jsonschema                4.21.1             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
jsonschema-with-format-nongpl 4.21.1             pyhd8ed1ab_0    conda-forge
jupyter-lsp               2.2.2              pyhd8ed1ab_0    conda-forge
jupyter_client            7.4.9              pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.1           py310hff52083_0    conda-forge
jupyter_events            0.9.0              pyhd8ed1ab_0    conda-forge
jupyter_server            2.12.5             pyhd8ed1ab_0    conda-forge
jupyter_server_terminals  0.5.2              pyhd8ed1ab_0    conda-forge
jupyterlab                4.0.11             pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.3.0              pyhd8ed1ab_0    conda-forge
jupyterlab_server         2.25.2             pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        3.0.9              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5           py310hd41b1e2_1    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
lark-parser               0.12.0             pyhd8ed1ab_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libabseil                 20230802.1      cxx17_h59595ed_0    conda-forge
libarrow                  14.0.2           h84dd17c_2_cpu    conda-forge
libarrow-acero            14.0.2           h59595ed_2_cpu    conda-forge
libarrow-dataset          14.0.2           h59595ed_2_cpu    conda-forge
libarrow-flight           14.0.2           h120cb0d_2_cpu    conda-forge
libarrow-flight-sql       14.0.2           h61ff412_2_cpu    conda-forge
libarrow-gandiva          14.0.2           hacb8726_2_cpu    conda-forge
libarrow-substrait        14.0.2           h61ff412_2_cpu    conda-forge
libblas                   3.9.0           20_linux64_openblas    conda-forge
libboost                  1.82.0               h6fcfa73_6    conda-forge
libboost-python           1.82.0          py310hcb52e73_6    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcblas                  3.9.0           20_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcurl                   8.5.0                hca28451_0    conda-forge
libdeflate                1.10                 h7f98852_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflint                  2.9.0           h2f819a4_ntl_100    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgfortran-ng            13.2.0               h69a702a_3    conda-forge
libgfortran5              13.2.0               ha4646dd_3    conda-forge
libglib                   2.78.3               h783c2da_0    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
libgoogle-cloud           2.12.0               h5206363_4    conda-forge
libgrpc                   1.59.3               hd6c4280_0    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
liblapack                 3.9.0           20_linux64_openblas    conda-forge
libllvm14                 14.0.6               hcd5def8_4    conda-forge
libllvm15                 15.0.7               hb3ce162_4    conda-forge
libnetcdf                 4.8.1           nompi_h329d8a1_102    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnl                     3.9.0                hd590300_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libnuma                   2.0.16               h0b41bf4_1    conda-forge
libopenblas               0.3.25          pthreads_h413a1c8_0    conda-forge
libparquet                14.0.2           h352af49_2_cpu    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               4.24.4               hf27288f_0    conda-forge
libre2-11                 2023.06.02           h7a70373_0    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
libthrift                 0.19.0               hb90f79a_1    conda-forge
libtiff                   4.3.0                h0fcbabc_4    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.3               h232c23b_0    conda-forge
libxslt                   1.1.39               h76b75d6_0    conda-forge
libzip                    1.10.1               h2629f0a_3    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
llvmlite                  0.41.1          py310h1b8f574_0    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lxml                      5.1.0           py310hcfd0673_0    conda-forge
lz4                       4.3.3           py310h350c4a5_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.3           py310h2372a71_1    conda-forge
matplotlib-base           3.8.2           py310h62c0568_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mdtraj                    1.9.9           py310h523e8d7_1    conda-forge
mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.1                h9458935_0    conda-forge
mpiplus                   v0.0.2             pyhd8ed1ab_0    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
msgpack-python            1.0.7           py310hd41b1e2_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nbclient                  0.8.0              pyhd8ed1ab_0    conda-forge
nbconvert-core            7.14.2             pyhd8ed1ab_0    conda-forge
nbformat                  5.9.2              pyhd8ed1ab_0    conda-forge
ncurses                   6.4                  h59595ed_2    conda-forge
nest-asyncio              1.5.8              pyhd8ed1ab_0    conda-forge
netcdf4                   1.5.8           nompi_py310hd7ca5b8_101    conda-forge
networkx                  3.2.1              pyhd8ed1ab_0    conda-forge
nomkl                     1.0                  h5ca1d4c_0    conda-forge
nose                      1.3.7                   py_1006    conda-forge
notebook                  7.0.7              pyhd8ed1ab_0    conda-forge
notebook-shim             0.2.3              pyhd8ed1ab_0    conda-forge
ntl                       11.4.3               hef3c4d3_1    conda-forge
numba                     0.58.1          py310h7dc5dd1_0    conda-forge
numexpr                   2.8.8           py310hc2d3c2e_100    conda-forge
numpy                     1.26.3          py310hb13e2d6_0    conda-forge
ocl-icd                   2.3.1                h7f98852_0    conda-forge
ocl-icd-system            1.0.0                         1    conda-forge
olefile                   0.47               pyhd8ed1ab_0    conda-forge
openeye-toolkits          2023.2.3                py310_0    openeye
openff-amber-ff-ports     0.0.4              pyhca7485f_0    conda-forge
openff-evaluator          0.4.7              pyhd8ed1ab_0    conda-forge
openff-evaluator-base     0.4.7              pyhd8ed1ab_0    conda-forge
openff-forcefields        2023.11.0          pyhca7485f_0    conda-forge
openff-interchange-base   0.3.18             pyhd8ed1ab_0    conda-forge
openff-models             0.1.1              pyhca7485f_0    conda-forge
openff-toolkit-base       0.14.3             pyhd8ed1ab_0    conda-forge
openff-units              0.2.0              pyh1a96a4e_0    conda-forge
openff-utilities          0.1.12             pyhd8ed1ab_0    conda-forge
openjpeg                  2.5.0                h7d73246_0    conda-forge
openmm                    8.1.0           py310h52c1345_1    conda-forge
openmmtools               0.21.5             pyhd8ed1ab_1    conda-forge
openssl                   3.2.1                hd590300_1    conda-forge
orc                       1.9.2                h4b38347_0    conda-forge
overrides                 7.6.0              pyhd8ed1ab_0    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
packmol                   20.010               h86c2bf4_0    conda-forge
pandas                    1.5.3           py310h9b08913_1    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parmed                    4.2.2           py310hc6cd4ac_1    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.4.1              pyhd8ed1ab_0    conda-forge
pcre2                     10.42                hcad00b1_0    conda-forge
pdbfixer                  1.9                pyh1a96a4e_0    conda-forge
pexpect                   4.8.0              pyh1a96a4e_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    8.4.0           py310h07f4688_0    conda-forge
pint                      0.20.1             pyhd8ed1ab_0    conda-forge
pip                       23.3.2             pyhd8ed1ab_0    conda-forge
pixman                    0.43.0               h59595ed_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
platformdirs              4.1.0              pyhd8ed1ab_0    conda-forge
pluggy                    1.4.0              pyhd8ed1ab_0    conda-forge
prometheus_client         0.19.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.42             pyha770c72_0    conda-forge
protobuf                  4.24.4          py310h620c231_0    conda-forge
psutil                    5.9.7           py310h2372a71_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyarrow                   14.0.2          py310hf9e7431_2_cpu    conda-forge
pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge
pycairo                   1.25.1          py310hda9f760_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pydantic                  1.10.13         py310h2372a71_1    conda-forge
pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
pymbar                    3.1.1           py310hde88566_2    conda-forge
pyparsing                 3.1.1              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytables                  3.7.0           py310hf5df6ce_0    conda-forge
pytest                    8.1.1              pyhd8ed1ab_0    conda-forge
python                    3.10.13         hd12c33a_1_cpython    conda-forge
python-constraint         1.4.0                      py_0    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.19.1             pyhd8ed1ab_0    conda-forge
python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
python-symengine          0.11.0          py310h04af605_1    conda-forge
python_abi                3.10                    4_cp310    conda-forge
pytz                      2023.3.post1       pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.1           py310h2372a71_1    conda-forge
pyzmq                     24.0.1          py310h330234f_1    conda-forge
rdkit                     2023.09.4       py310hb79e901_0    conda-forge
rdma-core                 49.0                 hd3aeb46_2    conda-forge
re2                       2023.06.02           h2873b5e_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
referencing               0.32.1             pyhd8ed1ab_0    conda-forge
reportlab                 3.5.68          py310h94fcab3_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
rich                      13.7.0             pyhd8ed1ab_0    conda-forge
rpds-py                   0.17.1          py310hcb5633a_0    conda-forge
s2n                       1.4.1                h06160fa_0    conda-forge
scipy                     1.11.4          py310hb13e2d6_0    conda-forge
send2trash                1.8.2              pyh41d4057_0    conda-forge
setuptools                69.0.3             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
smirnoff99frosst          1.1.0              pyh44b312d_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
sqlalchemy                2.0.25          py310h2372a71_0    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
symengine                 0.11.2               hb29318e_0    conda-forge
sympy                     1.12            pypyh9d50eac_103    conda-forge
tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
terminado                 0.18.0             pyh0d859eb_0    conda-forge
tinycss2                  1.2.1              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
tornado                   6.3.3           py310h2372a71_1    conda-forge
traitlets                 5.14.1             pyhd8ed1ab_0    conda-forge
types-python-dateutil     2.8.19.20240106    pyhd8ed1ab_0    conda-forge
typing-extensions         4.9.0                hd8ed1ab_0    conda-forge
typing_extensions         4.9.0              pyha770c72_0    conda-forge
typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
tzdata                    2023d                h0c530f3_0    conda-forge
ucx                       1.15.0               h75e419f_2    conda-forge
uncertainties             3.1.7              pyhd8ed1ab_0    conda-forge
unicodedata2              15.1.0          py310h2372a71_0    conda-forge
unyt                      2.9.2              pyhd8ed1ab_1    conda-forge
uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
urllib3                   2.1.0              pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
webcolors                 1.13               pyhd8ed1ab_0    conda-forge
webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
websocket-client          1.7.0              pyhd8ed1ab_0    conda-forge
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
widgetsnbextension        4.0.9              pyhd8ed1ab_0    conda-forge
xmltodict                 0.13.0             pyhd8ed1ab_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.7                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xyzservices               2023.10.1          pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zeromq                    4.3.5                h59595ed_0    conda-forge
zict                      3.0.0              pyhd8ed1ab_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Additional context
I believe the problem is that the classmethod "from_xml_node" in thermoml.py does not correctly read the XML identifiers, so it cannot, for example, convert a StandardInChI to SMILES.
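
One way to test this hypothesis independently of evaluator is to list the identifiers that the raw XML carries for each compound. The following is a minimal sketch using only the standard library. The element names Compound, sStandardInChI and sCommonName are assumptions about the ThermoML schema, the helper names are hypothetical, and the inline XML is a made-up fragment rather than the real file.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://...}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def list_compound_identifiers(xml_text, wanted=("sStandardInChI", "sCommonName")):
    """Return one dict of identifiers per Compound element in the document."""
    root = ET.fromstring(xml_text)
    compounds = []
    for element in root.iter():
        if local_name(element.tag) != "Compound":
            continue
        found = {}
        for child in element.iter():
            name = local_name(child.tag)
            if name in wanted and child.text:
                # Keep the first occurrence of each identifier.
                found.setdefault(name, child.text.strip())
        compounds.append(found)
    return compounds


# Made-up fragment for illustration only.
example = """
<DataReport xmlns="http://www.iupac.org/namespaces/ThermoML">
  <Compound>
    <sCommonName>water</sCommonName>
    <sStandardInChI>InChI=1S/H2O/h1H2</sStandardInChI>
  </Compound>
</DataReport>
"""
print(list_compound_identifiers(example))
```

If every compound shows up here with its identifiers, the XML itself is fine and the loss happens later in parsing.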

@mattwthompson
Member

I can reproduce this; there must be something different about this dataset that causes the parsing to fail in ways that the other supported properties do not. Or maybe it's not being loaded correctly as a plugin.
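
To make the plugin hypothesis concrete, here is a toy analogue of decorator-based registration. It does not reproduce the internals of evaluator's thermoml_property; REGISTRY, register_property and is_registered are hypothetical names used only to show how a lookup by property name can be checked.

```python
# Toy registry: maps a ThermoML property name to the class that handles it.
REGISTRY = {}


def register_property(thermoml_name):
    """Class decorator that records a class under a ThermoML property name."""

    def decorator(cls):
        REGISTRY[thermoml_name] = cls
        return cls

    return decorator


@register_property("Osmotic coefficient")
class OsmoticCoefficient:
    """Stand-in for the real property class."""


def is_registered(thermoml_name):
    """Report whether a property name has a registered handler."""
    return thermoml_name in REGISTRY


print(is_registered("Osmotic coefficient"))
print(is_registered("Osmotic Coefficient"))  # lookup is an exact string match
```

In a registry like this, a name that differs only in capitalization is simply not found, so checking the exact registered string is a cheap first step before looking deeper into the parser.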

@mattwthompson
Member

Whatever's going wrong is surfacing from here:

properties = _PureOrMixtureData.from_xml_node(node, namespace, compounds)

It can't be that all identifiers are missed; otherwise it wouldn't think everything was pure water.

Here's the script I'm using to test, based on what you shared:

from openff.units import unit

from openff.evaluator.datasets import PhysicalProperty, PropertyPhase
from openff.evaluator.datasets.thermoml import thermoml_property
from openff.evaluator.datasets.thermoml.thermoml import ThermoMLDataSet
from openff.evaluator.plugins import register_default_plugins, register_external_plugins


@thermoml_property(
    "Osmotic coefficient",
    supported_phases=PropertyPhase.Liquid | PropertyPhase.Gas,
)
class OsmoticCoefficient(PhysicalProperty):
    @classmethod
    def default_unit(cls):
        return unit.dimensionless


register_default_plugins()
register_external_plugins()

ds = ThermoMLDataSet._from_url(
    "https://trc.nist.gov/ThermoML/10.1016/j.fluid.2006.09.025.xml"
)

@lilyminium
Contributor

lilyminium commented Sep 4, 2024

I ultimately traced this back to an incorrect molecular weight (MW) calculation: the non-water compound (the component other than O) is dropped at the lines below because its apparent mole fraction is around 1e-27. Raising #569 to fix.

for compound_index in compounds:
    compound = compounds[compound_index]
    if np.isclose(mole_fractions[compound_index], 0.0):
        continue
    substance.add_component(
        component=Component(smiles=compound.smiles),
        amount=MoleFraction(mole_fractions[compound_index]),
    )
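
The effect of that check can be reproduced without evaluator. The sketch below mimics np.isclose(value, 0.0) with NumPy's default tolerances, which for a comparison against zero reduces to abs(value) <= 1e-8. The helper names are hypothetical, components are keyed by SMILES instead of by index for readability, and the mole fractions are illustrative values chosen to match the magnitudes mentioned in this thread.

```python
NUMPY_DEFAULT_ATOL = 1e-8  # default absolute tolerance of np.isclose


def is_close_to_zero(value, atol=NUMPY_DEFAULT_ATOL):
    """Pure-Python stand-in for np.isclose(value, 0.0) with default tolerances."""
    return abs(value) <= atol


def kept_components(mole_fractions):
    """Return only the components that a loop like the one above would keep."""
    return {
        smiles: fraction
        for smiles, fraction in mole_fractions.items()
        if not is_close_to_zero(fraction)
    }


# Illustrative values: a ~1e-27 fraction (as with the bad MW) vs. a sane one.
buggy = {"O": 1.0, "CC[N+](C)(CC)CC.[I-]": 1e-27}
fixed = {"O": 0.999145, "CC[N+](C)(CC)CC.[I-]": 0.000855}

print(kept_components(buggy))
print(kept_components(fixed))
```

With the tiny fraction, only O survives, which is consistent with every substance being reported as pure water. With a realistic fraction, both components are kept.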

@mattwthompson
Member

With the current development head (which would most likely land in 0.4.10), including @lilyminium's recent fix, I think this is doing what one would expect. I blindly copied my code snippet from earlier:

In [26]: df = ds.to_pandas()

In [27]: df.describe()
Out[27]:
       Temperature (K)  N Components  Mole Fraction 1  Mole Fraction 2  OsmoticCoefficient Value ()  OsmoticCoefficient Uncertainty ()
count           241.00         241.0       241.000000       241.000000                   241.000000                         241.000000
mean            298.15           2.0         0.011742         0.988258                     0.651477                           0.008793
std               0.00           0.0         0.010405         0.010405                     0.211759                           0.005274
min             298.15           2.0         0.000855         0.948725                     0.219100                           0.000550
25%             298.15           2.0         0.003139         0.982043                     0.530000                           0.004300
50%             298.15           2.0         0.008380         0.991620                     0.662500                           0.008450
75%             298.15           2.0         0.017957         0.996861                     0.833900                           0.011900
max             298.15           2.0         0.051275         0.999145                     0.977700                           0.019500

In [28]: df.head()
Out[28]:
                                 Id  Temperature (K) Pressure (kPa)         Phase  N Components  ... Mole Fraction 2 Exact Amount 2  OsmoticCoefficient Value () OsmoticCoefficient Uncertainty ()                       Source
0  c2e7b442254f4541b41b0869241d66b1           298.15           None  Liquid + Gas             2  ...        0.999140           None                       0.7389                           0.00655  10.1016/j.fluid.2006.09.025
1  befcc793e1054dd38b5df717d6603b95           298.15           None  Liquid + Gas             2  ...        0.998963           None                       0.7142                           0.00715  10.1016/j.fluid.2006.09.025
2  8768e8a84b6d4267b4f884d95fbece95           298.15           None  Liquid + Gas             2  ...        0.998622           None                       0.6730                           0.00820  10.1016/j.fluid.2006.09.025
3  e2acf2ede41b444ea445e66b5ebb5f83           298.15           None  Liquid + Gas             2  ...        0.998378           None                       0.6485                           0.00880  10.1016/j.fluid.2006.09.025
4  d8c3e030b0ff49baad2ebcb2c62444a6           298.15           None  Liquid + Gas             2  ...        0.998211           None                       0.6324                           0.00925  10.1016/j.fluid.2006.09.025

[5 rows x 16 columns]

In [29]: df['Component 1']
Out[29]:
0            CC[N+](C)(CC)CC.[I-]
1            CC[N+](C)(CC)CC.[I-]
2            CC[N+](C)(CC)CC.[I-]
3            CC[N+](C)(CC)CC.[I-]
4            CC[N+](C)(CC)CC.[I-]
                  ...
236    CCCCCCC[N+](CC)(CC)CC.[I-]
237    CCCCCCC[N+](CC)(CC)CC.[I-]
238    CCCCCCC[N+](CC)(CC)CC.[I-]
239    CCCCCCC[N+](CC)(CC)CC.[I-]
240    CCCCCCC[N+](CC)(CC)CC.[I-]
Name: Component 1, Length: 241, dtype: object

I haven't worked with this data, but I see

  • Non-trivial values for osmotic coefficients, in the range (0, 1), with much smaller uncertainties
  • Two chemical components in each data point
  • Mole fractions other than 0.0 and 1.0
  • Values that look physical and SMILES strings that look like real molecules
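
The checks in this list could be automated roughly as follows. This is a sketch under assumptions: the column names are copied from the output above, check_row is a hypothetical helper, the (0, 1) bounds reflect what this data set shows rather than a general limit on osmotic coefficients, and the example rows are illustrative rather than copied from the DataFrame.

```python
import math


def check_row(row, value_bounds=(0.0, 1.0)):
    """Return a list of human-readable problems found in one data row."""
    problems = []
    fractions = [row["Mole Fraction 1"], row["Mole Fraction 2"]]
    if row["N Components"] != 2:
        problems.append("expected two components")
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-6):
        problems.append("mole fractions do not sum to 1")
    if any(x <= 0.0 or x >= 1.0 for x in fractions):
        problems.append("a mole fraction is not strictly between 0 and 1")
    low, high = value_bounds
    if not low < row["OsmoticCoefficient Value ()"] < high:
        problems.append("osmotic coefficient outside expected range")
    return problems


# Illustrative rows only.
good_row = {
    "N Components": 2,
    "Mole Fraction 1": 0.000855,
    "Mole Fraction 2": 0.999145,
    "OsmoticCoefficient Value ()": 0.7389,
}
bad_row = {
    "N Components": 1,
    "Mole Fraction 1": 0.0,
    "Mole Fraction 2": 1.0,
    "OsmoticCoefficient Value ()": 0.7389,
}
print(check_row(good_row))
print(check_row(bad_row))
```

A row like bad_row, with one component at a mole fraction of 1.0, is what the original bug would have produced.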
