Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

136-feature: Allow import/export from JSON #333

Conversation

maarten-betman
Copy link
Contributor

Adding support for exporting and import data from json.

As discussed in #136, try to achieve this using Pydantic

This commit adds support for pydantic in CPTData by checking if pydantic is installed and using it if available. If pydantic is not installed, the dataclass module is used instead.
src/pygef/cpt.py Outdated
from datetime import date
from enum import Enum
from typing import Any, List

import polars as pl

# check if pydantic is installed
try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tversteeg I tried to keep pydantic as an optional dependency because I wasn't sure if you were willing to accept pydantic as an additional dependency.

Pydantic offers a near drop-in replacement for the builtins dataclass decorator, so thought this might be an elegant solution. Upon running the tests there are so validation errors where some fields were None while this was not allowed according to the type annotations,

Still need a some work of course to convert the polars DataFrame to json, but wanted to get a quick check on how you feel about using pydantic in this manner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this solution, it looks elegent. Although I wonder if it needs to be optional, is there a convincing argument against using pydantic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for me, I use Pydantic all over the place so would recommend it :)
But I think it will introduce a small overhead due to data validation step introduced by Pydantic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that overhead also prevents some nasty bugs and user errors, so for me that's probably worth it. @RDWimmers @tlukkezen, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, lets add pydantic as an additional dependency.

- added config class and custum json_encoder for polars DataFrames
- added `computed_field` decorator to `property` attributes
@maarten-betman
Copy link
Contributor Author

Waited for the release of pydantic v2, which is now implemented in Rust and is significantly faster than v1.x
I turned the dataclasses to Pydantic BaseModels, but am getting some validation errors since some fields appear to be optional and this is not reflected in the model's type annotations.

eg, below throws a validation error since date is not optional.

class CPTData(BaseModel):
    ....
    research_report_date: date
    ...


# shim.py:

def gef_cpt_to_cpt_data(gef_cpt: _GefCpt) -> CPTData:
    ....
    kwargs["research_report_date"] = None
    ....
    return CPTData(**kwargs)

Copy link
Member

@RDWimmers RDWimmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to implement v2 of pydantic

@@ -91,25 +90,35 @@ def __post_init__(self):
# bypass FrozenInstanceError
object.__setattr__(self, "data", df)

class Config:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseConfig is deprecated. Use the pydantic.ConfigDict instead.

-> https://docs.pydantic.dev/latest/api/config/

@@ -167,25 +166,35 @@ def __post_init__(self):
# bypass FrozenInstanceError
object.__setattr__(self, "data", df)

class Config:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseConfig is deprecated. Use the pydantic.ConfigDict instead.

-> https://docs.pydantic.dev/latest/api/config/

@RDWimmers
Copy link
Member

Waited for the release of pydantic v2, which is now implemented in Rust and is significantly faster than v1.x I turned the dataclasses to Pydantic BaseModels, but am getting some validation errors since some fields appear to be optional and this is not reflected in the model's type annotations.

eg, below throws a validation error since date is not optional.

class CPTData(BaseModel):
    ....
    research_report_date: date
    ...


# shim.py:

def gef_cpt_to_cpt_data(gef_cpt: _GefCpt) -> CPTData:
    ....
    kwargs["research_report_date"] = None
    ....
    return CPTData(**kwargs)

We can set the validation errors to None, looking at the minimal available attributes of the GEF and XML definitions there are not mandatory.
-> https://publicwiki.deltares.nl/download/attachments/102204318/GEF-Bore.pdf?version=1&modificationDate=1409732017000&api=v2
-> https://publicwiki.deltares.nl/download/attachments/102204318/GEF-CPT.pdf?version=1&modificationDate=1409732008000&api=v2
-> https://docs.geostandaarden.nl/bro/def-im-CPT-20220901/
-> https://docs.geostandaarden.nl/bro/def-im-BHR-GT-20220901/

@RDWimmers RDWimmers closed this Aug 13, 2024
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants