Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for custom translators #412

Open
bnaul opened this issue Aug 14, 2019 · 3 comments
Open

Support for custom translators #412

bnaul opened this issue Aug 14, 2019 · 3 comments

Comments

@bnaul
Copy link
Contributor

bnaul commented Aug 14, 2019

Along the lines of #215, it seems like there are quite a few parameter types that would be desirable to pass as inputs (most notably pd.DataFrames) that are simple enough to translate to/from JSON. Doing the transformation manually every time is a bit of a headache; is there any current method to register a custom translator that would handle this conversion automatically? Monkey-patching translate

@classmethod
def translate(cls, val):
"""Translate each of the standard json/yaml types to appropiate objects."""
if val is None:
return cls.translate_none(val)
elif isinstance(val, string_types):
return cls.translate_str(val)
# Needs to be before integer checks
elif isinstance(val, bool):
return cls.translate_bool(val)
elif isinstance(val, integer_types):
return cls.translate_int(val)
elif isinstance(val, float):
return cls.translate_float(val)
elif isinstance(val, dict):
return cls.translate_dict(val)
elif isinstance(val, list):
return cls.translate_list(val)
# Use this generic translation as a last resort
return cls.translate_escaped_str(val)
seems like the best option at the moment, but being able to explicitly specify translator at runtime seems a lot safer.

To be clear, this is not a proposal to automatically convert pandas objects (though I could certainly see an argument for that as well 😉 ), just to expose a method for users to add their own serialization methods. dask's method for allowing custom serializers is a nice, straightforward example of something similar: https://distributed.dask.org/en/latest/serialization.html#id3

@MSeal
Copy link
Member

MSeal commented Aug 15, 2019

You can register new translators easily enough. Notice at the bottom of that file we have a bunch of registration calls.

https://papermill.readthedocs.io/en/latest/extending-overview.html describes how one can register new engines and IO plugins. But I noticed we're missing

def register_entry_points(self):
"""Register entrypoints for an engine
Load handlers provided by other packages
"""
for entrypoint in entrypoints.get_group_all("papermill.engine"):
self.register(entrypoint.name, entrypoint.load())
for translators. If you wanted to make a PR for that equivalent code in translators you could then do:

from setuptools import setup, find_packages
setup(
    # all the normal setup.py arguments...
    entry_points={"papermill.translators": ["python=translators:PandasPythonTranslator"]},
)

in your project to register the translations.

@bnaul
Copy link
Contributor Author

bnaul commented Aug 15, 2019

@MSeal I started to take a stab at this but I hadn't realized that we also inject all of the parameter values into the notebook metadata: https://github.com/nteract/papermill/blob/master/papermill/parameterize.py#L104
So presently no matter what you do in the translator, any non-trivial parameter will fail with:

TypeError: Object of type DataFrame is not JSON serializable

Why exactly is there a need to store this extra copy of the parameters if they appear in the injected-parameters cell as well? I assume I'm just missing something about the execution flow that makes this necessary but unfortunately it seems like this inherently limits one to only simple JSON-compatible types.

@MSeal
Copy link
Member

MSeal commented Aug 15, 2019

It was intended to give programmatic access to what parameters were set. You can't read what the user input was from the cell itself if there was manipulations. To make this work, we'd likely want to have a placeholder object for non-json fields being added or catch the invalid json input and skip saving the parameters to the metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants