Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow io manager keys to be str enums not just strs #18778

Conversation

ShootingStarD
Copy link

@ShootingStarD ShootingStarD commented Dec 16, 2023

Summary & Motivation

More details in this issue : #18623
Previously, Dagster didn't accept to serialize enums from the StrEnum library.
I modified the _pack_value() method so that if it receives an Enum that is also of type str, we return the value of this enum.

With this change, users will be able to define keys for IOManager, Jobs, Assets etc into a single file separate in scope through the name of the enum, just like this example :

from enum import  auto
from strenum import SnakeCaseStrEnum

class IOManagerKeysSTR(SnakeCaseStrEnum): # SnakeCaseStrEnum inherits from StrEnum to convert the name into snake case
    IO_MANAGER_STR_ENUM = auto()

class JobKeysEnum(SnakeCaseStrEnum):
    SIMPLE_JOB = auto()
from dagster import AssetSelection, define_asset_job
from my_dagster_project.assets import upstream_asset, downstream_asset
from my_dagster_project.dagster_keys import JobKeysEnum

dummy_job = define_asset_job(
    name=JobKeysEnum.SIMPLE_JOB,
    selection=AssetSelection.assets(upstream_asset, downstream_asset),

)

How I Tested These Changes

I tested that it was possible to serialize and de-serialize an StrEnum and that the resulting de-serialized object was the str value of the enum (which is an ok behaviour for me because I will never read back the StrEnum in my use cases).
I also rerun all test from the test_serdes.py to check if it was okay
I also tested the test provided in the issue but I didn't added to the code because it was maybe a too big test

def test_str_enum_serialization_deserialisation():
    """
    Checks that (de)serializing an StrEnum from the strenum lib does not produces any error and returns the string value of the enum
    """
    class MyStrEnum(LowercaseStrEnum):
        ONE_STR_ENUM = auto()
    
    enum_val = MyStrEnum.ONE_STR_ENUM 
    serialized_value = serialize_value(val=enum_val) 
    assert serialized_value == f'"{enum_val.value}"'
    assert deserialize_value(serialized_value) == enum_val.value

To make the test I had to add strenum as extra-dependency for the tests, is it ok?

from typing import AbstractSet, Any, Dict, List, Mapping, NamedTuple, Optional, Sequence

from strenum import LowercaseStrEnum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of taking this extra dep, I think we can test the same thing by just doing [1]

strenum doesn't add too much past that inheritance structure

https://github.com/irgeek/StrEnum/blob/master/strenum/__init__.py#L21C15-L21C29

"""
Checks that (de)serializing an StrEnum from the strenum lib does not produces any error and returns the string value of the enum
"""
class MyStrEnum(LowercaseStrEnum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1]

    class MyStrEnum(str, Enum):

Comment on lines +832 to +833
if isinstance(val, str): # serialize StrEnums in their string value
return val.value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smackesey thoughts on this fallback behavior? The trade-off here is some risk in having in our code a str Enum accidentally does not get whitelisted deserializing as a normal str for the upside of allowing users to use str Enums in apis that accept str.

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry this went stale

back to you for comments and the failures in the buildkite run here https://buildkite.com/dagster/unit-tests/builds/277

@smackesey
Copy link
Collaborator

Hi @ShootingStarD, thanks for writing this up with such a thorough explanation.

Unfortunately I'm concerned about risk here in modifying our serialization machinery. We have a current invariant where enums have to be whitelisted to be serialized, and we don't accept user enums. It's hard to assess the likely consequences of breaking that invariant, especially given the breaking change/confusion around StrEnum in Python 3.11: python/cpython#100458

Also, IIUC, you can achieve your stated goal by calling .value on your enum-- that will let you keep your names defined in enums, though I recognize it's a bit annoying:

from dagster import AssetSelection, define_asset_job
from my_dagster_project.assets import upstream_asset, downstream_asset
from my_dagster_project.dagster_keys import JobKeysEnum

dummy_job = define_asset_job(
    name=JobKeysEnum.SIMPLE_JOB.value,  # just call .value to pass a string
    selection=AssetSelection.assets(upstream_asset, downstream_asset),
)

In any case, thanks for the submission and I understand what you were hoping for but I'm going to close this because I don't think risk/reward here exceeds the threshold.

@smackesey smackesey closed this Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants