Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Series.map() needs type for arg and result #941

Open
Dr-Irv opened this issue Jun 24, 2024 · 5 comments
Open

Series.map() needs type for arg and result #941

Dr-Irv opened this issue Jun 24, 2024 · 5 comments

Comments

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jun 24, 2024

Specifically i am trying to typehint this function:

def map_hitgroup(series: pd.Series[int]) -> pd.Series[str]:
    hitgroup_mapping: dict[int, str] = {
        0: "generic",
        1: "head",
        2: "chest",
        3: "stomach",
        4: "left arm",
        5: "right arm",
        6: "left leg",
        7: "right leg",
        8: "neck",
        10: "gear",
    }
    return series.map(
        hitgroup_mapping
    )

There are 2 issues here. The type hint for series.map says that the return type (parameter) is the same as the initial one, which is not the case here (but it still works when running it).

The other is that the type hint does not specify the argument type.

Which results in the classic unknown warning.

Is there anything that i can currently do to adress this or is there just no way with the stubs and pyright strict mode.

Originally posted by @JanEricNitschke in #940

@Dr-Irv
Copy link
Collaborator Author

Dr-Irv commented Jun 24, 2024

Solution is to do something like this:

    def map(self, arg:dict[S1, S2], na_action: Literal["ignore"] | None = ...) -> Series[S2]: ...

where S2 is defined to be identical to S1. I tried the above with your example and it worked.

This happens with pyright defaults - no "strict" needed.

However, the arg should also cover the case of a general Mapping (rather than dict) and also a Callable mapping S1 to S2 and also a Series[S2]

PR with tests welcome

@JanEricNitschke
Copy link
Contributor

JanEricNitschke commented Jun 24, 2024

Will try to have a look if i find the time.

The pandas internal signature is Callable | Mapping | Series

I am also not 100% sure how to handle NaN values.

Nominally the function version would have to be something like

Callable[[S1 | NaN], S2 | NaN] except if na_action = "ignore. In that case it can be Callable[[S1], S2 | NaN]

Overall it should be:

from pandas._libs.missing import NAType

@overload
def map(self, arg: Callable[[S1], S2 | NAType] | Mapping[S1, S2] | Series[S2], na_action: Literal["ignore"] = ...) -> Series[S2]: ...

@overload
def map(self, arg: Callable[[S1 | NAType], S2 | NAType] | Mapping[S1, S2] | Series[S2], na_action: None = ...) -> Series[S2]: ...

@Dr-Irv
Copy link
Collaborator Author

Dr-Irv commented Jun 24, 2024

The pandas internal signature is Callable | Mapping | Series

The internal types are meant to check the pandas code, not user code. So they are wider than what we use in the stubs.

I am also not 100% sure how to handle NaN values.

Neither am I. You could try what you propose, or just leave NAType out of the declarations and that would be acceptable.

@Bruntaz
Copy link

Bruntaz commented Aug 14, 2024

The changes in #942 have caused a type error in our code - I'm not sure the types are quite correct.

Our code is along the lines of:

df = pd.read_sql_query("SELECT x, y, z FROM table", dtype={"x": "object"})
df["x"] = df["x"].map({1: True, 0: False, None: None})

The code does work in Pandas but the TypeVar in the Mapping argument doesn't accept None as a possible type.

@Dr-Irv
Copy link
Collaborator Author

Dr-Irv commented Aug 14, 2024

The code does work in Pandas but the TypeVar in the Mapping argument doesn't accept None as a possible type.

The issue here is that the update to map() assumed that you wanted a Series with a specified subtype, i.e., but in your case, you really need Series[Any] because you have None in there.

Solution that should work is to add this overload:

    @overload
    def map(
        self,
        arg: Callable[[Any], Any] | Mapping[Any, Any] | Series[Any],
        na_action: Literal["ignore"] | None = ...,
    ) -> Series[Any]: ...

And a test something like this:

df = pd.DataFrame({"x": [1, 0, None]})
check(assert_type( df["x"].map({1: True, 0: False, None: None}), pd.Series), pd.Series)

PR welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants