Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExtensionArray.__fillna__ should not use lib.is_scalar #20411

Closed
TomAugspurger opened this issue Mar 19, 2018 · 5 comments
Closed

ExtensionArray.__fillna__ should not use lib.is_scalar #20411

TomAugspurger opened this issue Mar 19, 2018 · 5 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Milestone

Comments

@TomAugspurger
Copy link
Contributor

lib.is_scalar is a bit too strict for the check we do in

if not is_scalar(value):
. For example, it doesn't condisder ipaddress.IPv4Address as a scalar.

In [1]: from pandas._libs import lib

In [2]: import ipaddress

In [3]: lib.is_scalar(ipaddress.IPv4Address(1))
Out[3]: False

Replacing that check with a hasattr(value, '__len__') should be sufficient.

@TomAugspurger TomAugspurger added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate ExtensionArray Extending pandas with custom dtypes or arrays. labels Mar 19, 2018
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 19, 2018
@jorisvandenbossche
Copy link
Member

The same for geopandas.

However, in general, also the len check might not always be good, because eg for geopandas a MultiLineString has a length .. . Or for the json-array, a dict also has a length.

This might not be easy to solve in general, it's mainly due to our flexible API where we need to interpret a lot about what is passed in ..

@jorisvandenbossche
Copy link
Member

Actually, even simpler example where a length check will not work: strings.
(but of course, for only this one it would be easy to add to an additional check)

@TomAugspurger
Copy link
Contributor Author

Hmm hadn't thought that through.

Perhaps we require that ExtensionArray.fillna(array) requires array to be an instance of the ExtensionArray? That should remove all ambiguity, at the cost of requiring IPArray([]).fillna(['192.168.1.1']) to be written as IPArray([]).fillna(IPArray(['192.168.1.1'])).

@jorisvandenbossche
Copy link
Member

at the cost of requiring IPArray([]).fillna(['192.168.1.1']) to be written as IPArray([]).fillna(IPArray(['192.168.1.1'])).

But in this case you can still fill with IPArray([]).fillna('192.168.1.1') (filling with a scalar) ?

But such a requirement might be a good idea. Or we could also check it being "array-like" (check if is has a shape and if that shape matches).
In the current implementation, I don't think that a list (instead of array) would work anyhow, because we apply a boolean mask on it?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Mar 19, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

No branches or pull requests

2 participants