ENH: Identify numpy-like zero-dimensional arrays as non-iterable #35131

znicholls · 2020-07-05T22:53:25Z

Note: I wasn't sure whether to label this as an enhancement or bug. I've put enhancement as I don't think the original implementation was actually incorrect, rather just limited in scope. I'm happy to change to bug if that's more appropriate.

Is your feature request related to a problem?

I'm one of the developers of Pint-Pandas, which provides an extension type for storing numeric data with units (and, once mature, may solve #10349). The problem we're having is that we can't pass all the extension type tests. In particular, we fail many tests due to Panda's current implementation of is_list_like.

As @hgrecco shows here, pandas determines whether something is_list_like here (code copied below).

cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
    return (
        isinstance(obj, abc.Iterable)
        # we do not count strings/unicode/bytes as list-like
        and not isinstance(obj, (str, bytes))
        # exclude zero-dimensional numpy arrays, effectively scalars
        and not (util.is_array(obj) and obj.ndim == 0)
        # exclude sets if allow_sets is False
        and not (allow_sets is False and isinstance(obj, abc.Set))
    )

At the moment, all Pint Quantity instances have an __iter__ method. It is designed to raise a TypeError if the quantity isn't actually iterable (code here).

The problem is that the __iter__ method is defined, regardless of whether the Quantity instance is actually iterable or not. This is a problem because it means that, in all cases, isinstance(quantity_instance, abc.Iterable) returns True (even if quantity_instance isn't actually iterable and would return a TypeError as soon as one tried to iterate with it).

A numpy array suffers from the same issue which is where the not (util.is_array(obj) and obj.ndim == 0) in Panda's is_list_like is important, because it excludes scalars. The problem is that util.is_array(obj) returns False if obj is anything other than np.ndarray. As a result, we can't just add an ndim attribute to Quantity because the obj.ndim == 0 check will never be evaluated anyway.

Describe the solution you'd like

Update is_list_like so that it correctly identifies implementations of scalars in numpy-like arrays (I think these are called 'duck numpy arrays'?). I've put a draft in #35127 and would be grateful for feedback on the proposed solution there. If there's a better solution, I'm also happy to implement that instead.

API breaking implications

is_list_like would now return False for scalars of numpy-like array implementations. If we used the implementation in #35127 then any object that has an ndim attribute and has obj.ndim == 0 would no longer be identified as list-like.

Describe alternatives you've considered

We've discussed how we could get the behaviour we want by changing Pint, rather than changing Pandas, but it seems to be impossible if we want to maintain the numpy-like implementation without separate scalar and array classes (see discussion here).

Additional context

>>> import numpy as np
>>> import pandas as pd
>>> import pint
>>> from collections import abc

>>> numpy_scalar = np.array(2)
>>> numpy_scalar
array(2)
# numpy 0d-array (i.e. scalar) quantities appear to be iterable even though they're not
>>> isinstance(numpy_scalar, abc.Iterable)
True
>>> numpy_scalar[0]
...
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

# however, numpy 0d-array quantities are correctly handled by Pandas
>>> pd.core.dtypes.inference.is_list_like(numpy_scalar)
False

# compare this with Pint's numpy-like implementation
>>> ureg = pint.UnitRegistry()
>>> quantity_scalar = 3 * ureg.meter
>>> quantity_scalar
<Quantity(3, 'meter')>

# similar to a numpy array, scalar quantities appear to be iterable even though they're not
>>> isinstance(quantity_scalar, abc.Iterable)
True
>>> quantity_scalar[0]  
...
TypeError: Neither Quantity object nor its magnitude (3)supports indexing

# however, Pandas identifies Pint's implementation as being iterable which is what is causing
# our problem
>>> pd.core.dtypes.inference.is_list_like(quantity_scalar)
True  # we would like this to be False

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2020-07-06T07:02:43Z

@znicholls Thanks for the clearly described issue!

We are having a similar problem in GeoPandas where the scalars (eg MultiPolygon) can be potentially iterable as well / can have a length (although here they are actually iterable, so not raising an error, making it even harder to detect this). xref #26333, #27911

So in those referenced issues, it is the length-check that is problematic. However in the case of pint, it might be easier to solve with the suggest ndim check.
One question here, though, the ndim check will only work with pint scalars that wrap a numpy scalar, I think? (so not for 3 * ureg.meter, but only for np.int64(3) * ureg.meter or (np.array([3] * ureg.meter)[0])

znicholls · 2020-07-06T07:24:31Z

One question here, though, the ndim check will only work with pint scalars that wrap a numpy scalar, I think?

Correct. To make this work, I think we'd have to add an ndim property to Pint's Quantity class that returned sensible values for all types (so we'd need custom behaviour to make sure e.g. quantity.ndim returns 0 even if you're wrapping a 'normal scalar' rather than a numpy scalar).

…s non-iterable (#44626)

znicholls added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 5, 2020

This was referenced Jul 5, 2020

ENH: make is_list_like handle non iterable numpy-like arrays correctly #35127

Closed

Shall we create a QuantityArray class? hgrecco/pint#1128

Open

jorisvandenbossche removed the Needs Triage Issue that has not been reviewed by a pandas team member label Jul 6, 2020

gfyoung added the Internals Related to non-user accessible pandas implementation label Jul 9, 2020

jorisvandenbossche mentioned this issue Jul 10, 2020

ENH: ListDtype / ListArray #35176

Closed

znicholls mentioned this issue Jan 1, 2021

Fixing the tests hgrecco/pint-pandas#52

Closed

dcoukos mentioned this issue Aug 24, 2021

ENH: Change implementation of pandas.api.types.is_list_like to check for iterability in more pythonic way #43196

Closed

znicholls mentioned this issue Aug 31, 2021

ENH: unit of measurement / physical quantities #10349

Closed

znicholls mentioned this issue Nov 23, 2021

RTFM and could not make pint[Mt CO2] work as a unit openscm/openscm-units#32

Closed

burnpanck mentioned this issue Nov 26, 2021

ENH: [Draft] Fix issue #35131 Identify zero-dimensional duck arrays as non-iterable #44626

Merged

4 tasks

jreback added this to the 1.4 milestone Nov 26, 2021

jreback closed this as completed in #44626 Dec 20, 2021

jreback pushed a commit that referenced this issue Dec 20, 2021

ENH: [Draft] Fix issue #35131 Identify zero-dimensional duck arrays a…

d228a78

…s non-iterable (#44626)

andrewgsavage mentioned this issue Oct 3, 2022

ENH: Enable assert_almost_equal to test numpy-like zero-dimensional arrays #48919

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Identify numpy-like zero-dimensional arrays as non-iterable #35131

ENH: Identify numpy-like zero-dimensional arrays as non-iterable #35131

znicholls commented Jul 5, 2020

jorisvandenbossche commented Jul 6, 2020

znicholls commented Jul 6, 2020

ENH: Identify numpy-like zero-dimensional arrays as non-iterable #35131

ENH: Identify numpy-like zero-dimensional arrays as non-iterable #35131

Comments

znicholls commented Jul 5, 2020

Is your feature request related to a problem?

Describe the solution you'd like

API breaking implications

Describe alternatives you've considered

Additional context

jorisvandenbossche commented Jul 6, 2020

znicholls commented Jul 6, 2020