Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: PY311 failures #50124

Closed
MarcoGorelli opened this issue Dec 8, 2022 · 15 comments
Closed

CI: PY311 failures #50124

MarcoGorelli opened this issue Dec 8, 2022 · 15 comments
Labels
CI Continuous Integration

Comments

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Dec 8, 2022

Both on builds from main and 1.5.x:

https://github.com/pandas-dev/pandas/actions/runs/3647663722/jobs/6160167738

2022-12-08T11:43:18.6188497Z FAILED pandas/tests/dtypes/test_missing.py::test_array_equivalent_series[abc] - AssertionError: Did not see expected warning of class 'FutureWarning'
2022-12-08T11:43:18.6189358Z FAILED pandas/tests/indexes/object/test_indexing.py::TestGetIndexerNonUnique::test_get_indexer_non_unique_np_nats[datetime640-datetime6410] - OverflowError: Integer overflow getting a common metadata divisor for NumPy datetime metadata [Y] and [ps].
2022-12-08T11:43:18.6190343Z FAILED pandas/tests/indexes/object/test_indexing.py::TestGetIndexerNonUnique::test_get_indexer_non_unique_np_nats[datetime640-datetime6411] - OverflowError: Integer overflow getting a common metadata divisor for NumPy datetime metadata [Y] and [fs].
2022-12-08T11:43:18.6191258Z FAILED pandas/tests/indexes/object/test_indexing.py::TestGetIndexerNonUnique::test_get_indexer_non_unique_np_nats[datetime640-datetime6412] - OverflowError: Integer overflow getting a common metadata divisor for NumPy datetime metadata [Y] and [as].
...
@MarcoGorelli MarcoGorelli added the CI Continuous Integration label Dec 8, 2022
@MarcoGorelli MarcoGorelli changed the title CI: PY311 failures in 1.5.x branch CI: PY311 failures Dec 8, 2022
@MarcoGorelli
Copy link
Member Author

Looks like it's a new numpy version:

            Package                  Version_x                   Version_y
0  ----------------  -------------------------  --------------------------
7             numpy  1.25.0.dev0+79.g01d64079b  1.25.0.dev0+165.g6f9237e91
8         packaging                       21.3                        22.0

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Dec 8, 2022

The test introduced in #43870 is now failing

cc @jbrockmendel @alexhlim (no blame)

I don't really understand why the deprecation warning was filtered out in that test - wasn't it warning us that this was going to fail?

$ cat t.py
import numpy as np

nat_1 = np.datetime64('NaT', 'Y')
nat_2 = np.datetime64('NaT', 'ps')
{nat_1}.add(nat_2)

$ python t.py
t.py:5: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  {nat_1}.add(nat_2)

EDIT: not sure the deprecation warning should've been raised in the first place - isn't it usually raised when arrays aren't of the same length, e.g.

In [1]: np.arange(10)==np.arange(5,14)
<ipython-input-1-689dbe3c0296>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  np.arange(10)==np.arange(5,14)
Out[1]: False

?
I'll report to numpy

@MarcoGorelli
Copy link
Member Author

The following is also causing issues now:

with warnings.catch_warnings():
# numpy may show a FutureWarning:
# elementwise comparison failed; returning scalar instead,
# but in the future will perform elementwise comparison
# before returning NotImplemented. We fall back to the correct
# behavior today, so that should be fine to ignore.
warnings.filterwarnings("ignore", "elementwise", FutureWarning)
with np.errstate(all="ignore"):
method = getattr(self._data, f"__{op.__name__}__")
result = method(other)

Test that hits this

pytest pandas/tests/frame/methods/test_compare.py::test_compare_ea_and_np_dtype

@jbrockmendel
Copy link
Member

numpy/numpy#13548 fixed a couple days ago probably related

@MarcoGorelli
Copy link
Member Author

With numpy 1.23.5:

In [1]: np.array([1, 3]) == np.array([4, pd.NA])
<ipython-input-1-b84677c04e2f>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  np.array([1, 3]) == np.array([4, pd.NA])
Out[1]: False

With the latest nightly:

In [1]: np.array([1, 3]) == np.array([4, pd.NA])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 np.array([1, 3]) == np.array([4, pd.NA])

File ~/pandas-dev/pandas/_libs/missing.pyx:413, in pandas._libs.missing.NAType.__bool__()
    411 
    412     def __bool__(self):
--> 413         raise TypeError("boolean value of NA is ambiguous")
    414 
    415     def __hash__(self):

TypeError: boolean value of NA is ambiguous

@seberg
Copy link
Contributor

seberg commented Dec 8, 2022

The change is that NumPy is now (not in the upcoming 1.24 release!) raising errors that occur during comparisons.
It is also returning arrays of all False or all True for DTypes that cannot be compared, like np.array(["1"]) == 1.

I would not love to undo that step, these deprecation/futurewarnings were a long pain-point. The way specific cases are handled can definitely be tweaked of course!

@MarcoGorelli
Copy link
Member Author

Thanks @seberg for stepping in!

Not really sure what to do here - @jbrockmendel @mroeschke thoughts on how to "save" test_get_indexer_non_unique_np_nats? Are we OK to skip it for the numpy nightly for now to get CI green again (as in #50129)?

@MarcoGorelli
Copy link
Member Author

Maybe it's OK to just let numpy error for this comparison? It only appears in a single test - and even then, the original issue which motivated it had a comparison between datetime64('NaT') and timedelta64('NaT') (rather than between datetime64('NaT' 'Y') and datetime64('NaT' 'ps'))

@jbrockmendel
Copy link
Member

im not firing on all cylinders today, will take a closer look in the AM

@mroeschke
Copy link
Member

From a quick glance, I would think that a proper fix would involve Index.get_indexer_non_unique considering np.datetime64('NaT') with different units as still NA-like, so I would imagine the implementation needs to be adjusted (the set membership check is unfortunately no longer sufficient to check different np.datetime64('NaT')s with different units)

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Dec 9, 2022

What should

idx = Index([1, 2, np.datetime64('NaT', 'Y')])
target = Index([np.datetime64('NaT', 'ps')], dtype=object)
idx.get_indexer_non_unique(target)

return if np.datetime64('NaT', 'Y') and np.datetime64('NaT', 'ps') can't be compared?

As far as I can tell, we're doomed from the start trying to support this

I'd just let numpy error to be honest, if someone has uncomparable types in an array they're going to run into other issues anyway

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Dec 9, 2022

Summary

As the conversation's a bit scattered between here and the PR, I'll try to summarise

There's 3 issues:

  1. np.array([1, 3]) == 'abc' no longer warns, and just returns array([False, False])

  2. np.array([1, 3]) == np.array([4, pd.NA]) now raises. I don't think there's much we can do about this, because NA.__eq__(3) returns NA, and then if numpy calls bool on the result, then we will reach the TypeError here:

    def __bool__(self):
    raise TypeError("boolean value of NA is ambiguous")

  3. np.datetime64('NaT', 'Y') and np.datetime64('NaT', 'ps') can no longer be compared - trying to do so will raise

What to do about this? Here are my thoughts

  1. this fine, and non-controversial, so...all good 🥳
  2. just let it raise. pd.NA shoudn't really end up in a numpy array anyway, it's only meant to be used in pandas nullable types. So I think this is OK, but it does mean it's now more urgent to fix API: should to_numpy() by default return the corresponding type, and raise otherwise? #48891
  3. if these objects can't be compared, then there's no hope of distinguishing them in get_indexer_non_unique anyway, so I think it'd be fine to just let numpy raise

@MarcoGorelli
Copy link
Member Author

@jorisvandenbossche @jreback sorry for the tags, just hoping to give this a bit more visibility

@jorisvandenbossche
Copy link
Member

2. np.array([1, 3]) == np.array([4, pd.NA]) now raises. I don't think there's much we can do about this

Yes, numpy raising an error is the expected behaviour I would say (or at least a direct consequence of our choice to have comparisons with NA return NA, and to have the bool value of NA being ambiguous)

It's of course another case of object dtype arrays with NA not really being useful / working correctly (xref #32931)

@jbrockmendel
Copy link
Member

w/r/t different-unit np.timedelta64("NaT")s raising:

  1. the set behavior should not raise, but this will have to be fixed in numpy.
  2. for get_indexer with object-dtype a similar problem shows up with non-nat timedelta64 objects when comparing against pd.Timedelta objects (i'll open an issue shortly-ish). The upshot of this is that i think we'll end up having to handle some of this in the khash code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration
Projects
None yet
Development

No branches or pull requests

6 participants