[Python] pyarrow.compute.subtract_checked overflowing for some duration arrays constructed from numpy #35088
Comments
@lukemanley thanks for the report. This is an interesting bug. Both arrays appear to be the same, but their actual data buffers differ because they were created differently. The data is masked because the elements are null, so the actual value "behind" that null shouldn't matter in theory.
And so my assumption is that the overflow comes from actually subtracting the values in the second case. However, the way subtract_checked is implemented, it should normally only do the actual subtraction for data values that are not masked as null, exactly to avoid situations like the above. But there seems to be a bug in this mechanism for skipping values behind nulls.
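To make the mechanism concrete, here is a minimal pure-Python sketch of a checked subtraction over values plus a validity mask. It is not Arrow's implementation: the names `checked_sub`, `subtract_checked_sketch` and the `skip_nulls` flag are hypothetical, chosen only to show why computing on the value behind a null can overflow while skipping it cannot.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def checked_sub(a, b):
    """Subtract two ints, raising if the result does not fit in int64."""
    result = a - b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("overflow")
    return result


def subtract_checked_sketch(left, right, left_valid, right_valid, skip_nulls=True):
    """Element-wise checked subtraction with validity masks (hypothetical helper)."""
    out = []
    for a, b, va, vb in zip(left, right, left_valid, right_valid):
        is_null = not (va and vb)
        if is_null and skip_nulls:
            # Never touch the raw values behind a null slot.
            out.append(None)
            continue
        # Without skipping, the raw value behind the null is still subtracted.
        value = checked_sub(a, b)
        out.append(None if is_null else value)
    return out


# A null slot whose raw value is 0 is harmless either way.
print(subtract_checked_sketch([0], [0], [True], [False], skip_nulls=False))

# A null slot whose raw value is min int64 only works when nulls are skipped.
print(subtract_checked_sketch([0], [INT64_MIN], [True], [False], skip_nulls=True))
try:
    subtract_checked_sketch([0], [INT64_MIN], [True], [False], skip_nulls=False)
except OverflowError as exc:
    print("raised:", exc)
```

With `skip_nulls=False` the last call raises because `0 - INT64_MIN` is one larger than `INT64_MAX`, even though the slot is logically null.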
Thanks for the explanation. It looks like numpy uses that value (min int64) for NaT.
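A quick way to check this, as a small sketch that only assumes numpy is installed (the variable names are illustrative), is to reinterpret a NaT timedelta as its underlying 64-bit integer:

```python
import numpy as np

# A one-element timedelta64 array holding NaT.
nat = np.array(["NaT"], dtype="timedelta64[ns]")

# Reinterpret the same 8 bytes as int64 to see the stored sentinel.
raw = nat.view("int64")

print(raw[0])
print(raw[0] == np.iinfo(np.int64).min)
```

Subtracting that sentinel from any non-negative int64 value cannot be represented in int64, which matches the overflow described above.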
It's also tied to duration (e.g. you wouldn't get this behavior if you cast to int64). The fix is westonpace@ec9a5a4, although a proper PR should add tests as well as check the other checked functions (e.g. add_checked). It turns out that the "skip nulls" behavior has to be specified per kernel, and it wasn't being specified for the duration kernels. Is this something we need to fit into 12.0.0? If so, I can try to carve out some time later this week for a PR.
Describe the bug, including details regarding any error messages, version, and platform.
In this report's example, arr2 and arr3 are duration arrays with a single null element. arr2 is constructed from a list; arr3 is constructed from a numpy array.
Once constructed, they evaluate to being equal.
However, they exhibit different behavior once passed to pyarrow.compute.subtract_checked: per the issue title, the call overflows for the array constructed from numpy.
Component(s)
Python