aggregation functions treat duck arrays differently depending on dtype #3241

Closed
keewis opened this issue Aug 22, 2019 · 6 comments

Comments

Collaborator

keewis commented Aug 22, 2019

While working on #3238, I tried replacing np.arange with np.linspace to create test arrays:

>>> import numpy as np
>>> import pint
>>> import xarray as xr
>>> ureg = pint.UnitRegistry()
>>> # with int values
>>> array = np.arange(10).astype(int) * ureg.m
>>> np.max(array)
<Quantity(9, 'meter')>
>>> np.max(xr.DataArray(data=array))  # works as expected
<xarray.DataArray ()>
<Quantity(9, 'meter')>
>>> # now with floats
>>> array = np.arange(10).astype(float) * ureg.m
>>> np.max(array)
<Quantity(9.0, 'meter')>
>>> np.max(xr.DataArray(data=array))  # unit information is lost
<xarray.DataArray ()>
array(9.)

Judging by the build logs of #3238, this seems to be the case for all aggregation functions except np.median and, of course, those that return booleans or indices.

Collaborator Author

keewis commented Aug 23, 2019

This seems to be an issue with pint and is being worked on in hgrecco/pint#764: using that PR instead of the version available in conda-forge makes all functions fail with a TypeError regardless of dtype.

So I guess this can be closed?

keewis closed this as completed Aug 23, 2019
Collaborator Author

keewis commented Sep 2, 2019

Now that I have hit this issue again using the example from #3238, this seems to be a bug in xarray after all. For reference, this is the example in question, which fails even with a pint version that implements __array_function__:

>>> xr.DataArray(data=np.arange(10).astype(int) * ureg.m).median()
<xarray.DataArray ()>
<Quantity(4.5, 'meter')>
>>> xr.DataArray(data=np.arange(10).astype(float) * ureg.m).median()
<xarray.DataArray ()>
array(4.5)
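
For readers unfamiliar with the protocol, here is a minimal sketch of how `__array_function__` dispatch lets a duck array keep its metadata through `np.max` or `np.median`, and how coercing with `np.asarray` first drops it. `UnitArray` is a hypothetical toy class made up for illustration, not pint's `Quantity`, and nothing in this thread confirms that such a coercion is what caused the float case to lose its unit.

```python
import numpy as np


class UnitArray:
    """Hypothetical toy duck array: a NumPy array plus a unit string."""

    def __init__(self, magnitude, unit):
        self.magnitude = np.asarray(magnitude)
        self.unit = unit

    @property
    def dtype(self):
        return self.magnitude.dtype

    def __array__(self, dtype=None, copy=None):
        # Coercion to a plain ndarray: the unit is silently dropped.
        return np.asarray(self.magnitude, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        # Handle only a few aggregations and re-wrap the result with the unit.
        if func not in (np.max, np.min, np.median):
            return NotImplemented
        unwrapped = [a.magnitude if isinstance(a, UnitArray) else a for a in args]
        return UnitArray(func(*unwrapped, **kwargs), self.unit)

    def __repr__(self):
        return f"UnitArray({self.magnitude!r}, {self.unit!r})"


for dtype in (int, float):
    duck = UnitArray(np.arange(10).astype(dtype), "m")
    kept = np.max(duck)              # dispatched: still a UnitArray
    lost = np.max(np.asarray(duck))  # coerced first: plain NumPy scalar
    print(dtype.__name__, type(kept).__name__, type(lost).__name__)
```

In this toy, dispatch itself does not depend on dtype, so a dtype-dependent difference like the one above would have to come from the calling code taking a different path for floats than for ints.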

keewis reopened this Sep 2, 2019
Member

shoyer commented Sep 2, 2019

@keewis could you clarify that example? Both those examples appear to be the same code, with different results!

Collaborator Author

keewis commented Sep 2, 2019

That's true. I edited it: the float version should fail. The new example is actually the same as the first one, but uses DataArray.median() instead of np.max().

Member

shoyer commented Sep 2, 2019

Can you try merging the latest xarray master into your branch? I think this issue was fixed just recently by #3254. When I test this myself, both versions seem to do the right thing:

In [5]: xr.DataArray(data=np.arange(10).astype(float) * ureg.m).median()
Out[5]:
<xarray.DataArray ()>
<Quantity(4.5, 'meter')>

In [6]: xr.DataArray(data=np.arange(10).astype(int) * ureg.m).median()
Out[6]:
<xarray.DataArray ()>
<Quantity(4.5, 'meter')>

Collaborator Author

keewis commented Sep 2, 2019

You're right: after merging, the issue is gone for me, too.

keewis closed this as completed Sep 2, 2019