Support __array_ufunc__ for xarray objects. #1962

Merged on Mar 12, 2018 (11 commits)

Member

@shoyer shoyer commented Mar 5, 2018

This means NumPy ufuncs are now supported directly on xarray.Dataset objects,
and opens the door to supporting computation on new data types, such as sparse
arrays or arrays with units.
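
As a rough illustration of the behaviour described above, the sketch below applies a NumPy ufunc to a small `xarray.Dataset`. It assumes xarray 0.10.2 or later and NumPy 1.13 or later; the variable names `a` and `b` and the dimension name `x` are made up for the example.

```python
import numpy as np
import xarray as xr

# Small example dataset; the variable and dimension names are illustrative.
ds = xr.Dataset(
    {
        "a": ("x", [0.0, 1.0, 2.0]),
        "b": ("x", [3.0, 4.0, 5.0]),
    }
)

# Calling a NumPy ufunc on the Dataset applies it to every data variable
# and returns a Dataset rather than a bare NumPy array.
result = np.sin(ds)
```

The result keeps the dimension and variable names of the input, so it can be used in further xarray operations without re-wrapping.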

  • Closes __array_ufunc__ for xarray #1617 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

Fixes GH1617
dask='allowed')


class DataWithCoords(SupportsArithmetic, AttrAccessMixin):
Contributor

W1641 Implementing __eq__ without also implementing __hash__

@@ -235,7 +239,65 @@ def get_squeeze_dims(xarray_obj, dim, axis=None):
return dim


class BaseDataObject(AttrAccessMixin):
class SupportsArithmetic(object):
Contributor

W1641 Implementing __eq__ without also implementing __hash__

@@ -235,7 +239,65 @@ def get_squeeze_dims(xarray_obj, dim, axis=None):
return dim


class BaseDataObject(AttrAccessMixin):
class SupportsArithmetic(object): # noqa: W1641
Contributor

W1641 Implementing __eq__ without also implementing __hash__

Member Author

Why doesn't # noqa: W1641 disable this warning?

Collaborator

Because it's a pylint warning, not a flake8 warning. It's because py3k is enabled. We could turn that checker off

https://pylint.readthedocs.io/en/latest/user_guide/message-control.html

Member Author

Yep, that's what I just did.
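
For readers unfamiliar with the distinction discussed in this thread, here is a small sketch. It assumes the inline pragma form described in the linked message-control guide; the `Point` class is hypothetical and is not part of xarray. A flake8-style `# noqa` comment is not read by pylint, whereas a `# pylint: disable=...` comment is.

```python
class Point(object):  # pylint: disable=W1641
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


# On Python 3, defining __eq__ without __hash__ makes instances unhashable,
# which is the situation the W1641 warning points at.
try:
    hash(Point(1))
    hashable = True
except TypeError:
    hashable = False
```

Turning the checker off in the project configuration, as was done here, avoids having to annotate each class.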

Member

@fujiisoup fujiisoup left a comment

This looks nice.
I could not find any edge cases that break backward compatibility.

@shoyer
Member Author

shoyer commented Mar 8, 2018

One potential edge case is if someone directly calls a ufunc reduce method, e.g., np.add.reduce. Previously, this would cast the xarray object to a numpy array, but now it will raise an error.

Example:

# current xarray
In [3]: np.add.reduce(xr.DataArray([1]))
Out[3]: 1

# with this pull request
In [3]: np.add.reduce(xr.DataArray(0))
NotImplementedError: reduce method for ufunc <ufunc 'add'> is not implemented on xarray objects, which currently only support the __call__ method.

Note that the more commonly used aliases for these reduce methods, e.g., np.sum(), will continue to work, since they check for a sum() method on their argument.

There are also a few other ufunc methods that get used occasionally.

I think I'm OK breaking these because usage is so rare (and the work-around of casting to numpy arrays is so easy) but this should probably be noted in the release notes.
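
A minimal sketch of the work-around mentioned above, assuming a one-dimensional `DataArray` (the dimension name `x` is made up for the example): cast to a NumPy array before calling the ufunc method, or use the equivalent xarray method.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Work-around 1: cast to a NumPy array first, then call the ufunc method.
total_from_values = np.add.reduce(da.values)

# Work-around 2: use the xarray reduction method, which returns a DataArray.
total_from_method = da.sum()
```

The first form discards the xarray metadata, while the second keeps the result as an xarray object.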

@dopplershift
Contributor

This looks awesome. Thoughts on where you think units fits in here?

@shoyer
Member Author

shoyer commented Mar 8, 2018

@dopplershift see #1938. Units should be able to make use of the same machinery as sparse arrays (__array_ufunc__ and multipledispatch), but xarray itself implementing __array_ufunc__ is not particularly useful, because you still can't put your own custom array types inside xarray objects.

@dopplershift
Contributor

At this point I'd be happy to have hooks that let me intercept/wrap ufunc operations, though I guess that's what #1938 is supporting in a more systematic way.

@shoyer
Member Author

shoyer commented Mar 9, 2018

If you try to put a pint array into xarray.DataArray right now, I'm pretty sure it will get cast into a NumPy array.

@dopplershift
Contributor

Right. But such hooks would be sufficient to properly maintain the units attribute on a DataArray and check whether math made sense. This could use pint under the covers.
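
To make the idea of intercepting ufunc calls concrete, here is a toy sketch of the `__array_ufunc__` protocol using only NumPy. The `UnitArray` class is hypothetical: it is neither xarray's implementation nor pint's, and it only handles a few ufuncs with a plain string as the units label.

```python
import numpy as np


class UnitArray:
    """Hypothetical toy container: NumPy data plus a units label."""

    def __init__(self, data, units):
        self.data = np.asarray(data)
        self.units = units

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Handle only plain calls such as np.add(a, b); reject ufunc
        # methods like reduce or accumulate.
        if method != "__call__":
            return NotImplemented
        # This sketch only propagates units through these three ufuncs.
        if ufunc not in (np.add, np.subtract, np.negative):
            return NotImplemented

        # Check that all wrapped inputs agree on their units.
        units = {x.units for x in inputs if isinstance(x, UnitArray)}
        if len(units) != 1:
            raise ValueError("incompatible units: %s" % sorted(units))

        # Unwrap to raw arrays, apply the ufunc, and re-attach the units.
        raw = [x.data if isinstance(x, UnitArray) else x for x in inputs]
        return UnitArray(ufunc(*raw, **kwargs), units.pop())


a = UnitArray([1.0, 2.0], "m")
b = UnitArray([3.0, 4.0], "m")
total = np.add(a, b)  # dispatches to UnitArray.__array_ufunc__
```

Adding arrays with different labels raises `ValueError` in this sketch, and returning `NotImplemented` for an unsupported ufunc or method makes NumPy raise `TypeError`.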

Member

@jhamman jhamman left a comment

Looks great. Excited to get this working.

import warnings

import numpy as np
import pandas as pd

from . import dtypes, formatting, ops
from .pycompat import OrderedDict, basestring, dask_array_type, suppress
from .arithmetic import SupportsArithmetic
from .options import OPTIONS
Member

is OPTIONS ever used or is it needed in this scope for some reason?

Member Author

Oops we don't need it here.

@shoyer shoyer mentioned this pull request Mar 9, 2018
3 tasks
@shoyer shoyer merged commit b430524 into pydata:master Mar 12, 2018
gerritholl added a commit to gerritholl/typhon that referenced this pull request Apr 5, 2018
Since commit c6ea042 (merged in atmtools#143), UnitsAwareDataArray depends on
xarray.DataArray.__array_ufunc__ (which in turn depends on numpy 1.13 or
newer).  This was merged into xarray in
pydata/xarray#1962 and added to release 0.10.2.
The old UnitsAwareDataArray does not work with xarray>=0.10.2 and the new
one does not work with xarray<=0.10.1.  Therefore, typhon now depends on
xarray>=0.10.2 and numpy>=1.13.