Dataset.map #3459

Merged
merged 20 commits into from
Nov 9, 2019
4 changes: 2 additions & 2 deletions doc/computation.rst
@@ -462,13 +462,13 @@ Datasets support most of the same methods found on data arrays:
abs(ds)

Datasets also support NumPy ufuncs (requires NumPy v1.13 or newer), or
alternatively you can use :py:meth:`~xarray.Dataset.apply` to apply a function
alternatively you can use :py:meth:`~xarray.Dataset.map` to map a function
to each variable in a dataset:

.. ipython:: python

np.sin(ds)
ds.apply(np.sin)
ds.map(np.sin)

Datasets also use looping over variables for *broadcasting* in binary
arithmetic. You can do arithmetic between any ``DataArray`` and a dataset:
15 changes: 8 additions & 7 deletions doc/groupby.rst
@@ -35,10 +35,11 @@ Let's create a simple example dataset:

.. ipython:: python

ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 3))},
coords={'x': [10, 20, 30, 40],
'letters': ('x', list('abba'))})
arr = ds['foo']
ds = xr.Dataset(
{"foo": (("x", "y"), np.random.rand(4, 3))},
coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
arr = ds["foo"]
ds

If we groupby the name of a variable or coordinate in a dataset (we can also
@@ -93,15 +94,15 @@ Apply
~~~~~

To apply a function to each group, you can use the flexible
:py:meth:`~xarray.DatasetGroupBy.apply` method. The resulting objects are automatically
:py:meth:`~xarray.DatasetGroupBy.map` method. The resulting objects are automatically
concatenated back together along the group axis:

.. ipython:: python

def standardize(x):
return (x - x.mean()) / x.std()

arr.groupby('letters').apply(standardize)
arr.groupby('letters').map(standardize)

GroupBy objects also have a :py:meth:`~xarray.DatasetGroupBy.reduce` method and
methods like :py:meth:`~xarray.DatasetGroupBy.mean` as shortcuts for applying an
@@ -202,7 +203,7 @@ __ http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dimen
dims=['ny','nx'])
da
da.groupby('lon').sum(...)
da.groupby('lon').apply(lambda x: x - x.mean(), shortcut=False)
da.groupby('lon').map(lambda x: x - x.mean(), shortcut=False)

Because multidimensional groups have the ability to generate a very large
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
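
For readers unfamiliar with the split-apply-combine behaviour that `GroupBy.map` documents in the hunks above, here is a minimal pure-Python sketch. It is not xarray's implementation: `groupby_map` is a hypothetical helper, plain lists stand in for `DataArray` objects, and the population standard deviation is assumed in order to match NumPy's default `std`.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize(values):
    # Same idea as the docs' standardize(x): centre on the mean, scale by the std.
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def groupby_map(values, labels, func):
    # Split: collect the positions that belong to each label.
    positions = defaultdict(list)
    for i, label in enumerate(labels):
        positions[label].append(i)
    # Apply and combine: run func on each group, then write the results
    # back to the positions the inputs came from.
    out = [None] * len(values)
    for idx in positions.values():
        group = [values[j] for j in idx]
        for i, new_value in zip(idx, func(group)):
            out[i] = new_value
    return out


x = [10.0, 20.0, 30.0, 40.0]
letters = list("abba")
print(groupby_map(x, letters, standardize))
```

Each group is standardized independently, so the "a" values (10, 40) and the "b" values (20, 30) are each centred on their own group mean before being stitched back together.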
2 changes: 1 addition & 1 deletion doc/howdoi.rst
@@ -44,7 +44,7 @@ How do I ...
* - convert a possibly irregularly sampled timeseries to a regularly sampled timeseries
- :py:meth:`DataArray.resample`, :py:meth:`Dataset.resample` (see :ref:`resampling` for more)
* - apply a function on all data variables in a Dataset
- :py:meth:`Dataset.apply`
- :py:meth:`Dataset.map`
* - write xarray objects with complex values to a netCDF file
- :py:func:`Dataset.to_netcdf`, :py:func:`DataArray.to_netcdf` specifying ``engine="h5netcdf", invalid_netcdf=True``
* - make xarray objects look like other xarray objects
2 changes: 1 addition & 1 deletion doc/quick-overview.rst
@@ -142,7 +142,7 @@ xarray supports grouped operations using a very similar API to pandas (see :ref:
labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')
labels
data.groupby(labels).mean('y')
data.groupby(labels).apply(lambda x: x - x.min())
data.groupby(labels).map(lambda x: x - x.min())

Plotting
--------
7 changes: 7 additions & 0 deletions doc/whats-new.rst
@@ -44,6 +44,13 @@ New Features
option for dropping either labels or variables, but using the more specific methods is encouraged.
(:pull:`3475`)
By `Maximilian Roos <https://github.com/max-sixty>`_
- :py:meth:`Dataset.map` & :py:meth:`GroupBy.map` & :py:meth:`Resample.map` have been added for
mapping / applying a function over each item in the collection, reflecting the widely used
and least surprising name for this operation.
The existing ``apply`` methods remain for backward compatibility, though using the ``map``
methods is encouraged.
(:pull:`3459`)
By `Maximilian Roos <https://github.com/max-sixty>`_
- :py:meth:`Dataset.transpose` and :py:meth:`DataArray.transpose` now support an ellipsis (`...`)
to represent all 'other' dimensions. For example, to move one dimension to the front,
use `.transpose('x', ...)`. (:pull:`3421`)
11 changes: 8 additions & 3 deletions xarray/core/dataarray.py
@@ -919,7 +919,7 @@ def copy(self, deep: bool = True, data: Any = None) -> "DataArray":
Coordinates:
* x (x) <U1 'a' 'b' 'c'

See also
See Also
--------
pandas.DataFrame.copy
"""
@@ -1716,7 +1716,7 @@ def stack(
codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
names=['x', 'y'])

See also
See Also
--------
DataArray.unstack
"""
@@ -1764,7 +1764,7 @@ def unstack(
>>> arr.identical(roundtripped)
True

See also
See Also
--------
DataArray.stack
"""
@@ -1922,6 +1922,11 @@ def drop(
"""Backward compatible method based on `drop_vars` and `drop_sel`

Using either `drop_vars` or `drop_sel` is encouraged

See Also
--------
DataArray.drop_vars
DataArray.drop_sel
"""
ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
return self._from_temp_dataset(ds)
34 changes: 30 additions & 4 deletions xarray/core/dataset.py
@@ -3557,6 +3557,11 @@ def drop(self, labels=None, dim=None, *, errors="raise", **labels_kwargs):
"""Backward compatible method based on `drop_vars` and `drop_sel`

Using either `drop_vars` or `drop_sel` is encouraged

See Also
--------
Dataset.drop_vars
Dataset.drop_sel
"""
if errors not in ["raise", "ignore"]:
raise ValueError('errors must be either "raise" or "ignore"')
@@ -4108,14 +4113,14 @@ def reduce(
variables, coord_names=coord_names, attrs=attrs, indexes=indexes
)

def apply(
def map(
self,
func: Callable,
keep_attrs: bool = None,
args: Iterable[Any] = (),
**kwargs: Any,
) -> "Dataset":
"""Apply a function over the data variables in this dataset.
"""Apply a function to each variable in this dataset

Parameters
----------
@@ -4135,7 +4140,7 @@ def apply(
Returns
-------
applied : Dataset
Resulting dataset from applying ``func`` over each data variable.
Resulting dataset from applying ``func`` to each data variable.

Examples
--------
@@ -4148,7 +4153,7 @@ def apply(
Data variables:
foo (dim_0, dim_1) float64 -0.3751 -1.951 -1.945 0.2948 0.711 -0.3948
bar (x) int64 -1 2
>>> ds.apply(np.fabs)
>>> ds.map(np.fabs)
<xarray.Dataset>
Dimensions: (dim_0: 2, dim_1: 3, x: 2)
Dimensions without coordinates: dim_0, dim_1, x
@@ -4165,6 +4170,27 @@ def apply(
attrs = self.attrs if keep_attrs else None
return type(self)(variables, attrs=attrs)

def apply(
self,
func: Callable,
keep_attrs: bool = None,
args: Iterable[Any] = (),
**kwargs: Any,
) -> "Dataset":
Review comment (Member):
Can we keep the docstring here, even if it's just "Alias for Dataset.map"?
Ideally we would use the See also section, which gets turned into a link by numpydoc.

Reply (Collaborator, Author):
Also added See Also for the drop cases

"""
Backward compatible implementation of ``map``

See Also
--------
Dataset.map
"""
warnings.warn(
"Dataset.apply may be deprecated in the future. Using Dataset.map is encouraged",
PendingDeprecationWarning,
stacklevel=2,
)
return self.map(func, keep_attrs, args, **kwargs)

def assign(
self, variables: Mapping[Hashable, Any] = None, **variables_kwargs: Hashable
) -> "Dataset":
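
The `apply` aliases added in this diff share one pattern: emit a `PendingDeprecationWarning` with `stacklevel=2`, then delegate to `map`. The sketch below illustrates that pattern in isolation. `Bag` is a hypothetical toy container, not xarray's `Dataset`, and its message text is only modelled on the one in the diff.

```python
import warnings
from typing import Any, Callable, Dict, Iterable


class Bag:
    """Hypothetical toy container standing in for a Dataset-like mapping."""

    def __init__(self, items: Dict[str, Any]) -> None:
        self.items = dict(items)

    def map(self, func: Callable, args: Iterable[Any] = (), **kwargs: Any) -> "Bag":
        # Call func(value, *args, **kwargs) for every value and rebuild the container.
        return Bag({k: func(v, *args, **kwargs) for k, v in self.items.items()})

    def apply(self, func: Callable, args: Iterable[Any] = (), **kwargs: Any) -> "Bag":
        # Backward-compatible alias: warn, then delegate to map.
        # stacklevel=2 attributes the warning to the caller's line, not this one.
        warnings.warn(
            "Bag.apply may be deprecated in the future. Using Bag.map is encouraged",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        return self.map(func, args, **kwargs)


bag = Bag({"foo": -1.5, "bar": 2})

with warnings.catch_warnings(record=True) as caught:
    # PendingDeprecationWarning is ignored by Python's default filters,
    # so it has to be enabled explicitly to be observed.
    warnings.simplefilter("always")
    old_style = bag.apply(abs)

new_style = bag.map(abs)
print(old_style.items == new_style.items)
print([w.category.__name__ for w in caught])
```

Because the warning category is ignored by default, existing callers of `apply` see no change in behaviour unless they opt in to the warning, which fits the "remain for backward compatibility" wording in the release note.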
49 changes: 40 additions & 9 deletions xarray/core/groupby.py
@@ -608,7 +608,7 @@ def assign_coords(self, coords=None, **coords_kwargs):
Dataset.swap_dims
"""
coords_kwargs = either_dict_or_kwargs(coords, coords_kwargs, "assign_coords")
return self.apply(lambda ds: ds.assign_coords(**coords_kwargs))
return self.map(lambda ds: ds.assign_coords(**coords_kwargs))


def _maybe_reorder(xarray_obj, dim, positions):
@@ -655,8 +655,8 @@ def lookup_order(dimension):
new_order = sorted(stacked.dims, key=lookup_order)
return stacked.transpose(*new_order, transpose_coords=self._restore_coord_dims)

def apply(self, func, shortcut=False, args=(), **kwargs):
"""Apply a function over each array in the group and concatenate them
def map(self, func, shortcut=False, args=(), **kwargs):
"""Apply a function to each array in the group and concatenate them
together into a new array.

`func` is called like `func(ar, *args, **kwargs)` for each array `ar`
@@ -702,6 +702,21 @@ def apply(self, func, shortcut=False, args=(), **kwargs):
applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs)) for arr in grouped)
return self._combine(applied, shortcut=shortcut)

def apply(self, func, shortcut=False, args=(), **kwargs):
"""
Backward compatible implementation of ``map``

See Also
--------
DataArrayGroupBy.map
"""
warnings.warn(
"GroupBy.apply may be deprecated in the future. Using GroupBy.map is encouraged",
PendingDeprecationWarning,
stacklevel=2,
)
return self.map(func, shortcut=shortcut, args=args, **kwargs)

def _combine(self, applied, restore_coord_dims=False, shortcut=False):
"""Recombine the applied objects like the original."""
applied_example, applied = peek_at(applied)
@@ -765,7 +780,7 @@ def quantile(self, q, dim=None, interpolation="linear", keep_attrs=None):
if dim is None:
dim = self._group_dim

out = self.apply(
out = self.map(
self._obj.__class__.quantile,
shortcut=False,
q=q,
@@ -820,16 +835,16 @@ def reduce_array(ar):

check_reduce_dims(dim, self.dims)

return self.apply(reduce_array, shortcut=shortcut)
return self.map(reduce_array, shortcut=shortcut)


ops.inject_reduce_methods(DataArrayGroupBy)
ops.inject_binary_ops(DataArrayGroupBy)


class DatasetGroupBy(GroupBy, ImplementsDatasetReduce):
def apply(self, func, args=(), shortcut=None, **kwargs):
"""Apply a function over each Dataset in the group and concatenate them
def map(self, func, args=(), shortcut=None, **kwargs):
"""Apply a function to each Dataset in the group and concatenate them
together into a new Dataset.

`func` is called like `func(ds, *args, **kwargs)` for each dataset `ds`
@@ -862,6 +877,22 @@ def apply(self, func, args=(), shortcut=None, **kwargs):
applied = (func(ds, *args, **kwargs) for ds in self._iter_grouped())
return self._combine(applied)

def apply(self, func, args=(), shortcut=None, **kwargs):
"""
Backward compatible implementation of ``map``

See Also
--------
DatasetGroupBy.map
"""

warnings.warn(
"GroupBy.apply may be deprecated in the future. Using GroupBy.map is encouraged",
PendingDeprecationWarning,
stacklevel=2,
)
return self.map(func, shortcut=shortcut, args=args, **kwargs)

def _combine(self, applied):
"""Recombine the applied objects like the original."""
applied_example, applied = peek_at(applied)
@@ -914,7 +945,7 @@ def reduce_dataset(ds):

check_reduce_dims(dim, self.dims)

return self.apply(reduce_dataset)
return self.map(reduce_dataset)

def assign(self, **kwargs):
"""Assign data variables by group.
@@ -923,7 +954,7 @@ def assign(self, **kwargs):
--------
Dataset.assign
"""
return self.apply(lambda ds: ds.assign(**kwargs))
return self.map(lambda ds: ds.assign(**kwargs))


ops.inject_reduce_methods(DatasetGroupBy)
43 changes: 39 additions & 4 deletions xarray/core/resample.py
@@ -1,3 +1,5 @@
import warnings

from . import ops
from .groupby import DataArrayGroupBy, DatasetGroupBy

@@ -173,8 +175,8 @@ def __init__(self, *args, dim=None, resample_dim=None, **kwargs):

super().__init__(*args, **kwargs)

def apply(self, func, shortcut=False, args=(), **kwargs):
"""Apply a function over each array in the group and concatenate them
def map(self, func, shortcut=False, args=(), **kwargs):
"""Apply a function to each array in the group and concatenate them
together into a new array.

`func` is called like `func(ar, *args, **kwargs)` for each array `ar`
@@ -212,7 +214,9 @@ def apply(self, func, shortcut=False, args=(), **kwargs):
applied : DataArray
The result of splitting, applying and combining this array.
"""
combined = super().apply(func, shortcut=shortcut, args=args, **kwargs)
# TODO: the argument order for Resample doesn't match that for its parent,
# GroupBy
combined = super().map(func, shortcut=shortcut, args=args, **kwargs)

# If the aggregation function didn't drop the original resampling
# dimension, then we need to do so before we can rename the proxy
@@ -225,6 +229,21 @@ def apply(self, func, shortcut=False, args=(), **kwargs):

return combined

def apply(self, func, args=(), shortcut=None, **kwargs):
"""
Backward compatible implementation of ``map``

See Also
--------
DataArrayResample.map
"""
warnings.warn(
"Resample.apply may be deprecated in the future. Using Resample.map is encouraged",
PendingDeprecationWarning,
stacklevel=2,
)
return self.map(func=func, shortcut=shortcut, args=args, **kwargs)


ops.inject_reduce_methods(DataArrayResample)
ops.inject_binary_ops(DataArrayResample)
@@ -247,7 +266,7 @@ def __init__(self, *args, dim=None, resample_dim=None, **kwargs):

super().__init__(*args, **kwargs)

def apply(self, func, args=(), shortcut=None, **kwargs):
def map(self, func, args=(), shortcut=None, **kwargs):
"""Apply a function over each Dataset in the groups generated for
resampling and concatenate them together into a new Dataset.

@@ -282,6 +301,22 @@ def apply(self, func, args=(), shortcut=None, **kwargs):

return combined.rename({self._resample_dim: self._dim})

def apply(self, func, args=(), shortcut=None, **kwargs):
"""
Backward compatible implementation of ``map``

See Also
--------
DatasetResample.map
"""

warnings.warn(
"Resample.apply may be deprecated in the future. Using Resample.map is encouraged",
PendingDeprecationWarning,
stacklevel=2,
)
return self.map(func=func, shortcut=shortcut, args=args, **kwargs)

def reduce(self, func, dim=None, keep_attrs=None, **kwargs):
"""Reduce the items in this group by applying `func` along the
pre-defined resampling dimension.
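
The TODO added in `xarray/core/resample.py` notes that the resample signatures order `args` and `shortcut` differently from the parent `GroupBy`, and the new aliases forward both by keyword. The sketch below uses hypothetical `Parent` and `Child` classes (not xarray's) to show why keyword forwarding stays correct under such a mismatch, while positional forwarding would swap the two arguments.

```python
class Parent:
    def map(self, func, shortcut=False, args=(), **kwargs):
        # Parent order: shortcut comes before args.
        return {"shortcut": shortcut, "result": func(*args, **kwargs)}


class Child(Parent):
    def map(self, func, args=(), shortcut=None, **kwargs):
        # Child order: args comes before shortcut.
        # Forwarding by keyword does not depend on either ordering.
        return super().map(func, shortcut=shortcut, args=args, **kwargs)

    def map_positional(self, func, args=(), shortcut=None, **kwargs):
        # Deliberately fragile variant: positions follow the child's order,
        # so the parent receives args as shortcut and shortcut as args.
        return super().map(func, args, shortcut, **kwargs)


def add(a, b, scale=1):
    return (a + b) * scale


child = Child()
by_keyword = child.map(add, args=(1, 2), shortcut=False, scale=10)
print(by_keyword)

try:
    child.map_positional(add, args=(1, 2), shortcut=False, scale=10)
    positional_failed = False
except TypeError:
    # The parent tried to unpack shortcut (a bool) as the argument tuple.
    positional_failed = True
print("positional forwarding failed:", positional_failed)
```

In this toy case the swap fails loudly with a `TypeError`. With other argument values it could instead pass silently and produce wrong results, which is the stronger reason to prefer keyword forwarding.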