forked from pydata/xarray

Commit 5a39f45

Merge remote-tracking branch 'upstream/master' into fix/allow_lazy
* upstream/master:
  add missing pint integration tests (pydata#3508)
  DOC: update bottleneck repo url (pydata#3507)
  add drop_sel, drop_vars, map to api.rst (pydata#3506)
  remove syntax warning (pydata#3505)
  Dataset.map, GroupBy.map, Resample.map (pydata#3459)
  tests for datasets with units (pydata#3447)
  fix pandas-dev tests (pydata#3491)
  unpin pseudonetcdf (pydata#3496)
  whatsnew corrections (pydata#3494)
dcherian committed Nov 12, 2019
2 parents d8faed2 + 4e9240a commit 5a39f45
Showing 23 changed files with 2,056 additions and 100 deletions.
2 changes: 1 addition & 1 deletion ci/azure/install.yml
@@ -16,7 +16,7 @@ steps:
--pre \
--upgrade \
matplotlib \
pandas=0.26.0.dev0+628.g03c1a3db2 \ # FIXME https://github.com/pydata/xarray/issues/3440
pandas \
scipy
# numpy \ # FIXME https://github.com/pydata/xarray/issues/3409
pip install \
2 changes: 1 addition & 1 deletion ci/requirements/py36.yml
@@ -29,7 +29,7 @@ dependencies:
- pandas
- pint
- pip
- pseudonetcdf<3.1 # FIXME https://github.com/pydata/xarray/issues/3409
- pseudonetcdf
- pydap
- pynio
- pytest
2 changes: 1 addition & 1 deletion ci/requirements/py37-windows.yml
@@ -29,7 +29,7 @@ dependencies:
- pandas
- pint
- pip
- pseudonetcdf<3.1 # FIXME https://github.com/pydata/xarray/issues/3409
- pseudonetcdf
- pydap
# - pynio # Not available on Windows
- pytest
2 changes: 1 addition & 1 deletion ci/requirements/py37.yml
@@ -29,7 +29,7 @@ dependencies:
- pandas
- pint
- pip
- pseudonetcdf<3.1 # FIXME https://github.com/pydata/xarray/issues/3409
- pseudonetcdf
- pydap
- pynio
- pytest
14 changes: 8 additions & 6 deletions doc/api.rst
@@ -94,7 +94,7 @@ Dataset contents
Dataset.rename_dims
Dataset.swap_dims
Dataset.expand_dims
Dataset.drop
Dataset.drop_vars
Dataset.drop_dims
Dataset.set_coords
Dataset.reset_coords
@@ -118,6 +118,7 @@ Indexing
Dataset.loc
Dataset.isel
Dataset.sel
Dataset.drop_sel
Dataset.head
Dataset.tail
Dataset.thin
@@ -154,7 +155,7 @@ Computation
.. autosummary::
:toctree: generated/

Dataset.apply
Dataset.map
Dataset.reduce
Dataset.groupby
Dataset.groupby_bins
@@ -263,7 +264,7 @@ DataArray contents
DataArray.rename
DataArray.swap_dims
DataArray.expand_dims
DataArray.drop
DataArray.drop_vars
DataArray.reset_coords
DataArray.copy

@@ -283,6 +284,7 @@ Indexing
DataArray.loc
DataArray.isel
DataArray.sel
DataArray.drop_sel
DataArray.head
DataArray.tail
DataArray.thin
@@ -542,10 +544,10 @@ GroupBy objects
:toctree: generated/

core.groupby.DataArrayGroupBy
core.groupby.DataArrayGroupBy.apply
core.groupby.DataArrayGroupBy.map
core.groupby.DataArrayGroupBy.reduce
core.groupby.DatasetGroupBy
core.groupby.DatasetGroupBy.apply
core.groupby.DatasetGroupBy.map
core.groupby.DatasetGroupBy.reduce

Rolling objects
@@ -566,7 +568,7 @@ Resample objects
================

Resample objects also implement the GroupBy interface
(methods like ``apply()``, ``reduce()``, ``mean()``, ``sum()``, etc.).
(methods like ``map()``, ``reduce()``, ``mean()``, ``sum()``, etc.).

.. autosummary::
:toctree: generated/
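For illustration, here is a minimal sketch of the renamed interface documented above; the data, variable names, and the ``time="1M"`` resampling frequency are made up rather than taken from the diff.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series; resample to monthly groups.
da = xr.DataArray(
    np.random.rand(365),
    coords={"time": pd.date_range("2019-01-01", periods=365, freq="D")},
    dims="time",
)

# Resample objects implement the GroupBy interface, so map/reduce/mean/sum all work.
monthly_mean = da.resample(time="1M").mean()
monthly_anomaly = da.resample(time="1M").map(lambda x: x - x.mean())
```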
6 changes: 3 additions & 3 deletions doc/computation.rst
@@ -183,7 +183,7 @@ a value when aggregating:

Note that rolling window aggregations are faster and use less memory when bottleneck_ is installed. This only applies to numpy-backed xarray objects.

.. _bottleneck: https://github.com/kwgoodman/bottleneck/
.. _bottleneck: https://github.com/pydata/bottleneck/

We can also manually iterate through ``Rolling`` objects:

@@ -462,13 +462,13 @@ Datasets support most of the same methods found on data arrays:
abs(ds)
Datasets also support NumPy ufuncs (requires NumPy v1.13 or newer), or
alternatively you can use :py:meth:`~xarray.Dataset.apply` to apply a function
alternatively you can use :py:meth:`~xarray.Dataset.map` to map a function
to each variable in a dataset:

.. ipython:: python
np.sin(ds)
ds.apply(np.sin)
ds.map(np.sin)
Datasets also use looping over variables for *broadcasting* in binary
arithmetic. You can do arithmetic between any ``DataArray`` and a dataset:
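A rough sketch of the rolling-window behaviour described in the hunk above; the array shape and window size are hypothetical, and the bottleneck speed-up applies automatically to the numpy-backed case when the library is installed.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D array; roll over "y" with a window of 3,
# allowing partially filled windows via min_periods.
arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=("x", "y"))

r = arr.rolling(y=3, min_periods=2)
r.mean()          # NaN-aware mean; accelerated by bottleneck when available
r.reduce(np.std)  # any reduction function can be applied per window
```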
2 changes: 1 addition & 1 deletion doc/dask.rst
@@ -292,7 +292,7 @@ For the best performance when using Dask's multi-threaded scheduler, wrap a
function that already releases the global interpreter lock, which fortunately
already includes most NumPy and Scipy functions. Here we show an example
using NumPy operations and a fast function from
`bottleneck <https://github.com/kwgoodman/bottleneck>`__, which
`bottleneck <https://github.com/pydata/bottleneck>`__, which
we use to calculate `Spearman's rank-correlation coefficient <https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient>`__:

.. code-block:: python
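The ``code-block`` directive in the hunk above is cut off by the diff view. The following is only a hedged sketch of the kind of function it refers to; the function names and the ``apply_ufunc`` wiring are assumptions, not the documentation's exact example.

```python
import numpy as np
import xarray as xr
import bottleneck  # its compiled kernels release the GIL

def _spearman_1d(x, y):
    # Rank-transform with bottleneck, then take the Pearson correlation of the ranks.
    x_ranks = bottleneck.rankdata(x)
    y_ranks = bottleneck.rankdata(y)
    return np.corrcoef(x_ranks, y_ranks)[0, 1]

def spearman_correlation(x, y, dim):
    # Vectorize the 1-D kernel over all other dimensions; with dask-backed
    # inputs the computation stays lazy and runs chunk-by-chunk.
    return xr.apply_ufunc(
        _spearman_1d,
        x,
        y,
        input_core_dims=[[dim], [dim]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=[float],
    )
```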
15 changes: 8 additions & 7 deletions doc/groupby.rst
@@ -35,10 +35,11 @@ Let's create a simple example dataset:
.. ipython:: python
ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 3))},
coords={'x': [10, 20, 30, 40],
'letters': ('x', list('abba'))})
arr = ds['foo']
ds = xr.Dataset(
{"foo": (("x", "y"), np.random.rand(4, 3))},
coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
arr = ds["foo"]
ds
If we groupby the name of a variable or coordinate in a dataset (we can also
@@ -93,15 +94,15 @@ Apply
~~~~~

To apply a function to each group, you can use the flexible
:py:meth:`~xarray.DatasetGroupBy.apply` method. The resulting objects are automatically
:py:meth:`~xarray.DatasetGroupBy.map` method. The resulting objects are automatically
concatenated back together along the group axis:

.. ipython:: python
def standardize(x):
return (x - x.mean()) / x.std()
arr.groupby('letters').apply(standardize)
arr.groupby('letters').map(standardize)
GroupBy objects also have a :py:meth:`~xarray.DatasetGroupBy.reduce` method and
methods like :py:meth:`~xarray.DatasetGroupBy.mean` as shortcuts for applying an
@@ -202,7 +203,7 @@ __ http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dimen
dims=['ny','nx'])
da
da.groupby('lon').sum(...)
da.groupby('lon').apply(lambda x: x - x.mean(), shortcut=False)
da.groupby('lon').map(lambda x: x - x.mean(), shortcut=False)
Because multidimensional groups have the ability to generate a very large
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
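To illustrate the ``map`` spelling together with the coarse-binning via ``groupby_bins`` mentioned at the end of the hunk above (coordinate values and bin edges below are invented):

```python
import numpy as np
import xarray as xr

# Hypothetical array with a latitude coordinate along "x".
da = xr.DataArray(
    np.random.rand(4, 3),
    coords={"lat": ("x", [10.0, 20.0, 30.0, 40.0])},
    dims=("x", "y"),
)

# Coarse-bin by latitude, then map an anomaly function over each bin.
binned = da.groupby_bins("lat", bins=[0, 25, 50])
binned.map(lambda g: g - g.mean())
```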
2 changes: 1 addition & 1 deletion doc/howdoi.rst
@@ -44,7 +44,7 @@ How do I ...
* - convert a possibly irregularly sampled timeseries to a regularly sampled timeseries
- :py:meth:`DataArray.resample`, :py:meth:`Dataset.resample` (see :ref:`resampling` for more)
* - apply a function on all data variables in a Dataset
- :py:meth:`Dataset.apply`
- :py:meth:`Dataset.map`
* - write xarray objects with complex values to a netCDF file
- :py:func:`Dataset.to_netcdf`, :py:func:`DataArray.to_netcdf` specifying ``engine="h5netcdf", invalid_netcdf=True``
* - make xarray objects look like other xarray objects
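A quick, hedged sketch of the two table rows visible above: ``Dataset.map`` for applying a function to every data variable, and the ``h5netcdf`` engine with ``invalid_netcdf=True`` for complex values. The file name and data are placeholders, and h5netcdf must be installed.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"z": ("x", np.array([1 + 1j, 2 - 2j]))})

# Apply a function to every data variable.
ds.map(np.abs)

# Complex values are not valid netCDF-4, so the h5netcdf engine is required
# together with invalid_netcdf=True.
ds.to_netcdf("complex.nc", engine="h5netcdf", invalid_netcdf=True)
```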
2 changes: 1 addition & 1 deletion doc/installing.rst
@@ -43,7 +43,7 @@ For accelerating xarray

- `scipy <http://scipy.org/>`__: necessary to enable the interpolation features for
xarray objects
- `bottleneck <https://github.com/kwgoodman/bottleneck>`__: speeds up
- `bottleneck <https://github.com/pydata/bottleneck>`__: speeds up
NaN-skipping and rolling window aggregations by a large factor
- `numbagg <https://github.com/shoyer/numbagg>`_: for exponential rolling
window operations
2 changes: 1 addition & 1 deletion doc/quick-overview.rst
@@ -142,7 +142,7 @@ xarray supports grouped operations using a very similar API to pandas (see :ref:
labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')
labels
data.groupby(labels).mean('y')
data.groupby(labels).apply(lambda x: x - x.min())
data.groupby(labels).map(lambda x: x - x.min())
Plotting
--------
15 changes: 12 additions & 3 deletions doc/whats-new.rst
@@ -41,9 +41,16 @@ New Features
- :py:meth:`Dataset.drop_sel` & :py:meth:`DataArray.drop_sel` have been added for dropping labels.
:py:meth:`Dataset.drop_vars` & :py:meth:`DataArray.drop_vars` have been added for
dropping variables (including coordinates). The existing ``drop`` methods remain as a backward compatible
option for dropping either lables or variables, but using the more specific methods is encouraged.
option for dropping either labels or variables, but using the more specific methods is encouraged.
(:pull:`3475`)
By `Maximilian Roos <https://github.com/max-sixty>`_
- :py:meth:`Dataset.map` & :py:meth:`GroupBy.map` & :py:meth:`Resample.map` have been added for
mapping / applying a function over each item in the collection, reflecting the widely used
and least surprising name for this operation.
The existing ``apply`` methods remain for backward compatibility, though using the ``map``
methods is encouraged.
(:pull:`3459`)
By `Maximilian Roos <https://github.com/max-sixty>`_
- :py:meth:`Dataset.transpose` and :py:meth:`DataArray.transpose` now support an ellipsis (`...`)
to represent all 'other' dimensions. For example, to move one dimension to the front,
use `.transpose('x', ...)`. (:pull:`3421`)
@@ -107,7 +114,7 @@ Internal Changes
~~~~~~~~~~~~~~~~

- Added integration tests against `pint <https://pint.readthedocs.io/>`_.
(:pull:`3238`) by `Justus Magin <https://github.com/keewis>`_.
(:pull:`3238`, :pull:`3447`, :pull:`3508`) by `Justus Magin <https://github.com/keewis>`_.

.. note::

@@ -123,6 +130,8 @@ Internal Changes
- Run basic CI tests on Python 3.8. (:pull:`3477`)
By `Maximilian Roos <https://github.com/max-sixty>`_

- Enable type checking on default sentinel values (:pull:`3472`)
By `Maximilian Roos <https://github.com/max-sixty>`_

.. _whats-new.0.14.0:

@@ -3730,7 +3739,7 @@ Breaking changes
warnings: methods and attributes that were deprecated in xray v0.3 or earlier
(e.g., ``dimensions``, ``attributes```) have gone away.

.. _bottleneck: https://github.com/kwgoodman/bottleneck
.. _bottleneck: https://github.com/pydata/bottleneck

Enhancements
~~~~~~~~~~~~
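A minimal sketch of the two new method families described in the ``New Features`` entries above; the dataset itself is invented for illustration.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x",), np.arange(4.0))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

# drop_vars removes variables (including coordinates) by name ...
ds.drop_vars("letters")

# ... while drop_sel drops elements along a dimension by index label.
ds.drop_sel(x=[20, 30])

# The old ds.drop(...) spelling still works for both cases, but the
# more specific methods are encouraged.
```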
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,7 +1,7 @@
[tool:pytest]
python_files=test_*.py
testpaths=xarray/tests properties
# Fixed upstream in https://github.com/kwgoodman/bottleneck/pull/199
# Fixed upstream in https://github.com/pydata/bottleneck/pull/199
filterwarnings =
ignore:Using a non-tuple sequence for multidimensional indexing is deprecated:FutureWarning
env =
14 changes: 10 additions & 4 deletions xarray/core/dataarray.py
@@ -51,6 +51,7 @@
from .dataset import Dataset, merge_indexes, split_indexes
from .formatting import format_item
from .indexes import Indexes, default_indexes
from .merge import PANDAS_TYPES
from .options import OPTIONS
from .utils import Default, ReprObject, _check_inplace, _default, either_dict_or_kwargs
from .variable import (
@@ -357,7 +358,7 @@ def __init__(
dims = getattr(data, "dims", getattr(coords, "dims", None))
if name is None:
name = getattr(data, "name", None)
if attrs is None:
if attrs is None and not isinstance(data, PANDAS_TYPES):
attrs = getattr(data, "attrs", None)
if encoding is None:
encoding = getattr(data, "encoding", None)
@@ -919,7 +920,7 @@ def copy(self, deep: bool = True, data: Any = None) -> "DataArray":
Coordinates:
* x (x) <U1 'a' 'b' 'c'
See also
See Also
--------
pandas.DataFrame.copy
"""
@@ -1716,7 +1717,7 @@ def stack(
codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
names=['x', 'y'])
See also
See Also
--------
DataArray.unstack
"""
@@ -1764,7 +1765,7 @@ def unstack(
>>> arr.identical(roundtripped)
True
See also
See Also
--------
DataArray.stack
"""
@@ -1922,6 +1923,11 @@ def drop(
"""Backward compatible method based on `drop_vars` and `drop_sel`
Using either `drop_vars` or `drop_sel` is encouraged
See Also
--------
DataArray.drop_vars
DataArray.drop_sel
"""
ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
return self._from_temp_dataset(ds)
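A plausible illustration of the constructor change above: the new ``isinstance(data, PANDAS_TYPES)`` guard keeps xarray from silently copying the experimental ``.attrs`` dict that newer pandas objects carry. The metadata below is hypothetical.

```python
import pandas as pd
import xarray as xr

series = pd.Series([1.0, 2.0, 3.0], name="foo")
series.attrs = {"source": "made-up metadata"}  # pandas' own (experimental) attrs

da = xr.DataArray(series)
print(da.attrs)  # with the guard above, this stays empty: {}

# attrs passed explicitly are still honoured.
da = xr.DataArray(series, attrs={"units": "m"})
```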
34 changes: 30 additions & 4 deletions xarray/core/dataset.py
@@ -3557,6 +3557,11 @@ def drop(self, labels=None, dim=None, *, errors="raise", **labels_kwargs):
"""Backward compatible method based on `drop_vars` and `drop_sel`
Using either `drop_vars` or `drop_sel` is encouraged
See Also
--------
Dataset.drop_vars
Dataset.drop_sel
"""
if errors not in ["raise", "ignore"]:
raise ValueError('errors must be either "raise" or "ignore"')
@@ -4108,14 +4113,14 @@ def reduce(
variables, coord_names=coord_names, attrs=attrs, indexes=indexes
)

def apply(
def map(
self,
func: Callable,
keep_attrs: bool = None,
args: Iterable[Any] = (),
**kwargs: Any,
) -> "Dataset":
"""Apply a function over the data variables in this dataset.
"""Apply a function to each variable in this dataset
Parameters
----------
@@ -4135,7 +4140,7 @@ def apply(
Returns
-------
applied : Dataset
Resulting dataset from applying ``func`` over each data variable.
Resulting dataset from applying ``func`` to each data variable.
Examples
--------
@@ -4148,7 +4153,7 @@ def apply(
Data variables:
foo (dim_0, dim_1) float64 -0.3751 -1.951 -1.945 0.2948 0.711 -0.3948
bar (x) int64 -1 2
>>> ds.apply(np.fabs)
>>> ds.map(np.fabs)
<xarray.Dataset>
Dimensions: (dim_0: 2, dim_1: 3, x: 2)
Dimensions without coordinates: dim_0, dim_1, x
@@ -4165,6 +4170,27 @@ def apply(
attrs = self.attrs if keep_attrs else None
return type(self)(variables, attrs=attrs)

def apply(
self,
func: Callable,
keep_attrs: bool = None,
args: Iterable[Any] = (),
**kwargs: Any,
) -> "Dataset":
"""
Backward compatible implementation of ``map``
See Also
--------
Dataset.map
"""
warnings.warn(
"Dataset.apply may be deprecated in the future. Using Dataset.map is encouraged",
PendingDeprecationWarning,
stacklevel=2,
)
return self.map(func, keep_attrs, args, **kwargs)

def assign(
self, variables: Mapping[Hashable, Any] = None, **variables_kwargs: Hashable
) -> "Dataset":
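A hedged usage sketch of the rename above: ``map`` is the new spelling, while ``apply`` is kept as a thin wrapper that warns. The example data is invented.

```python
import warnings
import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.array([-1.0, 2.0])), "bar": ("y", np.array([-3, 4]))})

# New spelling: apply a function to each data variable.
ds.map(np.fabs)

# Old spelling still works, but goes through the wrapper shown above
# and emits a PendingDeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ds.apply(np.fabs)
assert any(issubclass(w.category, PendingDeprecationWarning) for w in caught)
```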