forked from pydata/xarray

Merge branch 'main' into groupby-reduce
* main:
  Add typing_extensions as a required dependency (pydata#5911)
  pydata#5740 follow up: suppress xr.ufunc warnings in tests (pydata#5914)
  Avoid accessing slow .data in unstack (pydata#5906)
  Add wradlib to ecosystem in docs (pydata#5915)
  Use .to_numpy() for quantified facetgrids (pydata#5886)
  [test-upstream] fix pd skipna=None (pydata#5899)
  Add var and std to weighted computations (pydata#5870)
  Check for path-like objects rather than Path type, use os.fspath (pydata#5879)
  Handle single `PathLike` objects in `open_mfdataset()` (pydata#5884)
dcherian committed Oct 29, 2021
2 parents 85b63b6 + bcb96ce commit fe870e5
Showing 29 changed files with 440 additions and 148 deletions.
1 change: 1 addition & 0 deletions ci/requirements/environment-windows.yml
@@ -39,6 +39,7 @@ dependencies:
- setuptools
- sparse
- toolz
+  - typing_extensions
- zarr
- pip:
- numbagg
1 change: 1 addition & 0 deletions ci/requirements/environment.yml
@@ -43,6 +43,7 @@ dependencies:
- setuptools
- sparse
- toolz
+  - typing_extensions
- zarr
- pip:
- numbagg
1 change: 1 addition & 0 deletions ci/requirements/py37-bare-minimum.yml
@@ -13,3 +13,4 @@ dependencies:
- numpy=1.17
- pandas=1.0
- setuptools=40.4
+  - typing_extensions=3.7
1 change: 1 addition & 0 deletions ci/requirements/py37-min-all-deps.yml
@@ -47,6 +47,7 @@ dependencies:
- setuptools=40.4
- sparse=0.8
- toolz=0.10
+  - typing_extensions=3.7
- zarr=2.4
- pip:
- numbagg==0.1
1 change: 1 addition & 0 deletions ci/requirements/py38-all-but-dask.yml
@@ -39,6 +39,7 @@ dependencies:
- setuptools
- sparse
- toolz
+  - typing_extensions
- zarr
- pip:
- numbagg
6 changes: 6 additions & 0 deletions doc/api.rst
@@ -779,12 +779,18 @@ Weighted objects

   core.weighted.DataArrayWeighted
   core.weighted.DataArrayWeighted.mean
+  core.weighted.DataArrayWeighted.std
   core.weighted.DataArrayWeighted.sum
+  core.weighted.DataArrayWeighted.sum_of_squares
   core.weighted.DataArrayWeighted.sum_of_weights
+  core.weighted.DataArrayWeighted.var
   core.weighted.DatasetWeighted
   core.weighted.DatasetWeighted.mean
+  core.weighted.DatasetWeighted.std
   core.weighted.DatasetWeighted.sum
+  core.weighted.DatasetWeighted.sum_of_squares
   core.weighted.DatasetWeighted.sum_of_weights
+  core.weighted.DatasetWeighted.var


Coarsen objects
1 change: 1 addition & 0 deletions doc/ecosystem.rst
@@ -37,6 +37,7 @@ Geosciences
- `Spyfit <https://spyfit.readthedocs.io/en/master/>`_: FTIR spectroscopy of the atmosphere
- `windspharm <https://ajdawson.github.io/windspharm/index.html>`_: Spherical
harmonic wind analysis in Python.
+- `wradlib <https://wradlib.org/>`_: An Open Source Library for Weather Radar Data Processing.
- `wrf-python <https://wrf-python.readthedocs.io/>`_: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
- `xarray-simlab <https://xarray-simlab.readthedocs.io>`_: xarray extension for computer model simulations.
- `xarray-spatial <https://makepath.github.io/xarray-spatial>`_: Numba-accelerated raster-based spatial processing tools (NDVI, curvature, zonal-statistics, proximity, hillshading, viewshed, etc.)
1 change: 1 addition & 0 deletions doc/getting-started-guide/installing.rst
@@ -8,6 +8,7 @@ Required dependencies

- Python (3.7 or later)
- setuptools (40.4 or later)
+- ``typing_extensions`` (3.7 or later)
- `numpy <http://www.numpy.org/>`__ (1.17 or later)
- `pandas <http://pandas.pydata.org/>`__ (1.0 or later)

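For context, ``typing_extensions`` backports newer ``typing`` features (such as ``Protocol``) to older Pythons. A minimal sketch of the guarded-import idiom this dependency enables on Python 3.7 (an illustration of the pattern, not necessarily xarray's exact usage):

    import sys

    if sys.version_info >= (3, 8):
        from typing import Protocol  # in the stdlib from Python 3.8
    else:
        from typing_extensions import Protocol  # backport for Python 3.7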
20 changes: 17 additions & 3 deletions doc/user-guide/computation.rst
@@ -263,7 +263,7 @@ Weighted array reductions

:py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` array reduction methods. They currently
-support weighted ``sum`` and weighted ``mean``.
+support weighted ``sum``, ``mean``, ``std`` and ``var``.

.. ipython:: python
@@ -298,13 +298,27 @@ The weighted sum corresponds to:
     weighted_sum = (prec * weights).sum()
     weighted_sum

-and the weighted mean to:
+the weighted mean to:

 .. ipython:: python

     weighted_mean = weighted_sum / weights.sum()
     weighted_mean

+the weighted variance to:
+
+.. ipython:: python
+
+    weighted_var = weighted_prec.sum_of_squares() / weights.sum()
+    weighted_var
+
+and the weighted standard deviation to:
+
+.. ipython:: python
+
+    weighted_std = np.sqrt(weighted_var)
+    weighted_std
However, the functions also take missing values in the data into account:

.. ipython:: python
@@ -327,7 +341,7 @@ If the weights add up to 0, ``sum`` returns 0:
data.weighted(weights).sum()
-and ``mean`` returns ``NaN``:
+and ``mean``, ``std`` and ``var`` return ``NaN``:

.. ipython:: python
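Taken together, the new weighted reductions can be exercised as below (a sketch with made-up data; ``prec`` and ``weights`` here are hypothetical stand-ins for the docs' examples):

    import numpy as np
    import xarray as xr

    # made-up precipitation data and latitude weights
    prec = xr.DataArray(np.random.rand(4, 3), dims=("lat", "month"))
    weights = xr.DataArray([0.25, 0.5, 0.75, 1.0], dims="lat")

    weighted_prec = prec.weighted(weights)
    mean = weighted_prec.mean("lat")  # weighted mean
    var = weighted_prec.var("lat")    # weighted variance (new)
    std = weighted_prec.std("lat")    # weighted standard deviation (new)

    # the identity from the docs: var == sum_of_squares / sum_of_weights
    assert np.allclose(var, weighted_prec.sum_of_squares("lat") / weights.sum())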
11 changes: 11 additions & 0 deletions doc/whats-new.rst
@@ -23,6 +23,8 @@ v0.19.1 (unreleased)
New Features
~~~~~~~~~~~~
+- Add :py:meth:`var`, :py:meth:`std` and :py:meth:`sum_of_squares` to :py:meth:`Dataset.weighted` and :py:meth:`DataArray.weighted`.
+  By `Christian Jauvin <https://github.com/cjauvin>`_.
- Added a :py:func:`get_options` method to xarray's root namespace (:issue:`5698`, :pull:`5716`)
  By `Pushkar Kopparla <https://github.com/pkopparla>`_.
- Xarray now does a better job rendering variable names that are long LaTeX sequences when plotting (:issue:`5681`, :pull:`5682`).
@@ -80,6 +82,15 @@ Bug fixes
  By `Jimmy Westling <https://github.com/illviljan>`_.
- Numbers are properly formatted in a plot's title (:issue:`5788`, :pull:`5789`).
  By `Maxime Liquet <https://github.com/maximlt>`_.
+- Faceted plots will no longer raise a `pint.UnitStrippedWarning` when a `pint.Quantity` array is plotted,
+  and will correctly display the units of the data in the colorbar (if there is one) (:pull:`5886`).
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- With backends, check for path-like objects rather than ``pathlib.Path``
+  type, use ``os.fspath`` (:pull:`5879`).
+  By `Mike Taves <https://github.com/mwtoews>`_.
+- ``open_mfdataset()`` now accepts a single ``pathlib.Path`` object (:issue:`5881`).
+  By `Panos Mavrogiorgos <https://github.com/pmav99>`_.
+- Improved performance of :py:meth:`Dataset.unstack` (:pull:`5906`). By `Tom Augspurger <https://github.com/TomAugspurger>`_.

Documentation
~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion requirements.txt
@@ -5,4 +5,4 @@
numpy >= 1.17
pandas >= 1.0
setuptools >= 40.4
-typing-extensions >= 3.10
+typing-extensions >= 3.7
1 change: 1 addition & 0 deletions setup.cfg
@@ -78,6 +78,7 @@ python_requires = >=3.7
install_requires =
numpy >= 1.17
pandas >= 1.0
+typing_extensions >= 3.7
setuptools >= 40.4 # For pkg_resources

[options.extras_require]
21 changes: 11 additions & 10 deletions xarray/backends/api.py
@@ -2,7 +2,6 @@
from glob import glob
from io import BytesIO
from numbers import Number
-from pathlib import Path
from typing import (
TYPE_CHECKING,
Callable,
@@ -808,7 +807,7 @@ def open_mfdataset(
- "override": if indexes are of same size, rewrite indexes to be
those of the first object with that dimension. Indexes for the same
dimension must have the same size in all objects.
-attrs_file : str or pathlib.Path, optional
+attrs_file : str or path-like, optional
Path of the file used to read global attributes from.
By default global attributes are read from the first file provided,
with wildcard matches sorted by filename.
@@ -865,8 +864,10 @@
)
else:
paths = sorted(glob(_normalize_path(paths)))
+    elif isinstance(paths, os.PathLike):
+        paths = [os.fspath(paths)]
     else:
-        paths = [str(p) if isinstance(p, Path) else p for p in paths]
+        paths = [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]

if not paths:
raise OSError("no files to open")
@@ -958,8 +959,8 @@ def multi_file_closer():

# read global attributes from the attrs_file or from the first dataset
if attrs_file is not None:
-    if isinstance(attrs_file, Path):
-        attrs_file = str(attrs_file)
+    if isinstance(attrs_file, os.PathLike):
+        attrs_file = os.fspath(attrs_file)
combined.attrs = datasets[paths.index(attrs_file)].attrs

return combined
@@ -992,8 +993,8 @@ def to_netcdf(
The ``multifile`` argument is only for the private use of save_mfdataset.
"""
-    if isinstance(path_or_file, Path):
-        path_or_file = str(path_or_file)
+    if isinstance(path_or_file, os.PathLike):
+        path_or_file = os.fspath(path_or_file)

if encoding is None:
encoding = {}
@@ -1134,7 +1135,7 @@ def save_mfdataset(
----------
datasets : list of Dataset
List of datasets to save.
-paths : list of str or list of Path
+paths : list of str or list of path-like objects
List of paths to which to save each corresponding dataset.
mode : {"w", "a"}, optional
Write ("w") or append ("a") mode. If mode="w", any existing file at
@@ -1302,7 +1303,7 @@ def check_dtype(var):

def to_zarr(
dataset: Dataset,
-    store: Union[MutableMapping, str, Path] = None,
+    store: Union[MutableMapping, str, os.PathLike] = None,
chunk_store=None,
mode: str = None,
synchronizer=None,
@@ -1326,7 +1327,7 @@ def to_zarr(
if v.size == 0:
v.load()

-    # expand str and Path arguments
+    # expand str and path-like arguments
store = _normalize_path(store)
chunk_store = _normalize_path(chunk_store)

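The net effect on ``open_mfdataset`` can be sketched as follows (hypothetical file names; assumes the files exist and dask is installed):

    from pathlib import Path
    import xarray as xr

    # a single path-like object is now accepted and treated as one file
    ds = xr.open_mfdataset(Path("data") / "obs.nc")

    # strings are still globbed, and lists may mix str and path-like entries
    ds = xr.open_mfdataset(["data/obs.nc", Path("data") / "model.nc"])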
7 changes: 3 additions & 4 deletions xarray/backends/common.py
@@ -1,8 +1,7 @@
import logging
-import os.path
+import os
import time
import traceback
-from pathlib import Path
from typing import Any, Dict, Tuple, Type, Union

import numpy as np
@@ -20,8 +19,8 @@


def _normalize_path(path):
-    if isinstance(path, Path):
-        path = str(path)
+    if isinstance(path, os.PathLike):
+        path = os.fspath(path)

if isinstance(path, str) and not is_remote_uri(path):
path = os.path.abspath(os.path.expanduser(path))
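Switching the check from ``pathlib.Path`` to ``os.PathLike`` widens what ``_normalize_path`` accepts: any object implementing ``__fspath__`` now works. A small illustration (``RemoteFile`` is a hypothetical third-party path type):

    import os
    from pathlib import Path

    class RemoteFile:
        """Hypothetical path-like class from a third-party library."""

        def __fspath__(self):
            return "/tmp/data.nc"

    # os.PathLike recognizes any class with __fspath__, not just pathlib.Path
    assert isinstance(Path("/tmp/data.nc"), os.PathLike)
    assert isinstance(RemoteFile(), os.PathLike)

    # os.fspath converts both, and passes plain strings through unchanged
    assert os.fspath(RemoteFile()) == "/tmp/data.nc"
    assert os.fspath("/tmp/data.nc") == "/tmp/data.nc"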
3 changes: 1 addition & 2 deletions xarray/backends/netCDF4_.py
@@ -1,7 +1,6 @@
import functools
import operator
import os
-import pathlib
from contextlib import suppress

import numpy as np
@@ -346,7 +345,7 @@ def open(
autoclose=False,
):

-    if isinstance(filename, pathlib.Path):
+    if isinstance(filename, os.PathLike):
filename = os.fspath(filename)

if not isinstance(filename, str):
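The same normalization now applies in the netCDF4 backend, so path-like objects can be passed straight to ``open_dataset`` (hypothetical file; assumes the netCDF4 package is installed):

    from pathlib import Path
    import xarray as xr

    ds = xr.open_dataset(Path("data") / "obs.nc", engine="netcdf4")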
3 changes: 1 addition & 2 deletions xarray/backends/zarr.py
@@ -1,5 +1,4 @@
import os
-import pathlib
import warnings
from distutils.version import LooseVersion

@@ -346,7 +345,7 @@ def open_group(
):

# zarr doesn't support pathlib.Path objects yet. zarr-python#601
-    if isinstance(store, pathlib.Path):
+    if isinstance(store, os.PathLike):
store = os.fspath(store)

open_kwargs = dict(
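With this change, any path-like store reaches zarr as a plain string, since zarr itself did not accept pathlib objects yet (zarr-python#601). A sketch (assumes the zarr package is installed; the output path is hypothetical):

    from pathlib import Path
    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"a": ("x", np.arange(3))})
    ds.to_zarr(Path("example.zarr"), mode="w")  # converted via os.fspath internally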
66 changes: 33 additions & 33 deletions xarray/core/dataset.py
@@ -7,7 +7,7 @@
from html import escape
from numbers import Number
from operator import methodcaller
-from pathlib import Path
+from os import PathLike
from typing import (
TYPE_CHECKING,
Any,
@@ -1832,7 +1832,7 @@ def to_netcdf(
Parameters
----------
-path : str, Path or file-like, optional
+path : str, path-like or file-like, optional
Path to which to save this dataset. File-like objects are only
supported by the scipy engine. If no path is provided, this
function returns the resulting netCDF file as bytes; in this case,
@@ -1914,8 +1914,8 @@

def to_zarr(
self,
-        store: Union[MutableMapping, str, Path] = None,
-        chunk_store: Union[MutableMapping, str, Path] = None,
+        store: Union[MutableMapping, str, PathLike] = None,
+        chunk_store: Union[MutableMapping, str, PathLike] = None,
mode: str = None,
synchronizer=None,
group: str = None,
@@ -1944,9 +1944,9 @@ def to_zarr(
Parameters
----------
-store : MutableMapping, str or Path, optional
+store : MutableMapping, str or path-like, optional
     Store or path to directory in local or remote file system.
-chunk_store : MutableMapping, str or Path, optional
+chunk_store : MutableMapping, str or path-like, optional
Store or path to directory in local or remote file system only for Zarr
array chunks. Requires zarr-python v2.4.0 or later.
mode : {"w", "w-", "a", "r+", None}, optional
@@ -4153,34 +4153,34 @@ def unstack(
)

result = self.copy(deep=False)
-        for dim in dims:
-
-            if (
-                # Dask arrays don't support assignment by index, which the fast unstack
-                # function requires.
-                # https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
-                any(is_duck_dask_array(v.data) for v in self.variables.values())
-                # Sparse doesn't currently support (though we could special-case
-                # it)
-                # https://github.com/pydata/sparse/issues/422
-                or any(
-                    isinstance(v.data, sparse_array_type)
-                    for v in self.variables.values()
-                )
-                or sparse
-                # Until https://github.com/pydata/xarray/pull/4751 is resolved,
-                # we check explicitly whether it's a numpy array. Once that is
-                # resolved, explicitly exclude pint arrays.
-                # # pint doesn't implement `np.full_like` in a way that's
-                # # currently compatible.
-                # # https://github.com/pydata/xarray/pull/4746#issuecomment-753425173
-                # # or any(
-                # #     isinstance(v.data, pint_array_type) for v in self.variables.values()
-                # # )
-                or any(
-                    not isinstance(v.data, np.ndarray) for v in self.variables.values()
-                )
-            ):
+        # we want to avoid allocating an object-dtype ndarray for a MultiIndex,
+        # so we can't just access self.variables[v].data for every variable.
+        # We only check the non-index variables.
+        # https://github.com/pydata/xarray/issues/5902
+        nonindexes = [
+            self.variables[k] for k in set(self.variables) - set(self.xindexes)
+        ]
+        # Notes for each of these cases:
+        # 1. Dask arrays don't support assignment by index, which the fast unstack
+        #    function requires.
+        #    https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
+        # 2. Sparse doesn't currently support (though we could special-case it)
+        #    https://github.com/pydata/sparse/issues/422
+        # 3. pint requires checking if it's a NumPy array until
+        #    https://github.com/pydata/xarray/pull/4751 is resolved,
+        #    Once that is resolved, explicitly exclude pint arrays.
+        #    pint doesn't implement `np.full_like` in a way that's
+        #    currently compatible.
+        needs_full_reindex = sparse or any(
+            is_duck_dask_array(v.data)
+            or isinstance(v.data, sparse_array_type)
+            or not isinstance(v.data, np.ndarray)
+            for v in nonindexes
+        )
+
+        for dim in dims:
+            if needs_full_reindex:
                 result = result._unstack_full_reindex(dim, fill_value, sparse)
             else:
                 result = result._unstack_once(dim, fill_value)
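A minimal sketch of the code path being optimized: for plain numpy-backed, non-sparse data the cheaper ``_unstack_once`` branch is taken, and the check above no longer touches the MultiIndex variable's ``.data`` to decide (made-up dataset):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset(
        {"v": ("z", np.arange(4.0))},
        coords={"x": ("z", ["a", "a", "b", "b"]), "y": ("z", [0, 1, 0, 1])},
    ).set_index(z=["x", "y"])

    # numpy-backed data avoids the full-reindex fallback
    unstacked = ds.unstack("z")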