forked from pydata/xarray

Merge remote-tracking branch 'upstream/master' into dataset/quiver
* upstream/master:
  FIX: h5py>=3 string decoding (pydata#4893)
  Update matplotlib's canonical (pydata#4919)
  Adding vectorized indexing docs (pydata#4711)
  Allow fsspec URLs in open_(mf)dataset (pydata#4823)
  Fix typos in example notebooks (pydata#4908)
  pre-commit autoupdate CI (pydata#4906)
  replace the ci-trigger action with a external one (pydata#4905)
  Update area_weighted_temperature.ipynb (pydata#4903)
  hide the decorator from the test traceback (pydata#4900)
  Sort backends (pydata#4886)
  Compatibility with dask 2021.02.0 (pydata#4884)
dcherian committed Feb 17, 2021
2 parents b9bcada + a8ed7ed commit 2b1bc32
Showing 29 changed files with 373 additions and 161 deletions.
29 changes: 0 additions & 29 deletions .github/actions/detect-ci-trigger/action.yaml

This file was deleted.

47 changes: 0 additions & 47 deletions .github/actions/detect-ci-trigger/script.sh

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/ci-additional.yaml
@@ -19,7 +19,7 @@ jobs:
       - uses: actions/checkout@v2
         with:
           fetch-depth: 2
-      - uses: ./.github/actions/detect-ci-trigger
+      - uses: xarray-contrib/ci-trigger@v1
         id: detect-trigger
         with:
           keyword: "[skip-ci]"

41 changes: 41 additions & 0 deletions .github/workflows/ci-pre-commit-autoupdate.yaml
@@ -0,0 +1,41 @@
+name: "pre-commit autoupdate CI"
+
+on:
+  schedule:
+    - cron: "0 0 * * 0" # every Sunday at 00:00 UTC
+  workflow_dispatch:
+
+
+jobs:
+  autoupdate:
+    name: 'pre-commit autoupdate'
+    runs-on: ubuntu-latest
+    if: github.repository == 'pydata/xarray'
+    steps:
+      - name: checkout
+        uses: actions/checkout@v2
+      - name: Cache pip and pre-commit
+        uses: actions/cache@v2
+        with:
+          path: |
+            ~/.cache/pre-commit
+            ~/.cache/pip
+          key: ${{ runner.os }}-pre-commit-autoupdate
+      - name: setup python
+        uses: actions/setup-python@v2
+      - name: upgrade pip
+        run: python -m pip install --upgrade pip
+      - name: install pre-commit
+        run: python -m pip install --upgrade pre-commit
+      - name: version info
+        run: python -m pip list
+      - name: autoupdate
+        uses: technote-space/create-pr-action@837dbe469b39f08d416889369a52e2a993625c84
+        with:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          EXECUTE_COMMANDS: |
+            python -m pre_commit autoupdate
+          COMMIT_MESSAGE: 'pre-commit: autoupdate hook versions'
+          PR_TITLE: 'pre-commit: autoupdate hook versions'
+          PR_BRANCH_PREFIX: 'pre-commit/'
+          PR_BRANCH_NAME: 'autoupdate-${PR_ID}'

2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -19,7 +19,7 @@ jobs:
       - uses: actions/checkout@v2
         with:
           fetch-depth: 2
-      - uses: ./.github/actions/detect-ci-trigger
+      - uses: xarray-contrib/ci-trigger@v1
         id: detect-trigger
         with:
           keyword: "[skip-ci]"

2 changes: 1 addition & 1 deletion .github/workflows/upstream-dev-ci.yaml
@@ -21,7 +21,7 @@ jobs:
       - uses: actions/checkout@v2
         with:
           fetch-depth: 2
-      - uses: ./.github/actions/detect-ci-trigger
+      - uses: xarray-contrib/ci-trigger@v1
         id: detect-trigger
         with:
           keyword: "[test-upstream]"

4 changes: 2 additions & 2 deletions ci/requirements/environment-windows.yml
@@ -8,10 +8,10 @@ dependencies:
   # - cdms2 # Not available on Windows
   # - cfgrib # Causes Python interpreter crash on Windows: https://github.com/pydata/xarray/pull/3340
   - cftime
-  - dask<2021.02.0
+  - dask
   - distributed
   - h5netcdf
-  - h5py=2
+  - h5py
   - hdf5
   - hypothesis
   - iris

5 changes: 3 additions & 2 deletions ci/requirements/environment.yml
@@ -3,16 +3,17 @@ channels:
   - conda-forge
   - nodefaults
 dependencies:
+  - aiobotocore
   - boto3
   - bottleneck
   - cartopy
   - cdms2
   - cfgrib
   - cftime
-  - dask<2021.02.0
+  - dask
   - distributed
   - h5netcdf
-  - h5py=2
+  - h5py
   - hdf5
   - hypothesis
   - iris

4 changes: 3 additions & 1 deletion ci/requirements/py38-all-but-dask.yml
@@ -4,6 +4,8 @@ channels:
   - nodefaults
 dependencies:
   - python=3.8
+  - black
+  - aiobotocore
   - boto3
   - bottleneck
   - cartopy
@@ -12,7 +14,7 @@ dependencies:
   - cftime
   - coveralls
   - h5netcdf
-  - h5py=2
+  - h5py
   - hdf5
   - hypothesis
   - lxml # Optional dep of pydap

2 changes: 1 addition & 1 deletion doc/conf.py
@@ -415,7 +415,7 @@
     "numpy": ("https://numpy.org/doc/stable", None),
     "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
     "numba": ("https://numba.pydata.org/numba-doc/latest", None),
-    "matplotlib": ("https://matplotlib.org", None),
+    "matplotlib": ("https://matplotlib.org/stable/", None),
     "dask": ("https://docs.dask.org/en/latest", None),
     "cftime": ("https://unidata.github.io/cftime", None),
     "rasterio": ("https://rasterio.readthedocs.io/en/latest", None),

2 changes: 1 addition & 1 deletion doc/examples/ERA5-GRIB-example.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "GRIB format is commonly used to disemminate atmospheric model data. With Xarray and the cfgrib engine, GRIB data can easily be analyzed and visualized."
+    "GRIB format is commonly used to disseminate atmospheric model data. With Xarray and the cfgrib engine, GRIB data can easily be analyzed and visualized."
    ]
   },
   {

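For orientation (an illustration, not part of this commit): with the cfgrib engine installed, reading a GRIB file is a single call. The filename below is hypothetical.

    import xarray as xr

    # Open a GRIB file via the cfgrib engine (hypothetical path).
    ds = xr.open_dataset("era5-example.grib", engine="cfgrib")
    print(ds)
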
2 changes: 1 addition & 1 deletion doc/examples/ROMS_ocean_model.ipynb
@@ -120,7 +120,7 @@
    "source": [
     "### A naive vertical slice\n",
     "\n",
-    "Create a slice using the s-coordinate as the vertical dimension is typically not very informative."
+    "Creating a slice using the s-coordinate as the vertical dimension is typically not very informative."
    ]
   },
   {

2 changes: 1 addition & 1 deletion doc/examples/area_weighted_temperature.ipynb
@@ -20,7 +20,7 @@
    "Author: [Mathias Hauser](https://github.com/mathause/)\n",
    "\n",
    "\n",
-    "We use the `air_temperature` example dataset to calculate the area-weighted temperature over its domain. This dataset has a regular latitude/ longitude grid, thus the gridcell area decreases towards the pole. For this grid we can use the cosine of the latitude as proxy for the grid cell area.\n"
+    "We use the `air_temperature` example dataset to calculate the area-weighted temperature over its domain. This dataset has a regular latitude/ longitude grid, thus the grid cell area decreases towards the pole. For this grid we can use the cosine of the latitude as proxy for the grid cell area.\n"
    ]
   },
   {

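A minimal sketch of the cos(latitude) weighting this notebook describes, using the same tutorial dataset; ``weighted`` is xarray's built-in weighted-reduction API.

    import numpy as np
    import xarray as xr

    ds = xr.tutorial.open_dataset("air_temperature")

    # On a regular lat/lon grid, cos(latitude) is a proxy for grid cell area.
    weights = np.cos(np.deg2rad(ds.lat))
    weights.name = "weights"

    # Area-weighted mean over the spatial dimensions.
    weighted_mean = ds.air.weighted(weights).mean(("lon", "lat"))
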
6 changes: 3 additions & 3 deletions doc/examples/monthly-means.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Calculating Seasonal Averages from Timeseries of Monthly Means \n",
+    "Calculating Seasonal Averages from Time Series of Monthly Means \n",
     "=====\n",
     "\n",
     "Author: [Joe Hamman](https://github.com/jhamman/)\n",
@@ -60,10 +60,10 @@
    "source": [
     "#### Now for the heavy lifting:\n",
     "We first have to come up with the weights,\n",
-    "- calculate the month lengths for each monthly data record\n",
+    "- calculate the month length for each monthly data record\n",
     "- calculate weights using `groupby('time.season')`\n",
     "\n",
-    "Finally, we just need to multiply our weights by the `Dataset` and sum allong the time dimension. Creating a `DataArray` for the month length is as easy as using the `days_in_month` accessor on the time coordinate. The calendar type, in this case `'noleap'`, is automatically considered in this operation."
+    "Finally, we just need to multiply our weights by the `Dataset` and sum along the time dimension. Creating a `DataArray` for the month length is as easy as using the `days_in_month` accessor on the time coordinate. The calendar type, in this case `'noleap'`, is automatically considered in this operation."
    ]
   },
   {

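A condensed sketch of the weighted seasonal average those cells build up, assuming monthly-mean data with a ``time`` coordinate (the tutorial dataset stands in for the notebook's inputs):

    import xarray as xr

    # Monthly means to be averaged by season (stand-in data).
    ds = xr.tutorial.open_dataset("air_temperature").resample(time="1MS").mean()

    # Month lengths from the days_in_month accessor; the calendar is handled automatically.
    month_length = ds.time.dt.days_in_month

    # Normalize month lengths into per-season weights, then take the weighted sum.
    weights = month_length.groupby("time.season") / month_length.groupby("time.season").sum()
    season_mean = (ds * weights).groupby("time.season").sum(dim="time")
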
16 changes: 16 additions & 0 deletions doc/indexing.rst
@@ -395,6 +395,22 @@ These methods may also be applied to ``Dataset`` objects
     ds = da.to_dataset(name="bar")
     ds.isel(x=xr.DataArray([0, 1, 2], dims=["points"]))
 
+Vectorized indexing may be used to extract information from the nearest
+grid cells of interest, for example, the nearest climate model grid cells
+to a collection of specified weather station latitudes and longitudes.
+
+.. ipython:: python
+
+    ds = xr.tutorial.open_dataset("air_temperature")
+
+    # Define target latitude and longitude (where weather stations might be)
+    target_lon = xr.DataArray([200, 201, 202, 205], dims="points")
+    target_lat = xr.DataArray([31, 41, 42, 42], dims="points")
+
+    # Retrieve data at the grid cells nearest to the target latitudes and longitudes
+    da = ds["air"].sel(lon=target_lon, lat=target_lat, method="nearest")
+    da
+
 .. tip::
 
     If you are lazily loading your data from disk, not every form of vectorized

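One optional extension of the example above (an illustration, not in this commit): coordinates attached to the indexer arrays propagate to the result, so the selected points can carry station labels.

    import xarray as xr

    ds = xr.tutorial.open_dataset("air_temperature")

    # Label each target point; the "points" coordinate carries through to the result.
    stations = ["A", "B", "C", "D"]
    target_lon = xr.DataArray([200, 201, 202, 205], dims="points", coords={"points": stations})
    target_lat = xr.DataArray([31, 41, 42, 42], dims="points", coords={"points": stations})

    da = ds["air"].sel(lon=target_lon, lat=target_lat, method="nearest")
    da.sel(points="B")  # time series at the grid cell nearest station "B"
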
35 changes: 31 additions & 4 deletions doc/io.rst
@@ -890,17 +890,44 @@ Cloud Storage Buckets
 
 It is possible to read and write xarray datasets directly from / to cloud
 storage buckets using zarr. This example uses the `gcsfs`_ package to provide
-a ``MutableMapping`` interface to `Google Cloud Storage`_, which we can then
-pass to xarray::
+an interface to `Google Cloud Storage`_.
+
+From v0.16.2: general `fsspec`_ URLs are parsed and the store set up for you
+automatically when reading, such that you can open a dataset in a single
+call. You should include any arguments to the storage backend under the
+key ``storage_options``, part of ``backend_kwargs``.
+
+.. code:: python
+
+    ds_gcs = xr.open_dataset(
+        "gcs://<bucket-name>/path.zarr",
+        backend_kwargs={
+            "storage_options": {"project": "<project-name>", "token": None}
+        },
+        engine="zarr",
+    )
+
+This also works with ``open_mfdataset``, allowing you to pass a list of paths or
+a URL to be interpreted as a glob string.
+
+For older versions, and for writing, you must explicitly set up a
+``MutableMapping`` instance and pass it, as follows:
+
+.. code:: python
+
+    import gcsfs
 
-    fs = gcsfs.GCSFileSystem(project='<project-name>', token=None)
-    gcsmap = gcsfs.mapping.GCSMap('<bucket-name>', gcs=fs, check=True, create=False)
+    fs = gcsfs.GCSFileSystem(project="<project-name>", token=None)
+    gcsmap = gcsfs.mapping.GCSMap("<bucket-name>", gcs=fs, check=True, create=False)
     # write to the bucket
     ds.to_zarr(store=gcsmap)
     # read it back
     ds_gcs = xr.open_zarr(gcsmap)
+
+(or use the utility function ``fsspec.get_mapper()``).
+
+.. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/
 .. _Zarr: http://zarr.readthedocs.io/
 .. _Amazon S3: https://aws.amazon.com/s3/
 .. _Google Cloud Storage: https://cloud.google.com/storage/

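Two short sketches of usage the new text mentions but does not spell out (bucket and path names are placeholders; the glob form assumes ``engine="zarr"`` as described above):

    import fsspec
    import xarray as xr

    # open_mfdataset with an fsspec glob URL (placeholder bucket/paths).
    ds = xr.open_mfdataset(
        "gcs://<bucket-name>/collection/*.zarr",
        engine="zarr",
        backend_kwargs={"storage_options": {"token": None}},
    )

    # fsspec.get_mapper() builds the MutableMapping in one call, for reading or writing.
    gcsmap = fsspec.get_mapper("gcs://<bucket-name>/path.zarr", project="<project-name>", token=None)
    ds_gcs = xr.open_zarr(gcsmap)
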
7 changes: 7 additions & 0 deletions doc/whats-new.rst
@@ -74,6 +74,11 @@ New Features
   in the form of kwargs as well as a dict, like most similar methods.
   By `Maximilian Roos <https://github.com/max-sixty>`_.
 
+- :py:func:`open_dataset` and :py:func:`open_mfdataset` now accept ``fsspec`` URLs
+  (including globs for the latter) for ``engine="zarr"``, and so allow reading from
+  many remote and other file systems (:pull:`4461`).
+  By `Martin Durant <https://github.com/martindurant>`_.
+
 Bug fixes
 ~~~~~~~~~
 - :py:meth:`DataArray.resample` and :py:meth:`Dataset.resample` do not trigger computations anymore if :py:meth:`Dataset.weighted` or :py:meth:`DataArray.weighted` are applied (:issue:`4625`, :pull:`4668`). By `Julius Busecke <https://github.com/jbusecke>`_.
@@ -111,6 +116,8 @@ Bug fixes
   By `Leif Denby <https://github.com/leifdenby>`_.
 - Fix time encoding bug associated with using cftime versions greater than
   1.4.0 with xarray (:issue:`4870`, :pull:`4871`). By `Spencer Clark <https://github.com/spencerkclark>`_.
+- Fix decoding of vlen strings using h5py versions greater than 3.0.0 with the h5netcdf backend (:issue:`4570`, :pull:`4893`).
+  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
 
 Documentation
 ~~~~~~~~~~~~~

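Background for the vlen-string entry above (an illustration, not from the commit; file and variable names are hypothetical): h5py 3.x returns variable-length strings as ``bytes`` and offers ``asstr()`` for explicit decoding, which is the behavior the backend fix accounts for.

    import h5py

    with h5py.File("example.h5", "r") as f:
        raw = f["names"][...]           # under h5py >= 3, vlen strings read back as bytes
        text = f["names"].asstr()[...]  # h5py >= 3 API for decoding to str
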
2 changes: 2 additions & 0 deletions setup.cfg
@@ -185,6 +185,8 @@ ignore_missing_imports = True
 ignore_missing_imports = True
 [mypy-distributed.*]
 ignore_missing_imports = True
+[mypy-fsspec.*]
+ignore_missing_imports = True
 [mypy-h5netcdf.*]
 ignore_missing_imports = True
 [mypy-h5py.*]