
Read grid mapping and bounds as coords #2844

Merged Feb 17, 2021 (40 commits)

Conversation

@DWesl (Contributor) commented Mar 22, 2019

I prefer having these as coordinates rather than data variables.

This does not cooperate with slicing/pulling out individual variables.
`grid_mapping` should only be associated with variables that have
horizontal dimensions or coordinates.
`bounds` should stay associated despite having more dimensions.

I have not implemented similar functionality for the iris conversions.

An alternate approach to dealing with bounds (not used here) is to use a
`pandas.IntervalIndex`
(http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IntervalIndex.html#pandas.IntervalIndex)
and use the coordinate's position within its cell to determine on which
side the intervals are closed (`x_dim == x_dim_bnds[:, 0]` corresponds
to "left", `x_dim == x_dim_bnds[:, 1]` corresponds to "right", and
anything else is "neither"). This would survive slicing and
might already be used for `.groupby_bins()`, but would not generalize
to boundaries of multidimensional coordinates unless someone
implements a multidimensional generalization of `pd.IntervalIndex`.

  • Closes #xxxx
  • Tests added
  • Fully documented, including whats-new.rst for all changes and api.rst for new API


I do not yet know where to put tests for this.  This should probably
also be mentioned in the documentation.
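The alternate `pandas.IntervalIndex` approach described above can be sketched with plain numpy/pandas; the array names (`x`, `x_bnds`) are illustrative stand-ins for a CF coordinate and its bounds, not code from this PR:

```python
import numpy as np
import pandas as pd

# A 1-D coordinate and its CF-style (n, 2) cell bounds; names illustrative.
x = np.array([0.0, 1.0, 2.0])
x_bnds = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])

# Infer which side the intervals are closed on from where the
# coordinate sits within its cell, as proposed above.
if np.array_equal(x, x_bnds[:, 0]):
    closed = "left"
elif np.array_equal(x, x_bnds[:, 1]):
    closed = "right"
else:
    closed = "neither"

# Build an index that keeps the bounds attached through slicing.
index = pd.IntervalIndex.from_arrays(x_bnds[:, 0], x_bnds[:, 1], closed=closed)
```

Here each coordinate value coincides with the left bound, so the intervals come out left-closed.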
@DWesl (Contributor, Author) commented Mar 26, 2019

Related to #1475 and #2288, but this is just keeping the metadata consistent where already present, not extending the data model to include bounds, cells, or projections. I should add a test to ensure saving still works if the bounds are lost when pulling out variables.

@shoyer (Member) left a comment

I'm sympathetic to this change. My main concerns are mentioned inline below: the way this is currently written encoding/attrs are used inconsistently with other CF metadata.

Alternatively, we might only put these fields in encoding when reading from disk, and only use encoding when choosing how to write data. The downside is that these attributes would not be as visible in xarray's data model.

(two resolved review threads on xarray/conventions.py)
@DWesl (Contributor, Author) commented Mar 30, 2019

I can shift this to use encoding only, but I'm having trouble figuring out where that code would go.

Would the preferred path be to create VariableCoder classes for each and add them to encode_cf_variable, then add tests to xarray.tests.test_coding?

@shoyer (Member) commented Mar 30, 2019

The current VariableCoder interface only supports coders that operate on individual variables. But coordinates only make sense on a dataset, so we'll need a different abstraction for that, e.g., a DatasetCoder?

For now, it's probably fine to keep this logic where you have it currently.
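A dataset-level coder along the lines suggested here might look roughly like the sketch below. The class and method names (`DatasetCoder`, `GridMappingCoder`) are hypothetical, not existing xarray API, and a plain dict stands in for an `xarray.Dataset` just to show the shape of the transformation:

```python
class DatasetCoder:
    """Hypothetical dataset-level analogue of xarray's VariableCoder.

    encode() would prepare a dataset for writing to disk; decode()
    would undo that transformation after reading.
    """

    def encode(self, dataset):
        raise NotImplementedError

    def decode(self, dataset):
        raise NotImplementedError


class GridMappingCoder(DatasetCoder):
    """Sketch: promote grid_mapping targets from data_vars to coords.

    ``dataset`` is modeled as {"data_vars": {...}, "coords": {...}},
    with each variable a dict carrying an "attrs" mapping.
    """

    def decode(self, dataset):
        # Collect every variable name referenced by a grid_mapping attribute.
        referenced = {
            var.get("attrs", {}).get("grid_mapping")
            for var in dataset["data_vars"].values()
        } - {None}
        # Move those variables out of data_vars and into coords.
        for name in referenced & set(dataset["data_vars"]):
            dataset["coords"][name] = dataset["data_vars"].pop(name)
        return dataset
```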

Discussion on GH2844 indicates that encoding is the preferred location
for how things are stored on disk.
@DWesl (Contributor, Author) left a comment

I shifted everything to use encoding rather than encoding and attrs, and added enough new code that my new tests pass locally.

(resolved review thread on xarray/conventions.py)
@andreas-h (Contributor)
I'd like this one to be merged very much. Is there anything holding this back?

Also, it might be nice to update the documentation with info on this.

@shoyer (Member) left a comment

I have a minor code suggestion, otherwise this looks good to me 👍 .

Please also add a brief note to the release notes in whats-new.rst

@@ -235,6 +235,11 @@ def encode_cf_variable(var, needs_copy=True, name=None):
var = maybe_default_fill_value(var)
var = maybe_encode_bools(var)
var = ensure_dtype_not_object(var, name=name)

if var.encoding.get('grid_mapping', None) is not None:
Review comment (Member):

There's no need to supply a default value of None with .get(); that is already the default:

Suggested change:
- if var.encoding.get('grid_mapping', None) is not None:
+ if var.encoding.get('grid_mapping') is not None:

Review comment (Member):

But actually: are there any cases where someone would explicitly have grid_mapping=None in encoding? If not, then let's just check:

Suggested change:
- if var.encoding.get('grid_mapping', None) is not None:
+ if 'grid_mapping' in var.encoding:
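For the record, the three spellings discussed here agree whenever no explicit None is stored; this is plain Python dict behaviour, not code from the PR:

```python
encoding = {"grid_mapping": "crs"}

# .get(key, None) and .get(key) are identical: None is already the default.
assert encoding.get("grid_mapping", None) == encoding.get("grid_mapping")

# When no explicit None is stored, 'in' agrees with the is-not-None check:
assert ("grid_mapping" in encoding) == (encoding.get("grid_mapping") is not None)

# The checks diverge only if someone explicitly stores None:
encoding["grid_mapping"] = None
assert "grid_mapping" in encoding          # membership is still True
assert encoding.get("grid_mapping") is None  # but .get() now returns None
```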

Reply (Contributor Author):

Probably not. That detail should go in documentation somewhere. Where would you suggest?

Reply (Contributor Author):

Code change is done. Docs change is not. What page should that go on? IO/NetCDF?

(resolved review thread on xarray/conventions.py)
@dcherian (Contributor)
I'm a little confused on why these are in encoding and not in attrs.

@DWesl (Contributor, Author) commented May 31, 2019

This is briefly mentioned above, in
#2844 (comment)
The rationale was that everywhere else xarray uses CF attributes for something, the original values of those attributes are recorded in var.encoding, not var.attrs, and consistency across a code base is a good thing. Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars, I don't have strong opinions on the subject.

If you feel strongly to the contrary, there's an idea at the top of this thread for getting bounds information encoded in terms xarray already uses in some cases (Dataset.groupby_bins()), and the diffs for this PR should help you figure out what needs changing to support this.

For grid_mapping there's
http://xarray.pydata.org/en/latest/weather-climate.html#cf-compliant-coordinate-variables
which is enough for my uses.

@DWesl (Contributor, Author) commented May 31, 2019

Switched to use `in` rather than `is not None`.

Re: grid_mapping in .encoding not .attrs
MetPy assumes grid_mapping will be in .attrs. Since the xarray documentation mentions this capability, should I be making concurrent changes to MetPy to allow this to continue?

If so, would it be sufficient to change their .attrs references to .encoding and to mention in both sets of documentation that the user should call ds.metpy.parse_cf() immediately after loading to ensure the information is available for MetPy to use? I don't entirely understand the accessor API.

@dcherian (Contributor)

It isn't just MetPy though. I'm sure there's existing code relying on adding grid_mapping and bounds to attrs in order to write CF-compliant files. So there's a (potentially big) backward compatibility issue. This becomes worse if in the future we keep interpreting more CF attributes and moving them to encoding :/.

Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars.

I'm +1 on this but I wonder whether saving them in attrs and using that information when encoding coordinates would be the more pragmatic choice.

We could define encoding as containing a specified set of CF attributes that control on-disk representation, such as units, scale_factor, contiguous, etc., and leave everything else in attrs. A full list of attributes that belong in encoding could be in the docs so that downstream packages can fully depend on this behaviour.

Currently, the coordinates attribute is interpreted and moved to encoding. Under the above proposal, it would be left in attrs but its value would still be interpreted if decode_coords=True.

What do you think?
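A minimal sketch of that proposal, assuming a hypothetical whitelist of on-disk attributes; the set below is a guess for illustration, and the actual list would have to be settled in xarray's docs:

```python
# Hypothetical set of attributes that control on-disk representation.
# The real, authoritative list would be defined in the documentation.
ENCODING_ATTRS = {
    "dtype", "_FillValue", "scale_factor", "add_offset",
    "units", "calendar", "contiguous", "chunksizes", "zlib",
}

def partition_attrs(raw_attrs):
    """Split raw on-disk attributes into (encoding, attrs).

    Attributes on the whitelist move to encoding; everything else,
    including grid_mapping and bounds, stays in attrs.
    """
    encoding = {k: v for k, v in raw_attrs.items() if k in ENCODING_ATTRS}
    attrs = {k: v for k, v in raw_attrs.items() if k not in ENCODING_ATTRS}
    return encoding, attrs
```

Under this scheme grid_mapping and bounds would remain visible in attrs while still being interpreted when decode_coords is enabled.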

@DWesl (Contributor, Author) commented Jun 1, 2019 via email

@DWesl (Contributor, Author) commented Feb 14, 2020

I just noticed that pandas.PeriodIndex would be an alternative to pandas.IntervalIndex for time data, where which side the interval is closed on is largely irrelevant.

Is there an interest in using these for 1D coordinates with bounds? I think ds.groupby_bins() already returns an IntervalIndex.
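The interval machinery behind that is visible without xarray; here pd.cut stands in for the binning that groupby_bins performs, as a pandas-only illustration rather than the PR's code:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# pd.cut labels each value with a right-closed pandas.Interval;
# grouping on those labels mirrors what ds.groupby_bins() does.
binned = s.groupby(pd.cut(s, bins=[0, 2, 4, 6]), observed=False).mean()
```

The resulting index is categorical, with the categories forming a `pandas.IntervalIndex`.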

@pep8speaks commented Feb 14, 2020

Hello @DWesl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-11 17:47:29 UTC

@DWesl (Contributor, Author) commented Feb 18, 2020

The test failures seem to all be due to recent changes in cftime/CFTimeIndex, which I haven't touched.

Is sticking the grid_mapping and bounds attributes in encoding good, or should I put them back in attrs?

@dcherian (Contributor)

My preference is for attrs but someone else should weigh in. cc @pydata/xarray

Maybe @dopplershift, as MetPy maintainer, has thoughts on this matter?

@dopplershift (Contributor)

As a downstream user, I just want to be told what to do (assuming encoding is part of the public API for xarray). I'd love not to have to modify our code, but that's not necessarily essential.

So to clarify: is this about whether they should be in one spot or the other? Or is it about having grid_mapping and bounds in both?

@DWesl (Contributor, Author) commented Mar 10, 2020

I think the choice is between attrs and encoding, not both.

If it helps tip your decision one way or the other: attrs tends to stay associated with Datasets through more operations than encoding, so parse_cf() would have to be called fairly soon after opening if the information ends up in encoding, while putting it in attrs gives users a bit more time for that.

@dopplershift (Contributor)

Thanks for the info. Based on that, I lean towards attrs.

I think a better rationale, though, would be to formalize the role of encoding in xarray.

@shoyer (Member) commented Mar 12, 2020 via email

DWesl and others added 5 commits January 16, 2021 21:10
Note that the identified coordinates are no longer primarily CF Auxiliary Coordinates.

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
There's more than three attributes used to assign variables as coordinates.

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Use two backticks for monospace font.  Single backticks drop you into the default role.

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
I'm not just testing ``grid_mapping`` and ``bounds`` here, so I should
have the name reflect that.
@dcherian (Contributor) left a comment

Thanks @DWesl, just pushed a small docs commit. I think this is ready to merge. Thank you for your patience.

@DWesl (Contributor, Author) commented Jan 17, 2021

Looks good to me. I was wondering where those docstrings were.

@jthielen (Contributor)

I unfortunately have not been following along with the recent developments in this PR, so these may have already been previously covered. Sorry if that is the case!

However, earlier on in the development of this PR, there were some substantial concerns about backwards compatibility (e.g., for libraries like MetPy that currently rely on grid_mapping and the like being in attrs) and the improved propagation of encoding through operations (so that moving these to encoding doesn't mean they are unnecessarily lost). What is the current status with regards to these?

@dcherian (Contributor)

there were some substantial concerns about backwards compatibility

:( yes this is currently backward incompatible (so the next release will bump major version). There's reluctance to add yet another decode_* kwarg to open_dataset, and also reluctance to issue a warning saying that the behaviour of decode_coords will change in a future version (since this would be very common).

One option is to change decode_coords from bool to Union[bool, str] and allow decode_coords to be either "all" (this PR) or "coordinates", or True ( == "coordinates", backwards compatible). I can bring this up at the next dev meeting.

…_and_bounds_as_coords

* upstream/master: (51 commits)
  Ensure maximum accuracy when encoding and decoding cftime.datetime values (pydata#4758)
  Fix `bounds_error=True` ignored with 1D interpolation (pydata#4855)
  add a drop_conflicts strategy for merging attrs (pydata#4827)
  update pre-commit hooks (mypy) (pydata#4883)
  ensure warnings cannot become errors in assert_ (pydata#4864)
  update pre-commit hooks (pydata#4874)
  small fixes for the docstrings of swap_dims and integrate (pydata#4867)
  Modify _encode_datetime_with_cftime for compatibility with cftime > 1.4.0 (pydata#4871)
  vélin (pydata#4872)
  don't skip the doctests CI (pydata#4869)
  fix da.pad example for numpy 1.20 (pydata#4865)
  temporarily pin dask (pydata#4873)
  Add units if "unit" is in the attrs. (pydata#4850)
  speed up the repr for big MultiIndex objects (pydata#4846)
  dim -> coord in DataArray.integrate (pydata#3993)
  WIP: backend interface, now it uses subclassing  (pydata#4836)
  weighted: small improvements (pydata#4818)
  Update related-projects.rst (pydata#4844)
  iris update doc url (pydata#4845)
  Faster unstacking (pydata#4746)
  ...
@dcherian (Contributor)

One option is to change decode_coords from bool to Union[bool, str] and allow decode_coords to be either "all" (this PR) or "coordinates", or True ( == "coordinates", backwards compatible). I can bring this up at the next dev meeting.

Updated to implement this proposal after getting OK at the previous meeting.

@DWesl can you take a look at the most recent changes please?

@DWesl (Contributor, Author) commented Feb 13, 2021

I think this looks good.

@dcherian (Contributor)

Great, I'll merge before the next release if no one else has comments. Thanks @DWesl

@andersy005 (Member) left a comment

This looks great!

Does anyone know why the xr.open_dataset(....) call is echoed in the warning message? Is this intentional? Cc @dcherian @DWesl

In [4]: ds = xr.open_dataset('/home/abanihi/Downloads/pr_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_201001-201412.nc', decode_coords="all")
<ipython-input-4-ccf90ed4a433>:1: UserWarning: Variable(s) referenced in cell_measures not in variables: ['areacella']
  ds = xr.open_dataset('/home/abanihi/Downloads/pr_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_201001-201412.nc', decode_coords="all")

In [5]: ds
Out[5]: 
<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 145, lon: 192, time: 60)
Coordinates:
  * time       (time) datetime64[ns] 2010-01-16T12:00:00 ... 2014-12-16T12:00:00
    time_bnds  (time, bnds) datetime64[ns] ...
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    lon_bnds   (lon, bnds) float64 ...
  * lat        (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
    lat_bnds   (lat, bnds) float64 ...
Dimensions without coordinates: bnds
Data variables:
    pr         (time, lat, lon) float32 ...

@DWesl (Contributor, Author) commented Feb 14, 2021

Does anyone know why the xr.open_dataset(....) call is echoed in the warning message. Is this intentional? Cc @dcherian @DWesl

It seems you've already figured this out, but for anyone else with this question: the repeat of the call on that file is part of the warning that the file does not have all the variables its attributes refer to. You can fix this by recreating the file with the listed variables added (areacella), or by deleting the offending attribute (cell_measures) from the variables. You can also ignore the warning using the machinery in the warnings module.
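Silencing just that warning with the standard warnings module could look like the following sketch; the message pattern is copied from the session above, and warnings.warn stands in for the real open_dataset call:

```python
import warnings

def open_without_cell_measures_warning():
    """Suppress only the cell_measures warning while opening a dataset."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=r"Variable\(s\) referenced in cell_measures",
            category=UserWarning,
        )
        # Stand-in for: ds = xr.open_dataset(path, decode_coords="all")
        warnings.warn(
            "Variable(s) referenced in cell_measures not in variables: "
            "['areacella']",
            UserWarning,
        )
        return "dataset"
```

Other warnings still propagate normally; only messages matching the pattern are dropped.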

@dcherian dcherian merged commit 12b4480 into pydata:master Feb 17, 2021
dopplershift added a commit to dopplershift/MetPy that referenced this pull request Oct 25, 2024
This support was added in pydata/xarray#2844 and handles converting the
grid_mapping variable to a coordinate in xarray itself, which was
incompatible with some assumptions in parse_cf(). Add some handling for
the case where the grid_mapping. Also add decode_coords='all' to our
full test suite and adjust a few tests as necessary.