Update Terminology page to account for multidimensional coordinates #3410

Merged: 4 commits, Oct 24, 2019

@jthielen jthielen commented Oct 17, 2019

As discussed in #3352, this PR modifies the Terminology page in the docs to briefly address multidimensional coordinates. Sorry for the delay in getting this in!

Also, when attempting to test the doc build, I found that the doc/environment.yml file was no longer present, so I updated it to ci/requirements/doc.yml.

  • Fully documented, including whats-new.rst for all changes and api.rst for new API

@@ -27,15 +27,15 @@ Terminology

----

-**Coordinate:** An array that labels a dimension of another ``DataArray``. Loosely, the coordinate array's values can be thought of as tick labels along a dimension. There are two types of coordinate arrays: *dimension coordinates* and *non-dimension coordinates* (see below). A coordinate named ``x`` can be retrieved from ``arr.coords[x]``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be assigned multiple coordinate arrays. However, only one coordinate array can be a assigned as a particular dimension's dimension coordinate array. As a consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
+**Coordinate:** An array that labels a dimension or set of dimensions of another ``DataArray``. In the one-dimensional case, the coordinate array's values can loosely be thought of as tick labels along a dimension, whereas :doc:`multidimensional coordinates are often used when the data's physical coordinates differ from their logical coordinates <examples/multidimensional-coords>`. There are two types of coordinate arrays: *dimension coordinates* and *non-dimension coordinates* (see below). A coordinate named ``x`` can be retrieved from ``arr.coords[x]``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be assigned multiple coordinate arrays. However, only one coordinate array can be a assigned as a particular dimension's dimension coordinate array. As a consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
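For readers following the definition above, here is a minimal sketch of a ``DataArray`` that carries more coordinates than dimensions. The array contents and the coordinate name ``x_label`` are made up for illustration. Note that the coordinate name is passed as a string when looking it up.

```python
import numpy as np
import xarray as xr

# Two dimensions ("x", "y") but three coordinates: "x" and "y" are
# dimension coordinates, "x_label" (a hypothetical name) is a second,
# non-dimension coordinate that also lies along "x".
arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={
        "x": [10, 20],
        "y": [0.0, 0.5, 1.0],
        "x_label": ("x", ["a", "b"]),
    },
)

x_coord = arr.coords["x"]   # look the coordinate up by its name
n_dims = len(arr.dims)      # number of dimensions
n_coords = len(arr.coords)  # number of coordinate arrays

# An array created without any coordinates, for comparison.
bare = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"))
```

Here ``n_dims`` is 2 and ``n_coords`` is 3. The ``bare`` array has no coordinates at all, so the inequality in the definition is probably best read as describing arrays whose dimensions are all labelled.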
Collaborator

are often used when the data's physical coordinates differ from their logical coordinates

What does this mean? Is it climate-focused? (I'm not sure I have something better in mind yet, though)

Contributor Author

This was taken from https://xarray.pydata.org/en/latest/examples/multidimensional-coords.html, and it was the best I could think of for a concise explanation. I interpreted it to refer to physical coordinates (like latitude and longitude) that do not always line up with the axes ("logical coordinates") of one's data (which is often the case when working with Earth-based data that are on some grid other than latitude/longitude). I'd be glad to change it to something else though if there are any suggestions!
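To make the physical-versus-logical distinction concrete, here is a rough sketch under made-up assumptions: the grid size, the ``lat``/``lon`` formulas, and the variable names are all hypothetical and only meant to mimic a grid that is tilted relative to latitude/longitude lines.

```python
import numpy as np
import xarray as xr

ny, nx = 3, 4

# Logical coordinates: plain positions along the array axes.
y = np.arange(ny)
x = np.arange(nx)

# Hypothetical physical coordinates on a tilted grid: both latitude and
# longitude vary along both axes, so each needs a 2D array of shape (ny, nx).
xx, yy = np.meshgrid(x, y)
lon = -100.0 + 0.5 * xx + 0.1 * yy
lat = 40.0 - 0.1 * xx + 0.5 * yy

temp = xr.DataArray(
    np.zeros((ny, nx)),
    dims=("y", "x"),
    coords={
        "y": y,
        "x": x,
        "lat": (("y", "x"), lat),  # multidimensional coordinate
        "lon": (("y", "x"), lon),  # multidimensional coordinate
    },
    name="temperature",
)

# Selecting one logical row does not give a line of constant latitude.
row = temp.isel(y=0)
row_lat = row.coords["lat"].values
```

In this sketch ``lat`` and ``lon`` each span both ``y`` and ``x``, and ``row_lat`` changes along the row, which is the sense in which the physical coordinates do not line up with the axes of the data.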

Collaborator

That makes sense. I think it's probably too climate-focused at the moment. We could either add "for example, in climate datasets...", or remove that clause. Does anyone have other suggestions?

Contributor Author

I think "in geoscience datasets" may be more appropriate/general than "in climate datasets" (speaking as a meteorologist 😁), but I can easily make that change if no other suggestions come up.

Contributor Author

Since no other suggestions came in, I made this update. It also seemed to work better to move this to the Non-dimension coordinate definition (if I should move it back, just let me know).


----

-**Dimension coordinate:** A coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims``. Dimension coordinates are used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact, dimension coordinates use :py:class:`pandas.Index` objects under the hood for efficient computation. Dimension coordinates are marked by ``*`` when printing a ``DataArray`` or ``Dataset``.
+**Dimension coordinate:** A one-dimensional coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims``. Dimension coordinates are used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact, dimension coordinates use :py:class:`pandas.Index` objects under the hood for efficient computation. Dimension coordinates are marked by ``*`` when printing a ``DataArray`` or ``Dataset``.
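As a small illustration of the behaviour this definition describes, the sketch below uses two made-up arrays, ``a`` and ``b``, that share a dimension ``x`` but carry partly different labels. It is only meant to show label-based selection, the ``pandas.Index`` behind a dimension coordinate, and alignment in arithmetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

# "x" is a one-dimensional coordinate whose name matches its dimension,
# so it is a dimension coordinate.
a = xr.DataArray(np.arange(4.0), dims="x", coords={"x": [10, 20, 30, 40]})
b = xr.DataArray(
    np.array([100.0, 200.0, 300.0, 400.0]),
    dims="x",
    coords={"x": [20, 30, 40, 50]},
)

x_index = a.indexes["x"]       # the pandas.Index backing the coordinate
picked = float(a.sel(x=30))    # label-based indexing, not positional

# Arithmetic aligns on the labels of "x"; only shared labels are kept.
total = a + b
shared_labels = list(total.coords["x"].values)

text = repr(a)                 # dimension coordinates are listed with "*"
```

With these inputs, ``picked`` is the value stored at label 30, and ``total`` only covers the labels 20, 30, and 40 that both arrays have in common.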
Collaborator

👍

@jthielen jthielen force-pushed the multidimensional-coord-terminology branch from 0d2a187 to 98044fb Compare October 22, 2019 12:18
@max-sixty
Collaborator

Looks great! Any other suggestions before we merge?

@max-sixty max-sixty merged commit 35c75f5 into pydata:master Oct 24, 2019
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 24, 2019
* upstream/master:
  minor lint tweaks (pydata#3429)
  Hack around pydata#3440 (pydata#3442)
  Update Terminology page to account for multidimensional coordinates (pydata#3410)
  Use cftime master for upstream-dev build (pydata#3439)