Examples added to docstrings #7936

Merged
merged 43 commits into from
Jul 11, 2023
Changes from 42 commits
Commits
4dc72de
xarray.dataset.tail
harshitha1201 Jun 20, 2023
2c55b95
xarray.dataset.head
harshitha1201 Jun 20, 2023
052fc8f
xarray.dataset.dropna
harshitha1201 Jun 20, 2023
6820863
xarray.dataset.ffill
harshitha1201 Jun 20, 2023
fd7eed4
xarray.dataset.bfill
harshitha1201 Jun 20, 2023
f9ff19c
xarray.dataset.set_Coords
harshitha1201 Jun 20, 2023
2562c58
xarray.dataset.reset_coords
harshitha1201 Jun 20, 2023
3e25744
indentation changes
harshitha1201 Jun 23, 2023
54ad806
indentation
harshitha1201 Jun 27, 2023
44b18b8
reset_coords example
harshitha1201 Jun 27, 2023
70bcf3a
tail_edited
harshitha1201 Jun 27, 2023
1ee84e8
bfill change
harshitha1201 Jun 27, 2023
f46c867
Merge branch 'main' into add-examples2
harshitha1201 Jun 27, 2023
4ba26a6
change
harshitha1201 Jun 27, 2023
3533074
Merge branch 'add-examples2' of https://github.com/harshitha1201/xarr…
harshitha1201 Jun 30, 2023
7bb884f
changes
harshitha1201 Jun 30, 2023
9421faa
indented
harshitha1201 Jul 2, 2023
74a94f9
indented
harshitha1201 Jul 3, 2023
1a728a2
minute_changes
harshitha1201 Jul 3, 2023
a06a8c8
doctest failure change
harshitha1201 Jul 3, 2023
1aa8c92
changes_
harshitha1201 Jul 3, 2023
7b6a2f6
change
harshitha1201 Jul 3, 2023
b00c1b3
Merge branch 'main' into add-examples2
harshitha1201 Jul 3, 2023
f0a9b54
change
harshitha1201 Jul 6, 2023
c168716
Merge branch 'add-examples2' of https://github.com/harshitha1201/xarr…
harshitha1201 Jul 6, 2023
c8999da
change
harshitha1201 Jul 7, 2023
4676ad9
Merge branch 'main' into add-examples2
harshitha1201 Jul 7, 2023
fd3f7ac
indented
harshitha1201 Jul 7, 2023
7882556
Merge branch 'add-examples2' of https://github.com/harshitha1201/xarr…
harshitha1201 Jul 7, 2023
11edc30
.
harshitha1201 Jul 7, 2023
e85f43c
what's new
harshitha1201 Jul 9, 2023
d683a82
.
harshitha1201 Jul 9, 2023
04f167f
head & tail
harshitha1201 Jul 9, 2023
13c4004
bfill & ffill
harshitha1201 Jul 9, 2023
718438c
dropna
harshitha1201 Jul 9, 2023
38e5885
head & tail
harshitha1201 Jul 9, 2023
f44312f
doctest
harshitha1201 Jul 9, 2023
5ef5aad
doctest error
harshitha1201 Jul 9, 2023
0f20bcd
Update xarray/core/dataset.py
harshitha1201 Jul 11, 2023
7d31547
Merge branch 'main' into add-examples2
harshitha1201 Jul 11, 2023
2028eac
.
harshitha1201 Jul 11, 2023
1b167c9
Merge branch 'add-examples2' of https://github.com/harshitha1201/xarr…
harshitha1201 Jul 11, 2023
4ecabe9
Fix doctest
TomNicholas Jul 11, 2023
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -41,6 +41,9 @@ Bug fixes
Documentation
~~~~~~~~~~~~~

- Added examples to docstrings of :py:meth:`Dataset.tail`, :py:meth:`Dataset.head`, :py:meth:`Dataset.dropna`,
:py:meth:`Dataset.ffill`, :py:meth:`Dataset.bfill`, :py:meth:`Dataset.set_coords`, :py:meth:`Dataset.reset_coords`
(:issue:`6793`, :pull:`7936`) By `Harshitha <https://github.com/harshitha1201>`_.
- Added page on wrapping chunked numpy-like arrays as alternatives to dask arrays.
(:pull:`7951`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Expanded the page on wrapping numpy-like "duck" arrays.
315 changes: 313 additions & 2 deletions xarray/core/dataset.py
@@ -1742,6 +1742,33 @@ def set_coords(self: T_Dataset, names: Hashable | Iterable[Hashable]) -> T_Datas
names : hashable or iterable of hashable
Name(s) of variables in this dataset to convert into coordinates.

Examples
--------
>>> dataset = xr.Dataset(
... {
... "pressure": ("time", [1.013, 1.2, 3.5]),
... "time": pd.date_range("2023-01-01", periods=3),
... }
... )
>>> dataset
<xarray.Dataset>
Dimensions: (time: 3)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 2023-01-03
Data variables:
pressure (time) float64 1.013 1.2 3.5

>>> dataset.set_coords("pressure")
<xarray.Dataset>
Dimensions: (time: 3)
Coordinates:
pressure (time) float64 1.013 1.2 3.5
* time (time) datetime64[ns] 2023-01-01 2023-01-02 2023-01-03
Data variables:
*empty*

Calling ``set_coords`` converts the named data variables into coordinates, as shown in the final dataset.

Returns
-------
Dataset
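
As a complement to the docstring example above, here is a minimal sketch (not part of the PR) of two points the parameter description implies: ``names`` may be an iterable of several variable names, and the call returns a new dataset rather than modifying the original. The ``humidity`` and ``temperature`` variables are hypothetical additions for illustration.

```python
import pandas as pd
import xarray as xr

# Hypothetical dataset with three data variables along "time".
dataset = xr.Dataset(
    {
        "pressure": ("time", [1.013, 1.2, 3.5]),
        "humidity": ("time", [0.4, 0.5, 0.6]),
        "temperature": ("time", [21.0, 22.5, 19.8]),
    },
    coords={"time": pd.date_range("2023-01-01", periods=3)},
)

# Promote two data variables to coordinates in a single call.
with_coords = dataset.set_coords(["pressure", "humidity"])

coord_names = sorted(with_coords.coords)
data_var_names = list(with_coords.data_vars)

# The original dataset is left untouched.
original_data_var_names = sorted(dataset.data_vars)
```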
@@ -1780,9 +1807,66 @@ def reset_coords(
If True, remove coordinates instead of converting them into
variables.

Examples
--------
>>> dataset = xr.Dataset(
... {
... "temperature": (
... ["time", "lat", "lon"],
... [[[25, 26], [27, 28]], [[29, 30], [31, 32]]],
... ),
... "precipitation": (
... ["time", "lat", "lon"],
... [[[0.5, 0.8], [0.2, 0.4]], [[0.3, 0.6], [0.7, 0.9]]],
... ),
... },
... coords={
... "time": pd.date_range(start="2023-01-01", periods=2),
... "lat": [40, 41],
... "lon": [-80, -79],
... "altitude": 1000,
... },
... )

# Dataset before resetting coordinates

>>> dataset
<xarray.Dataset>
Dimensions: (time: 2, lat: 2, lon: 2)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02
* lat (lat) int64 40 41
* lon (lon) int64 -80 -79
altitude int64 1000
Data variables:
temperature (time, lat, lon) int64 25 26 27 28 29 30 31 32
precipitation (time, lat, lon) float64 0.5 0.8 0.2 0.4 0.3 0.6 0.7 0.9

# Reset the 'altitude' coordinate

>>> dataset_reset = dataset.reset_coords("altitude")

# Dataset after resetting coordinates

>>> dataset_reset
<xarray.Dataset>
Dimensions: (time: 2, lat: 2, lon: 2)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02
* lat (lat) int64 40 41
* lon (lon) int64 -80 -79
Data variables:
temperature (time, lat, lon) int64 25 26 27 28 29 30 31 32
precipitation (time, lat, lon) float64 0.5 0.8 0.2 0.4 0.3 0.6 0.7 0.9
altitude int64 1000

Returns
-------
Dataset

See Also
--------
Dataset.set_coords
"""
if names is None:
names = self._coord_names - set(self._indexes)
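
The docstring example covers the default behaviour only. The sketch below (an illustration, not part of the PR) contrasts it with ``drop=True``, which removes the coordinate instead of converting it, and with calling ``reset_coords`` without names, which per the context lines above resets every non-index coordinate. The small dataset is a hypothetical reduction of the one in the docstring.

```python
import xarray as xr

# Hypothetical dataset: one index coordinate ("lat") and one scalar,
# non-index coordinate ("altitude").
dataset = xr.Dataset(
    {"temperature": (["lat"], [25.0, 27.0])},
    coords={"lat": [40, 41], "altitude": 1000},
)

# Default: the coordinate becomes a data variable.
as_variable = dataset.reset_coords("altitude")

# drop=True: the coordinate is removed entirely.
dropped = dataset.reset_coords("altitude", drop=True)

# No names given: all non-index coordinates are reset.
all_reset = dataset.reset_coords()

# set_coords reverses the default conversion.
round_trip = as_variable.set_coords("altitude")
```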
@@ -2742,6 +2826,50 @@ def head(
The keyword arguments form of ``indexers``.
One of indexers or indexers_kwargs must be provided.

Examples
--------
>>> dates = pd.date_range(start="2023-01-01", periods=5)
>>> pageviews = [1200, 1500, 900, 1800, 2000]
>>> visitors = [800, 1000, 600, 1200, 1500]
>>> dataset = xr.Dataset(
... {
... "pageviews": (("date"), pageviews),
... "visitors": (("date"), visitors),
... },
... coords={"date": dates},
... )
>>> busiest_days = dataset.sortby("pageviews", ascending=False)
>>> busiest_days.head()
<xarray.Dataset>
Dimensions: (date: 5)
Coordinates:
* date (date) datetime64[ns] 2023-01-05 2023-01-04 ... 2023-01-03
Data variables:
pageviews (date) int64 2000 1800 1500 1200 900
visitors (date) int64 1500 1200 1000 800 600

# Retrieve the 3 busiest days in terms of pageviews

>>> busiest_days.head(3)
<xarray.Dataset>
Dimensions: (date: 3)
Coordinates:
* date (date) datetime64[ns] 2023-01-05 2023-01-04 2023-01-02
Data variables:
pageviews (date) int64 2000 1800 1500
visitors (date) int64 1500 1200 1000

# Using a dictionary to specify the number of elements for specific dimensions

>>> busiest_days.head({"date": 3})
<xarray.Dataset>
Dimensions: (date: 3)
Coordinates:
* date (date) datetime64[ns] 2023-01-05 2023-01-04 2023-01-02
Data variables:
pageviews (date) int64 2000 1800 1500
visitors (date) int64 1500 1200 1000

See Also
--------
Dataset.tail
@@ -2788,6 +2916,48 @@ def tail(
The keyword arguments form of ``indexers``.
One of indexers or indexers_kwargs must be provided.

Examples
--------
>>> activity_names = ["Walking", "Running", "Cycling", "Swimming", "Yoga"]
>>> durations = [30, 45, 60, 45, 60] # in minutes
>>> energies = [150, 300, 250, 400, 100] # in calories
>>> dataset = xr.Dataset(
... {
... "duration": (["activity"], durations),
... "energy_expenditure": (["activity"], energies),
... },
... coords={"activity": activity_names},
... )
>>> sorted_dataset = dataset.sortby("energy_expenditure", ascending=False)
>>> sorted_dataset
<xarray.Dataset>
Dimensions: (activity: 5)
Coordinates:
* activity (activity) <U8 'Swimming' 'Running' ... 'Walking' 'Yoga'
Data variables:
duration (activity) int64 45 45 60 30 60
energy_expenditure (activity) int64 400 300 250 150 100

# Activities with the lowest energy expenditure, using tail()

>>> sorted_dataset.tail(3)
<xarray.Dataset>
Dimensions: (activity: 3)
Coordinates:
* activity (activity) <U8 'Cycling' 'Walking' 'Yoga'
Data variables:
duration (activity) int64 60 30 60
energy_expenditure (activity) int64 250 150 100

>>> sorted_dataset.tail({"activity": 3})
<xarray.Dataset>
Dimensions: (activity: 3)
Coordinates:
* activity (activity) <U8 'Cycling' 'Walking' 'Yoga'
Data variables:
duration (activity) int64 60 30 60
energy_expenditure (activity) int64 250 150 100

See Also
--------
Dataset.head
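
The ``head`` and ``tail`` examples in this PR use one-dimensional data, where an integer and a single-entry dictionary give the same result. A small two-dimensional sketch (an illustration, not part of the PR) shows where the forms differ: an integer applies to every dimension, while the dictionary or keyword form sets a count per dimension. The ``grid`` dataset and its names are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 grid holding the values 0..11 in row-major order.
grid = xr.Dataset(
    {"value": (("x", "y"), np.arange(12).reshape(3, 4))},
    coords={"x": [10, 20, 30], "y": [1, 2, 3, 4]},
)

# An integer applies the same count to every dimension.
first_two = grid.head(2)

# Keyword form: first 2 entries along x, first 3 along y.
top_left = grid.head(x=2, y=3)

# Dictionary form: last entry along x, last 2 along y.
bottom_right = grid.tail({"x": 1, "y": 2})

first_two_sizes = dict(first_two.sizes)
top_left_values = top_left["value"].values.tolist()
bottom_right_values = bottom_right["value"].values.tolist()
```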
@@ -5617,6 +5787,70 @@ def dropna(
Which variables to check for missing values. By default, all
variables in the dataset are checked.

Examples
--------
>>> dataset = xr.Dataset(
... {
... "temperature": (
... ["time", "location"],
... [[23.4, 24.1], [np.nan, 22.1], [21.8, 24.2], [20.5, 25.3]],
... )
... },
... coords={"time": [1, 2, 3, 4], "location": ["A", "B"]},
... )
>>> dataset
<xarray.Dataset>
Dimensions: (time: 4, location: 2)
Coordinates:
* time (time) int64 1 2 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 nan 22.1 21.8 24.2 20.5 25.3

# Drop NaN values along the time dimension (default how="any")

>>> dataset.dropna(dim="time")
<xarray.Dataset>
Dimensions: (time: 3, location: 2)
Coordinates:
* time (time) int64 1 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3

# Drop labels with any NaN values

>>> dataset.dropna(dim="time", how="any")
<xarray.Dataset>
Dimensions: (time: 3, location: 2)
Coordinates:
* time (time) int64 1 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3

# Drop labels where all values are NaN

>>> dataset.dropna(dim="time", how="all")
<xarray.Dataset>
Dimensions: (time: 4, location: 2)
Coordinates:
* time (time) int64 1 2 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 nan 22.1 21.8 24.2 20.5 25.3

# Drop labels with fewer than 2 non-NaN values

>>> dataset.dropna(dim="time", thresh=2)
<xarray.Dataset>
Dimensions: (time: 3, location: 2)
Coordinates:
* time (time) int64 1 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3

Returns
-------
Dataset
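
The examples added for ``dropna`` do not exercise the ``subset`` parameter described just above them ("Which variables to check for missing values"). The following sketch (an illustration, not part of the PR) shows how restricting the check to one variable changes which labels are dropped. The ``humidity`` variable and its values are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: each variable has a NaN at a different time label.
dataset = xr.Dataset(
    {
        "temperature": ("time", [23.4, np.nan, 21.8, 20.5]),
        "humidity": ("time", [0.41, 0.39, np.nan, 0.45]),
    },
    coords={"time": [1, 2, 3, 4]},
)

# Default: every variable is checked, so labels 2 and 3 are both dropped.
all_checked = dataset.dropna(dim="time")

# Only "temperature" is checked, so label 3 survives with its NaN humidity.
temperature_only = dataset.dropna(dim="time", subset=["temperature"])

all_checked_times = all_checked["time"].values.tolist()
temperature_only_times = temperature_only["time"].values.tolist()
humidity_at_3_is_nan = bool(np.isnan(temperature_only["humidity"].sel(time=3)))
```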
@@ -5877,18 +6111,56 @@ def ffill(self: T_Dataset, dim: Hashable, limit: int | None = None) -> T_Dataset
Parameters
----------
dim : Hashable
Specifies the dimension along which to propagate values when filling.
limit : int or None, optional
The maximum number of consecutive NaN values to forward fill. In
other words, if there is a gap with more than this number of
consecutive NaNs, it will only be partially filled. Must be greater
than 0 or None for no limit. Must be None or greater than or equal
to axis length if filling along chunked axes (dimensions).

Examples
--------
>>> time = pd.date_range("2023-01-01", periods=10, freq="D")
>>> data = np.array(
... [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, 8, np.nan, 10]
... )
>>> dataset = xr.Dataset({"data": (("time",), data)}, coords={"time": time})
>>> dataset
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 nan nan nan 5.0 nan nan 8.0 nan 10.0

# Perform forward fill (ffill) on the dataset

>>> dataset.ffill(dim="time")
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 1.0 1.0 1.0 5.0 5.0 5.0 8.0 8.0 10.0

# Limit the forward filling to a maximum of 2 consecutive NaN values

>>> dataset.ffill(dim="time", limit=2)
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 1.0 1.0 nan 5.0 5.0 5.0 8.0 8.0 10.0

Returns
-------
Dataset

See Also
--------
Dataset.bfill
"""
from xarray.core.missing import _apply_over_vars_with_dim, ffill

@@ -5912,9 +6184,48 @@ def bfill(self: T_Dataset, dim: Hashable, limit: int | None = None) -> T_Dataset
than 0 or None for no limit. Must be None or greater than or equal
to axis length if filling along chunked axes (dimensions).

Examples
--------
>>> time = pd.date_range("2023-01-01", periods=10, freq="D")
>>> data = np.array(
... [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, 8, np.nan, 10]
... )
>>> dataset = xr.Dataset({"data": (("time",), data)}, coords={"time": time})
>>> dataset
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 nan nan nan 5.0 nan nan 8.0 nan 10.0

# Perform backward fill (bfill) on the dataset

>>> dataset.bfill(dim="time")
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 5.0 5.0 5.0 5.0 8.0 8.0 8.0 10.0 10.0

# Limit the backward filling to a maximum of 2 consecutive NaN values

>>> dataset.bfill(dim="time", limit=2)
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 nan 5.0 5.0 5.0 8.0 8.0 8.0 10.0 10.0

Returns
-------
Dataset

See Also
--------
Dataset.ffill
"""
from xarray.core.missing import _apply_over_vars_with_dim, bfill
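
To make the ``limit`` semantics in the ``ffill`` and ``bfill`` docstrings concrete, here is a plain NumPy sketch of the fill rule. It is an illustration only and is not how xarray implements these methods; ``forward_fill`` and ``backward_fill`` are hypothetical helper names. It reproduces the outputs shown in the docstring examples for the same ten-element array.

```python
import numpy as np


def forward_fill(values, limit=None):
    """Fill NaNs with the last valid value, at most `limit` per gap."""
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    last_valid = np.nan  # stays NaN until the first valid value is seen
    gap_length = 0       # consecutive NaNs seen since the last valid value
    for i, value in enumerate(values):
        if np.isnan(value):
            gap_length += 1
            if limit is None or gap_length <= limit:
                filled[i] = last_valid
        else:
            last_valid = value
            gap_length = 0
    return filled


def backward_fill(values, limit=None):
    """Backward fill is forward fill applied to the reversed array."""
    reversed_values = np.asarray(values, dtype=float)[::-1]
    return forward_fill(reversed_values, limit)[::-1]


data = np.array([1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, 8, np.nan, 10])

forward = forward_fill(data)
forward_limited = forward_fill(data, limit=2)
backward = backward_fill(data)
backward_limited = backward_fill(data, limit=2)
```

With ``limit=2``, the three-element gap after the first value is only partially filled: the forward fill leaves the third NaN in place, and the backward fill leaves the first.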
