Add support for CFTimeIndex in get_clean_interp_index #3631

Merged
merged 39 commits into from
Jan 26, 2020
Changes from 35 commits
39 commits
d5c2242
add support for CFTimeIndex in get_clean_interp_index
huard Dec 16, 2019
77bb24c
black
huard Dec 16, 2019
03e7769
added test comparing cftime index with standard index
huard Dec 16, 2019
e169cf4
added comment
huard Dec 16, 2019
303020f
index in ns instead of days
huard Dec 16, 2019
210fb94
pep8
huard Dec 16, 2019
1cfe72d
datetime_to_numeric: convert timedelta objects using np.timedelta64 t…
huard Dec 18, 2019
4964163
added interp test
huard Dec 18, 2019
83f6c89
switched clean_interp_index resolution to us. Fixed interpolate_na an…
huard Dec 18, 2019
6298953
Error message to explain overflow problem.
huard Dec 18, 2019
3d23ccf
Merge branch 'fix-3641' into cf_interp_index
huard Dec 19, 2019
2ba1803
switched timedelta64 units from ms to us
huard Dec 20, 2019
9a648d9
Merge branch 'fix-3641' into cf_interp_index
huard Dec 20, 2019
e873da2
reverted default user-visible resolution to ns. Converts to float, po…
huard Jan 6, 2020
532756d
pep8
huard Jan 6, 2020
73d8729
black
huard Jan 6, 2020
4288780
special case for older numpy versions
huard Jan 6, 2020
077145e
black
huard Jan 6, 2020
758d81c
added xfail for overflow error with numpy < 1.17
huard Jan 6, 2020
d0d8bfe
changes following PR comments from spencerclark
huard Jan 14, 2020
6c9630a
bypass pandas to convert timedeltas to floats. avoids overflow errors.
huard Jan 17, 2020
d18c775
black
huard Jan 17, 2020
78e17ec
Merge branch 'master' into cf_interp_index
huard Jan 17, 2020
6615c97
removed numpy conversion. added docstrings. renamed tests.
huard Jan 20, 2020
2df2b29
pep8
huard Jan 20, 2020
31f5417
updated whats new
huard Jan 20, 2020
2974af9
Update doc/whats-new.rst
huard Jan 20, 2020
eeb5074
update interpolate_na docstrings
huard Jan 20, 2020
6b9631f
black
huard Jan 20, 2020
5656fdb
dt conflicts with accessor
huard Jan 20, 2020
dcf98ff
replaced assert_equal by assert_allclose
huard Jan 24, 2020
4842a96
Update xarray/core/duck_array_ops.py
huard Jan 25, 2020
6dbf225
Update xarray/core/duck_array_ops.py
huard Jan 25, 2020
c90dc97
renamed array to value in timedelta_to_numeric. Added tests
huard Jan 25, 2020
71fb87d
removed support for TimedeltaIndex in timedelta_to_numeric
huard Jan 25, 2020
3d9f333
added tests for np_timedelta64_to_float and pd_timedelta_to_float. re…
huard Jan 26, 2020
b04785c
black
huard Jan 26, 2020
d24cae4
Fix flake8 error
spencerkclark Jan 26, 2020
6f0c504
black
spencerkclark Jan 26, 2020
13 changes: 9 additions & 4 deletions doc/whats-new.rst
@@ -23,8 +23,8 @@ Breaking changes
~~~~~~~~~~~~~~~~

- Remove ``compat`` and ``encoding`` kwargs from ``DataArray``, which
have been deprecated since 0.12. (:pull:`3650`).
Instead, specify the encoding when writing to disk or set
have been deprecated since 0.12. (:pull:`3650`).
Instead, specify the encoding when writing to disk or set
the ``encoding`` attribute directly.
By `Maximilian Roos <https://github.com/max-sixty>`_

@@ -48,10 +48,15 @@ New Features
- :py:meth:`Dataset.swap_dims` and :py:meth:`DataArray.swap_dims`
now allow swapping to dimension names that don't exist yet. (:pull:`3636`)
By `Justus Magin <https://github.com/keewis>`_.
- Extend :py:class:`core.accessor_dt.DatetimeAccessor` properties
and support `.dt` accessor for timedelta
- Extend :py:class:`core.accessor_dt.DatetimeAccessor` properties
and support `.dt` accessor for timedelta
via :py:class:`core.accessor_dt.TimedeltaAccessor` (:pull:`3612`)
By `Anderson Banihirwe <https://github.com/andersy005>`_.
- Support CFTimeIndex in :py:meth:`DataArray.interpolate_na`. The interpolation
index now uses 1970-01-01 as the default offset for both DatetimeIndex and
CFTimeIndex, and timedelta objects are converted to floats via microseconds
to avoid overflow errors (:issue:`3641`, :pull:`3631`).
By `David Huard <https://github.com/huard>`_.

Bug fixes
~~~~~~~~~
9 changes: 8 additions & 1 deletion xarray/coding/cftimeindex.py
@@ -430,7 +430,14 @@ def __sub__(self, other):
import cftime

if isinstance(other, (CFTimeIndex, cftime.datetime)):
return pd.TimedeltaIndex(np.array(self) - np.array(other))
try:
return pd.TimedeltaIndex(np.array(self) - np.array(other))
except OverflowError:
raise ValueError(
"The time difference exceeds the range of values "
"that can be expressed at the nanosecond resolution."
)

elif isinstance(other, pd.TimedeltaIndex):
return CFTimeIndex(np.array(self) - other.to_pytimedelta())
else:
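
For context on the error message added above: nanosecond-resolution time differences are stored as signed 64-bit integer counts (as in numpy's `timedelta64[ns]`), which limits the representable span to roughly 292 years in either direction. The sketch below uses only the standard library to show why the same span fits at microsecond resolution but not at nanosecond resolution. The helper `fits_in_int64` is hypothetical and is not part of xarray or this PR.

```python
from datetime import timedelta

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def fits_in_int64(delta: timedelta, units_per_second: int) -> bool:
    """Return True if `delta`, counted in the given unit, fits in a signed 64-bit integer.

    Hypothetical helper for illustration only.
    """
    # timedelta stores days, seconds and microseconds as exact integers,
    # so the whole check can be done without floating point.
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    count = total_us * units_per_second // 1_000_000
    return INT64_MIN <= count <= INT64_MAX


span = timedelta(days=365 * 1000)  # roughly 1000 years
fits_ns = fits_in_int64(span, 1_000_000_000)  # counted in nanoseconds
fits_us = fits_in_int64(span, 1_000_000)  # counted in microseconds
```

A span of about 1000 years overflows when counted in nanoseconds but not in microseconds, which is the situation the `ValueError` describes.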
8 changes: 6 additions & 2 deletions xarray/core/dataarray.py
@@ -18,6 +18,7 @@
cast,
)

import datetime
import numpy as np
import pandas as pd

@@ -2041,7 +2042,9 @@ def interpolate_na(
method: str = "linear",
limit: int = None,
use_coordinate: Union[bool, str] = True,
max_gap: Union[int, float, str, pd.Timedelta, np.timedelta64] = None,
max_gap: Union[
int, float, str, pd.Timedelta, np.timedelta64, datetime.timedelta
] = None,
**kwargs: Any,
) -> "DataArray":
"""Fill in NaNs by interpolating according to different methods.
@@ -2073,14 +2076,15 @@ def interpolate_na(
or None for no limit. This filling is done regardless of the size of
the gap in the data. To only interpolate over gaps less than a given length,
see ``max_gap``.
max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, default None.
max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, datetime.timedelta, default None.
Maximum size of gap, a continuous sequence of NaNs, that will be filled.
Use None for no limit. When interpolating along a datetime64 dimension
and ``use_coordinate=True``, ``max_gap`` can be one of the following:

- a string that is valid input for pandas.to_timedelta
- a :py:class:`numpy.timedelta64` object
- a :py:class:`pandas.Timedelta` object
- a :py:class:`datetime.timedelta` object

Otherwise, ``max_gap`` must be an int or a float. Use of ``max_gap`` with unlabeled
dimensions has not been implemented yet. Gap length is defined as the difference
8 changes: 6 additions & 2 deletions xarray/core/dataset.py
@@ -27,6 +27,7 @@
cast,
)

import datetime
import numpy as np
import pandas as pd

@@ -3994,7 +3995,9 @@ def interpolate_na(
method: str = "linear",
limit: int = None,
use_coordinate: Union[bool, Hashable] = True,
max_gap: Union[int, float, str, pd.Timedelta, np.timedelta64] = None,
max_gap: Union[
int, float, str, pd.Timedelta, np.timedelta64, datetime.timedelta
] = None,
**kwargs: Any,
) -> "Dataset":
"""Fill in NaNs by interpolating according to different methods.
@@ -4027,14 +4030,15 @@ def interpolate_na(
or None for no limit. This filling is done regardless of the size of
the gap in the data. To only interpolate over gaps less than a given length,
see ``max_gap``.
max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, default None.
max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, datetime.timedelta, default None.
Maximum size of gap, a continuous sequence of NaNs, that will be filled.
Use None for no limit. When interpolating along a datetime64 dimension
and ``use_coordinate=True``, ``max_gap`` can be one of the following:

- a string that is valid input for pandas.to_timedelta
- a :py:class:`numpy.timedelta64` object
- a :py:class:`pandas.Timedelta` object
- a :py:class:`datetime.timedelta` object

Otherwise, ``max_gap`` must be an int or a float. Use of ``max_gap`` with unlabeled
dimensions has not been implemented yet. Gap length is defined as the difference
127 changes: 111 additions & 16 deletions xarray/core/duck_array_ops.py
@@ -372,51 +372,146 @@ def _datetime_nanmin(array):


def datetime_to_numeric(array, offset=None, datetime_unit=None, dtype=float):
"""Convert an array containing datetime-like data to an array of floats.
"""Convert an array containing datetime-like data to numerical values.

Convert the datetime array to a timedelta relative to an offset.

Parameters
----------
da : np.array
Input data
offset: Scalar with the same type of array or None
If None, subtract minimum values to reduce round off error
datetime_unit: None or any of {'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms',
'us', 'ns', 'ps', 'fs', 'as'}
dtype: target dtype
array : array-like
Input data
offset: None, datetime or cftime.datetime
Datetime offset. If None, this is set by default to the array's minimum
value to reduce round off errors.
datetime_unit: {None, Y, M, W, D, h, m, s, ms, us, ns, ps, fs, as}
If not None, convert output to a given datetime unit. Note that some
conversions are not allowed due to non-linear relationships between units.
dtype: dtype
Output dtype.

Returns
-------
array
Numerical representation of datetime object relative to an offset.

Notes
-----
Some datetime unit conversions won't work, for example from days to years, even
though some calendars would allow for them (e.g. no_leap). This is because there
is no `cftime.timedelta` object.
"""
# TODO: make this function dask-compatible?
# Set offset to minimum if not given
if offset is None:
if array.dtype.kind in "Mm":
offset = _datetime_nanmin(array)
else:
offset = min(array)

# Compute timedelta object.
# For np.datetime64, this can silently yield garbage due to overflow.
# One option is to enforce 1970-01-01 as the universal offset.
array = array - offset

if not hasattr(array, "dtype"): # scalar is converted to 0d-array
# Scalar is converted to 0d-array
if not hasattr(array, "dtype"):
array = np.array(array)

# Convert timedelta objects to float by first converting to microseconds.
if array.dtype.kind in "O":
# possibly convert object array containing datetime.timedelta
array = np.asarray(pd.Series(array.ravel())).reshape(array.shape)
return py_timedelta_to_float(array, datetime_unit or "ns").astype(dtype)

if datetime_unit:
array = array / np.timedelta64(1, datetime_unit)
# Convert np.NaT to np.nan
elif array.dtype.kind in "mM":

# convert np.NaT to np.nan
if array.dtype.kind in "mM":
# Convert to specified timedelta units.
if datetime_unit:
array = array / np.timedelta64(1, datetime_unit)
return np.where(isnull(array), np.nan, array.astype(dtype))
return array.astype(dtype)
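
The core idea of `datetime_to_numeric` (subtract an offset, then express the resulting time differences as floats in a chosen unit) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the xarray implementation: `to_numeric` and `EPOCH` are hypothetical names, the input is a plain list of `datetime.datetime` objects, and missing values (NaT) are not handled.

```python
from datetime import datetime, timedelta

# Fixed reference date, matching the 1970-01-01 default offset described in this PR.
EPOCH = datetime(1970, 1, 1)


def to_numeric(times, offset=None, unit=timedelta(microseconds=1)):
    """Convert datetimes to floats counting `unit` steps since `offset`.

    Hypothetical helper for illustration only.
    """
    if offset is None:
        # Mirrors the documented default: use the minimum value as the offset.
        offset = min(times)
    # Dividing one timedelta by another yields a float.
    return [(t - offset) / unit for t in times]


times = [datetime(2000, 1, 1), datetime(2000, 1, 2), datetime(2000, 1, 4)]
relative_days = to_numeric(times, unit=timedelta(days=1))
epoch_days = to_numeric(times, offset=EPOCH, unit=timedelta(days=1))
```

With the minimum as offset the values start at zero, while a fixed offset such as `EPOCH` gives values that are comparable across different arrays.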


def timedelta_to_numeric(value, datetime_unit="ns", dtype=float):
"""Convert a timedelta-like object to numerical values.

Parameters
----------
value : datetime.timedelta, numpy.timedelta64, pandas.Timedelta, str
Time delta representation.
datetime_unit : {Y, M, W, D, h, m, s, ms, us, ns, ps, fs, as}
The time units of the output values. Note that some conversions are not allowed due to
non-linear relationships between units.
dtype : type
The output data type.

"""
import datetime as dt

if isinstance(value, dt.timedelta):
out = py_timedelta_to_float(value, datetime_unit)
elif isinstance(value, np.timedelta64):
out = np_timedelta64_to_float(value, datetime_unit)
elif isinstance(value, pd.Timedelta):
out = pd_timedelta_to_float(value, datetime_unit)
elif isinstance(value, str):
try:
a = pd.to_timedelta(value)
except ValueError:
raise ValueError(
f"Could not convert {value!r} to timedelta64 using pandas.to_timedelta"
)
return py_timedelta_to_float(a, datetime_unit)
else:
raise TypeError(
f"Expected value of type str, pandas.Timedelta, datetime.timedelta "
f"or numpy.timedelta64, but received {type(value).__name__}"
)
return out.astype(dtype)


def _to_pytimedelta(array, unit="us"):
index = pd.TimedeltaIndex(array.ravel(), unit=unit)
return index.to_pytimedelta().reshape(array.shape)


def np_timedelta64_to_float(array, datetime_unit):
"""Convert numpy.timedelta64 to float.

Notes
-----
The array is first converted to nanoseconds and cast to float64. The unit
conversion is then done in floating point, which avoids integer overflow
in the scaling step.
"""
array = array.astype("timedelta64[ns]").astype(np.float64)
conversion_factor = np.timedelta64(1, "ns") / np.timedelta64(1, datetime_unit)
return conversion_factor * array


def pd_timedelta_to_float(array, datetime_units):
"""Convert pandas.Timedelta to float.

Notes
-----
Built on the assumption that pandas timedelta values are in nanoseconds,
which is also the numpy default resolution.
"""
array = array.to_timedelta64()
return np_timedelta64_to_float(array, datetime_units)


def pd_timedeltaindex_to_float(array, datetime_units):
"""Convert pandas.TimedeltaIndex to float."""
return np_timedelta64_to_float(array.values, datetime_units)


def py_timedelta_to_float(array, datetime_unit):
"""Convert a timedelta object to a float, possibly at a loss of resolution.
"""
array = np.asarray(array)
array = np.reshape([a.total_seconds() for a in array.ravel()], array.shape) * 1e6
conversion_factor = np.timedelta64(1, "us") / np.timedelta64(1, datetime_unit)
return conversion_factor * array
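
The approach in `py_timedelta_to_float` (take `total_seconds()`, scale to microseconds, then apply a unit conversion factor) can be illustrated for a single `datetime.timedelta` without numpy. The sketch below is an assumption-laden stand-in: `timedelta_to_float` and the `UNITS_PER_US` table are hypothetical, and only units with a fixed length are included.

```python
from datetime import timedelta

# Number of target units per microsecond, for fixed-length units only.
# Hypothetical lookup table standing in for the numpy-based conversion factor.
UNITS_PER_US = {
    "D": 1 / 86_400e6,
    "h": 1 / 3_600e6,
    "m": 1 / 60e6,
    "s": 1e-6,
    "ms": 1e-3,
    "us": 1.0,
    "ns": 1e3,
}


def timedelta_to_float(delta: timedelta, unit: str = "ns") -> float:
    """Convert a timedelta to a float in `unit`, possibly losing resolution.

    Hypothetical helper for illustration only.
    """
    # total_seconds() returns a float, so very long spans cannot keep
    # microsecond precision.
    microseconds = delta.total_seconds() * 1e6
    return UNITS_PER_US[unit] * microseconds


one_day_in_hours = timedelta_to_float(timedelta(days=1), "h")
one_hour_in_ns = timedelta_to_float(timedelta(hours=1), "ns")
```

Because the result is a float rather than a 64-bit integer count, long spans do not overflow, at the cost of the resolution loss mentioned in the docstring.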


def mean(array, axis=None, skipna=None, **kwargs):
"""inhouse mean that can handle np.datetime64 or cftime.datetime
dtypes"""