API: Remove CalendarDay #24330

Merged
merged 9 commits into from
Dec 18, 2018
25 changes: 2 additions & 23 deletions doc/source/timeseries.rst
@@ -408,7 +408,7 @@ In practice this becomes very cumbersome because we often need a very long
index with a large number of timestamps. If we need timestamps on a regular
frequency, we can use the :func:`date_range` and :func:`bdate_range` functions
to create a ``DatetimeIndex``. The default frequency for ``date_range`` is a
-**day** while the default for ``bdate_range`` is a **business day**:
+**calendar day** while the default for ``bdate_range`` is a **business day**:

.. ipython:: python

@@ -927,26 +927,6 @@ in the operation).

.. _relativedelta documentation: https://dateutil.readthedocs.io/en/stable/relativedelta.html

-.. _timeseries.dayvscalendarday:
-
-Day vs. CalendarDay
-~~~~~~~~~~~~~~~~~~~
-
-:class:`Day` (``'D'``) is a timedelta-like offset that respects absolute time
-arithmetic and is an alias for 24 :class:`Hour`. This offset is the default
-argument to many pandas time related function like :func:`date_range` and :func:`timedelta_range`.
-
-:class:`CalendarDay` (``'CD'``) is a relativedelta-like offset that respects
-calendar time arithmetic. :class:`CalendarDay` is useful preserving calendar day
-semantics with date times with have day light savings transitions, i.e. :class:`CalendarDay`
-will preserve the hour before the day light savings transition.
-
-.. ipython:: python
-
-ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
-ts + pd.offsets.Day(1)
-ts + pd.offsets.CalendarDay(1)
-

Parametric Offsets
~~~~~~~~~~~~~~~~~~
@@ -1243,8 +1223,7 @@ frequencies. We will refer to these aliases as *offset aliases*.

"B", "business day frequency"
"C", "custom business day frequency"
-"D", "day frequency"
-"CD", "calendar day frequency"
+"D", "calendar day frequency"
"W", "weekly frequency"
"M", "month end frequency"
"SM", "semi-month end frequency (15th and end of month)"
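The documentation removed above contrasted `Day`, which uses absolute (elapsed 24-hour) arithmetic, with `CalendarDay`, which uses wall-clock arithmetic. The sketch below shows that distinction using only the standard library. It is not pandas code. The helper names `add_absolute_days` and `add_calendar_days` are hypothetical, and the sketch assumes Python 3.9+ with an IANA time zone database available to `zoneinfo`. It reproduces the Helsinki example from the removed docs: 2016-10-30 is 25 hours long there, so midnight plus 24 elapsed hours lands at 23:00 on the same date.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

HELSINKI = ZoneInfo("Europe/Helsinki")


def add_absolute_days(ts, n):
    """Day-like: add n * 24 elapsed hours, letting the wall clock move."""
    # Do the arithmetic on the UTC instant, then convert back to local time.
    return (ts.astimezone(timezone.utc) + timedelta(days=n)).astimezone(ts.tzinfo)


def add_calendar_days(ts, n):
    """CalendarDay-like: add n calendar days, keeping the wall-clock time."""
    # Aware datetime + timedelta in Python is wall-clock arithmetic; the UTC
    # offset is recomputed for the resulting local time.
    return ts + timedelta(days=n)


ts = datetime(2016, 10, 30, 0, 0, tzinfo=HELSINKI)
print(add_absolute_days(ts, 1))  # 2016-10-30 23:00:00+02:00
print(add_calendar_days(ts, 1))  # 2016-10-31 00:00:00+02:00
```

Python's aware `datetime + timedelta` already performs wall-clock arithmetic. For that reason, the absolute, `Day`-like result has to be computed on the UTC instant.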
42 changes: 0 additions & 42 deletions doc/source/whatsnew/v0.24.0.rst
@@ -591,48 +591,6 @@ that the dates have been converted to UTC
pd.to_datetime(["2015-11-18 15:30:00+05:30",
"2015-11-18 16:30:00+06:30"], utc=True)

-.. _whatsnew_0240.api_breaking.calendarday:
-
-CalendarDay Offset
-^^^^^^^^^^^^^^^^^^
-
-:class:`Day` and associated frequency alias ``'D'`` were documented to represent
-a calendar day; however, arithmetic and operations with :class:`Day` sometimes
-respected absolute time instead (i.e. ``Day(n)`` and acted identically to ``Timedelta(days=n)``).
-
-*Previous Behavior*:
-
-.. code-block:: ipython
-
-
-In [2]: ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
-
-# Respects calendar arithmetic
-In [3]: pd.date_range(start=ts, freq='D', periods=3)
-Out[3]:
-DatetimeIndex(['2016-10-30 00:00:00+03:00', '2016-10-31 00:00:00+02:00',
-'2016-11-01 00:00:00+02:00'],
-dtype='datetime64[ns, Europe/Helsinki]', freq='D')
-
-# Respects absolute arithmetic
-In [4]: ts + pd.tseries.frequencies.to_offset('D')
-Out[4]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')
-
-*New Behavior*:
-
-:class:`CalendarDay` and associated frequency alias ``'CD'`` are now available
-and respect calendar day arithmetic while :class:`Day` and frequency alias ``'D'``
-will now respect absolute time (:issue:`22274`, :issue:`20596`, :issue:`16980`, :issue:`8774`)
-See the :ref:`documentation here <timeseries.dayvscalendarday>` for more information.
-
-Addition with :class:`CalendarDay` across a daylight savings time transition:
-
-.. ipython:: python
-
-ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
-ts + pd.offsets.Day(1)
-ts + pd.offsets.CalendarDay(1)
-
.. _whatsnew_0240.api_breaking.period_end_time:

Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
28 changes: 14 additions & 14 deletions pandas/core/arrays/datetimes.py
@@ -27,7 +27,7 @@
import pandas.core.common as com

from pandas.tseries.frequencies import get_period_alias, to_offset
-from pandas.tseries.offsets import Tick, generate_range
+from pandas.tseries.offsets import Day, Tick, generate_range

_midnight = time(0, 0)

@@ -255,7 +255,8 @@ def _from_sequence(cls, data, dtype=None, copy=False,

@classmethod
def _generate_range(cls, start, end, periods, freq, tz=None,
-normalize=False, ambiguous='raise', closed=None):
+normalize=False, ambiguous='raise',
+nonexistent='raise', closed=None):

periods = dtl.validate_periods(periods)
if freq is None and any(x is None for x in [periods, start, end]):
@@ -285,7 +286,7 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
start, end, _normalized = _maybe_normalize_endpoints(start, end,
normalize)

-tz, _ = _infer_tz_from_endpoints(start, end, tz)
+tz = _infer_tz_from_endpoints(start, end, tz)

if tz is not None:
# Localize the start and end arguments
@@ -295,22 +296,22 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
end = _maybe_localize_point(
end, getattr(end, 'tz', None), end, freq, tz
)
-if start and end:
-# Make sure start and end have the same tz
-start = _maybe_localize_point(
-start, start.tz, end.tz, freq, tz
-)
-end = _maybe_localize_point(
-end, end.tz, start.tz, freq, tz
-)
+if freq is not None:
+# We break Day arithmetic (fixed 24 hour) here and opt for
+# Day to mean calendar day (23/24/25 hour). Therefore, strip
+# tz info from start and day to avoid DST arithmetic
+if isinstance(freq, Day):
+if start is not None:
+start = start.tz_localize(None)
+if end is not None:
+end = end.tz_localize(None)
# TODO: consider re-implementing _cached_range; GH#17914
index = _generate_regular_range(cls, start, end, periods, freq)

if tz is not None and index.tz is None:
arr = conversion.tz_localize_to_utc(
index.asi8,
-tz, ambiguous=ambiguous)
+tz, ambiguous=ambiguous, nonexistent=nonexistent)

index = cls(arr)

@@ -1878,7 +1879,6 @@ def _infer_tz_from_endpoints(start, end, tz):
Returns
-------
tz : tzinfo or None
-inferred_tz : tzinfo or None
Raises
------
@@ -1901,7 +1901,7 @@ def _infer_tz_from_endpoints(start, end, tz):
elif inferred_tz is not None:
tz = inferred_tz

-return tz, inferred_tz
+return tz


def _maybe_normalize_endpoints(start, end, normalize):
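When `freq` is a `Day`, the new branch in `_generate_range` strips the time zone from `start` and `end`, builds the range on naive wall-clock times, and only then localizes, now forwarding `nonexistent` as well. The sketch below is a rough standard-library analogue under stated assumptions: Python 3.9+ with `zoneinfo`, and `calendar_day_range` is a hypothetical helper rather than a pandas API. It also skips the ambiguous/nonexistent handling that pandas performs. It mirrors `test_date_range_span_dst_transition`, where every element of a 10-day range in US Eastern time starting 2012-11-02 should sit at hour 0. The example uses `America/New_York`, which the `US/Eastern` alias points to.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def calendar_day_range(start, periods, tz):
    """Hypothetical helper: `periods` consecutive local midnights from `start`."""
    # Step through naive calendar dates first (no DST arithmetic involved)...
    naive = [datetime.combine(start + timedelta(days=i), time(0))
             for i in range(periods)]
    # ...then attach the zone; each element stays at 00:00 local time.
    return [dt.replace(tzinfo=tz) for dt in naive]


rng = calendar_day_range(date(2012, 11, 2), 10, EASTERN)
print([dt.hour for dt in rng])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Elapsed hours between consecutive midnights, measured on UTC instants.
lengths = [(b.astimezone(timezone.utc) - a.astimezone(timezone.utc))
           // timedelta(hours=1) for a, b in zip(rng, rng[1:])]
print(lengths)  # [24, 24, 25, 24, 24, 24, 24, 24, 24]
```

Attaching `tzinfo` to a naive midnight silently picks `fold=0` for ambiguous or nonexistent wall times. That is harmless here because US transitions happen at 02:00. In zones that change clocks at midnight, however, the localization step needs explicit `ambiguous`/`nonexistent` handling.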
21 changes: 13 additions & 8 deletions pandas/core/resample.py
@@ -1403,7 +1403,9 @@ def _get_time_bins(self, ax):
start=first,
end=last,
tz=tz,
-name=ax.name)
+name=ax.name,
+ambiguous='infer',
+nonexistent='shift')

# GH 15549
# In edge case of tz-aware resapmling binner last index can be
@@ -1607,7 +1609,7 @@ def _get_timestamp_range_edges(first, last, offset, closed='left', base=0):
Adjust the `first` Timestamp to the preceeding Timestamp that resides on
the provided offset. Adjust the `last` Timestamp to the following
Timestamp that resides on the provided offset. Input Timestamps that
-already reside on the offset will be adjusted depeding on the type of
+already reside on the offset will be adjusted depending on the type of
offset and the `closed` parameter.
Parameters
@@ -1627,18 +1629,21 @@ def _get_timestamp_range_edges(first, last, offset, closed='left', base=0):
-------
A tuple of length 2, containing the adjusted pd.Timestamp objects.
"""
-if not all(isinstance(obj, pd.Timestamp) for obj in [first, last]):

Review comment (member, PR author): This is not needed AFAICT. All internal calls to this method have Timestamp inputs.

-raise TypeError("'first' and 'last' must be instances of type "
-"Timestamp")
-
if isinstance(offset, Tick):
is_day = isinstance(offset, Day)
day_nanos = delta_to_nanoseconds(timedelta(1))

# #1165 and #24127
if (is_day and not offset.nanos % day_nanos) or not is_day:
-return _adjust_dates_anchored(first, last, offset,
-closed=closed, base=base)
+first, last = _adjust_dates_anchored(first, last, offset,
+closed=closed, base=base)
+if is_day and first.tz is not None:
+# _adjust_dates_anchored assumes 'D' means 24H, but first/last
+# might contain a DST transition (23H, 24H, or 25H).
+# Ensure first/last snap to midnight.
+first = first.normalize()
+last = last.normalize()
+return first, last

else:
first = first.normalize()
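For a tz-aware `Day` offset, the resample change no longer trusts 24-hour anchoring. Instead it snaps `first` and `last` to local midnight with `normalize()`. The sketch below illustrates the idea with the standard library only and is not pandas' resampling code. It assumes Python 3.9+ with `zoneinfo`. `normalize` is a hypothetical helper modeled on `Timestamp.normalize`, and the two observation times are made up for illustration. Fixed 24-hour steps drift to 23:00 after the 2013-10-27 fall-back in Europe/Paris, while calendar-day edges stay on midnight and produce one 25-hour bin. This is consistent with the 50 half-hourly rows counted for that day in `test_resample_dst_anchor`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")


def normalize(dt):
    """Hypothetical helper: snap an aware datetime to local midnight."""
    # replace() keeps the tzinfo; the offset is recomputed for 00:00 local.
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


# Illustrative first/last observations straddling the 2013-10-27 fall-back.
first = normalize(datetime(2013, 10, 26, 5, 30, tzinfo=PARIS))
last = normalize(datetime(2013, 10, 29, 18, 0, tzinfo=PARIS))
ndays = (last.date() - first.date()).days

# Fixed 24-hour steps on the UTC instant drift off midnight after the change.
fixed = [(first.astimezone(timezone.utc) + timedelta(hours=24 * k)).astimezone(PARIS)
         for k in range(ndays + 1)]
print([dt.strftime("%d %H:%M") for dt in fixed])
# ['26 00:00', '27 00:00', '27 23:00', '28 23:00']

# Calendar-day edges stay on local midnight, so one bin spans 25 hours.
edges = [first + timedelta(days=k) for k in range(ndays + 1)]
hours = [(b.astimezone(timezone.utc) - a.astimezone(timezone.utc))
         // timedelta(hours=1) for a, b in zip(edges, edges[1:])]
print(hours)  # [24, 25, 24]
```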
17 changes: 4 additions & 13 deletions pandas/tests/indexes/datetimes/test_date_range.py
@@ -359,18 +359,18 @@ def test_range_tz_pytz(self):
Timestamp(datetime(2013, 11, 6), tz='US/Eastern')]
])
def test_range_tz_dst_straddle_pytz(self, start, end):
-dr = date_range(start, end, freq='CD')
+dr = date_range(start, end, freq='D')
assert dr[0] == start
assert dr[-1] == end
assert np.all(dr.hour == 0)

-dr = date_range(start, end, freq='CD', tz='US/Eastern')
+dr = date_range(start, end, freq='D', tz='US/Eastern')
assert dr[0] == start
assert dr[-1] == end
assert np.all(dr.hour == 0)

dr = date_range(start.replace(tzinfo=None), end.replace(
-tzinfo=None), freq='CD', tz='US/Eastern')
+tzinfo=None), freq='D', tz='US/Eastern')
assert dr[0] == start
assert dr[-1] == end
assert np.all(dr.hour == 0)
@@ -604,14 +604,6 @@ def test_mismatching_tz_raises_err(self, start, end):
with pytest.raises(TypeError):
pd.date_range(start, end, freq=BDay())

-def test_CalendarDay_range_with_dst_crossing(self):
-# GH 20596
-result = date_range('2018-10-23', '2018-11-06', freq='7CD',
-tz='Europe/Paris')
-expected = date_range('2018-10-23', '2018-11-06',
-freq=pd.DateOffset(days=7), tz='Europe/Paris')
-tm.assert_index_equal(result, expected)
-

class TestBusinessDateRange(object):

@@ -766,8 +758,7 @@ def test_cdaterange_weekmask_and_holidays(self):
holidays=['2013-05-01'])

@pytest.mark.parametrize('freq', [freq for freq in prefix_mapping
-if freq.startswith('C')
-and freq != 'CD']) # CalendarDay
+if freq.startswith('C')])
def test_all_custom_freq(self, freq):
# should not raise
bdate_range(START, END, freq=freq, weekmask='Mon Wed Fri',
4 changes: 2 additions & 2 deletions pandas/tests/indexes/datetimes/test_timezones.py
@@ -436,7 +436,7 @@ def test_dti_tz_localize_utc_conversion(self, tz):

@pytest.mark.parametrize('idx', [
date_range(start='2014-01-01', end='2014-12-31', freq='M'),
-date_range(start='2014-01-01', end='2014-12-31', freq='CD'),
+date_range(start='2014-01-01', end='2014-12-31', freq='D'),
date_range(start='2014-01-01', end='2014-03-01', freq='H'),
date_range(start='2014-08-01', end='2014-10-31', freq='T')
])
@@ -1072,7 +1072,7 @@ def test_date_range_span_dst_transition(self, tzstr):

dr = date_range('2012-11-02', periods=10, tz=tzstr)
result = dr.hour
-expected = Index([0, 0, 0, 23, 23, 23, 23, 23, 23, 23])
+expected = Index([0] * 10)
tm.assert_index_equal(result, expected)

@pytest.mark.parametrize('tzstr', ['US/Eastern', 'dateutil/US/Eastern'])
4 changes: 0 additions & 4 deletions pandas/tests/indexes/timedeltas/test_timedelta_range.py
@@ -49,10 +49,6 @@ def test_timedelta_range(self):
result = df.loc['0s':, :]
tm.assert_frame_equal(expected, result)

-with pytest.raises(ValueError):
-# GH 22274: CalendarDay is a relative time measurement
-timedelta_range('1day', freq='CD', periods=2)
-
@pytest.mark.parametrize('periods, freq', [
(3, '2D'), (5, 'D'), (6, '19H12T'), (7, '16H'), (9, '12H')])
def test_linspace_behavior(self, periods, freq):
8 changes: 4 additions & 4 deletions pandas/tests/resample/test_datetime_index.py
@@ -1279,7 +1279,7 @@ def test_resample_dst_anchor(self):
# 5172
dti = DatetimeIndex([datetime(2012, 11, 4, 23)], tz='US/Eastern')
df = DataFrame([5], index=dti)
-assert_frame_equal(df.resample(rule='CD').sum(),
+assert_frame_equal(df.resample(rule='D').sum(),
DataFrame([5], index=df.index.normalize()))
df.resample(rule='MS').sum()
assert_frame_equal(
@@ -1333,14 +1333,14 @@ def test_resample_dst_anchor(self):

df_daily = df['10/26/2013':'10/29/2013']
assert_frame_equal(
-df_daily.resample("CD").agg({"a": "min", "b": "max", "c": "count"})
+df_daily.resample("D").agg({"a": "min", "b": "max", "c": "count"})
[["a", "b", "c"]],
DataFrame({"a": [1248, 1296, 1346, 1394],
"b": [1295, 1345, 1393, 1441],
"c": [48, 50, 48, 48]},
index=date_range('10/26/2013', '10/29/2013',
-freq='CD', tz='Europe/Paris')),
-'CD Frequency')
+freq='D', tz='Europe/Paris')),
+'D Frequency')

def test_downsample_across_dst(self):
# GH 8531
7 changes: 4 additions & 3 deletions pandas/tests/resample/test_period_index.py
@@ -289,10 +289,11 @@ def test_resample_nonexistent_time_bin_edge(self):
index = date_range(start='2017-10-10', end='2017-10-20', freq='1H')
index = index.tz_localize('UTC').tz_convert('America/Sao_Paulo')
df = DataFrame(data=list(range(len(index))), index=index)
-result = df.groupby(pd.Grouper(freq='1D'))
+result = df.groupby(pd.Grouper(freq='1D')).count()
expected = date_range(start='2017-10-09', end='2017-10-20', freq='D',
-tz="America/Sao_Paulo")
-tm.assert_index_equal(result.count().index, expected)
+tz="America/Sao_Paulo", nonexistent='shift',
+closed='left')
+tm.assert_index_equal(result.index, expected)

def test_resample_ambiguous_time_bin_edge(self):
# GH 10117
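`test_resample_nonexistent_time_bin_edge` now passes `nonexistent='shift'`. The reason is that local midnight on 2017-10-15 does not exist in America/Sao_Paulo, because the IANA tz database records clocks jumping from 00:00 to 01:00 that day. The sketch below detects such a gap with the standard library by round-tripping a wall time through UTC. It assumes Python 3.9+ with `zoneinfo`, and `is_nonexistent` is a hypothetical helper rather than a pandas API.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

SAO_PAULO = ZoneInfo("America/Sao_Paulo")


def is_nonexistent(naive, tz):
    """Hypothetical helper: True if naive wall time `naive` falls in a DST gap."""
    aware = naive.replace(tzinfo=tz)
    # A wall time that exists survives a round trip through UTC unchanged.
    roundtrip = aware.astimezone(timezone.utc).astimezone(tz)
    return roundtrip.replace(tzinfo=None) != naive


midnight = datetime(2017, 10, 15)
print(is_nonexistent(midnight, SAO_PAULO))                # True
print(is_nonexistent(datetime(2017, 10, 14), SAO_PAULO))  # False
print(midnight.replace(tzinfo=SAO_PAULO).astimezone(timezone.utc).astimezone(SAO_PAULO))
# 2017-10-15 01:00:00-02:00
```

The gap midnight round-trips to 01:00 because `zoneinfo` interprets a `fold=0` time inside a gap with the pre-transition offset (PEP 495). This is only an analogy for shifting a nonexistent bin edge forward, not a description of how pandas implements `'shift'`.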
2 changes: 1 addition & 1 deletion pandas/tests/series/test_timezones.py
@@ -343,7 +343,7 @@ def test_getitem_pydatetime_tz(self, tzstr):

def test_series_truncate_datetimeindex_tz(self):
# GH 9243
-idx = date_range('4/1/2005', '4/30/2005', freq='CD', tz='US/Pacific')
+idx = date_range('4/1/2005', '4/30/2005', freq='D', tz='US/Pacific')
s = Series(range(len(idx)), index=idx)
result = s.truncate(datetime(2005, 4, 2), datetime(2005, 4, 4))
expected = Series([1, 2, 3], index=idx[1:4])