
REF/DEPR: DatetimeIndex constructor #23675

Merged: 40 commits, Dec 3, 2018

Changes from 17 commits

Commits (40)
f8efbef
start untangling DatetimeIndex constructor; deprecate passing of time…
jbrockmendel Nov 13, 2018
9d20bc9
add GH references
jbrockmendel Nov 13, 2018
aef3f4c
Fix incorrect usage of DatetimeIndex
jbrockmendel Nov 13, 2018
66ae42b
dummy commit to force CI
jbrockmendel Nov 14, 2018
d0e8ee3
more explicit name, test with TimedeltaIndex
jbrockmendel Nov 14, 2018
e1f4e17
remove comment
jbrockmendel Nov 14, 2018
d18e0df
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 14, 2018
a4c8c77
make exception catching less specific
jbrockmendel Nov 14, 2018
5dc5980
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 14, 2018
7e5587e
Merge remote-tracking branch 'upstream/master' into jbrockmendel-pre-…
TomAugspurger Nov 14, 2018
e94e826
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 15, 2018
7464d15
checks in both to_datetime and DatetimeIndex.__new__
jbrockmendel Nov 15, 2018
80b5dbe
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 15, 2018
3c822f1
name and docstring
jbrockmendel Nov 15, 2018
ba7e5e8
isort and flake8 fixup
jbrockmendel Nov 15, 2018
3ba9da7
move freq check earlier
jbrockmendel Nov 15, 2018
49c11e1
Merge branch 'pre-fixes4' of https://github.com/jbrockmendel/pandas i…
jbrockmendel Nov 15, 2018
f1d3fd8
improve exc message
jbrockmendel Nov 16, 2018
d44055e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 16, 2018
1471a2b
tests for to_datetime and PeriodDtype
jbrockmendel Nov 16, 2018
11b5f6c
use objects_to_datetime64ns in to_datetime
jbrockmendel Nov 16, 2018
9f56d23
isort fixup
jbrockmendel Nov 17, 2018
1c3a5aa
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 18, 2018
be4d472
requested edits, name changes
jbrockmendel Nov 18, 2018
145772d
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 19, 2018
7c99105
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 19, 2018
6b60da2
comments, remove has_format
jbrockmendel Nov 19, 2018
a7038bb
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 20, 2018
14d923b
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 22, 2018
c9dbf24
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 25, 2018
ce9914d
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 26, 2018
b3d5bb7
dummy commit to force CI
jbrockmendel Nov 26, 2018
09c88fc
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
0367d6f
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
7cc8577
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
b3a096b
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 29, 2018
fd5af18
Flesh out comment
jbrockmendel Dec 2, 2018
2cdd215
comment
jbrockmendel Dec 3, 2018
782ca81
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Dec 3, 2018
03d5b35
comment more
jbrockmendel Dec 3, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1029,6 +1029,7 @@ Deprecations
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
- Constructing a :class:`TimedeltaIndex` from ``datetime64``-dtyped data is deprecated and will raise ``TypeError`` in a future version (:issue:`23539`)
- Constructing a :class:`DatetimeIndex` from ``timedelta64``-dtyped data is deprecated and will raise ``TypeError`` in a future version (:issue:`23675`)

.. _whatsnew_0240.deprecations.datetimelike_int_ops:

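A minimal sketch of the deprecated path this entry describes, assuming the behavior on this branch (it mirrors the test added in pandas/tests/indexes/datetimes/test_construction.py further down):

import warnings
import numpy as np
import pandas as pd

td_data = np.array([0], dtype='m8[ns]')        # timedelta64[ns] values

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter('always')
    dti = pd.DatetimeIndex(td_data)            # deprecated: values are viewed as datetime64[ns]

print(dti[0])                                  # Timestamp('1970-01-01 00:00:00')
print([wi.category.__name__ for wi in w])      # contains 'FutureWarning'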
4 changes: 2 additions & 2 deletions pandas/_libs/tslibs/conversion.pyx
@@ -858,8 +858,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
- bool if True, treat all vals as DST. If False, treat them as non-DST
- 'NaT' will return NaT where there are ambiguous times

nonexistent : str
If arraylike, must have the same length as vals
nonexistent : {None, "NaT", "shift", "raise"}
How to handle non-existent times when converting wall times to UTC

.. versionadded:: 0.24.0

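The same `nonexistent` options are exposed on the public localization API in 0.24. A small sketch of what they mean, assuming the 0.24-era API where 'shift' is still a single option:

import pandas as pd

# 02:30 on 2018-03-11 does not exist in US/Eastern (clocks jump from 02:00 to 03:00)
ts = pd.Timestamp('2018-03-11 02:30:00')

ts.tz_localize('US/Eastern', nonexistent='NaT')    # NaT
ts.tz_localize('US/Eastern', nonexistent='shift')  # Timestamp('2018-03-11 03:00:00-0400', tz='US/Eastern')
# nonexistent='raise' (the default) raises pytz.NonExistentTimeError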
154 changes: 153 additions & 1 deletion pandas/core/arrays/datetimes.py
@@ -18,6 +18,10 @@

from pandas.core.dtypes.common import (
_NS_DTYPE,
is_float_dtype,
is_timedelta64_dtype,
is_period_dtype,
is_extension_type,
is_object_dtype,
is_int64_dtype,
is_datetime64tz_dtype,
@@ -186,7 +190,7 @@ class DatetimeArrayMixin(dtl.DatetimeLikeArrayMixin):
_freq = None

@classmethod
def _simple_new(cls, values, freq=None, tz=None, **kwargs):
def _simple_new(cls, values, freq=None, tz=None):
"""
we require that we have a dtype compat for the values
if we are passed a non-dtype compat, then coerce using the constructor
@@ -1408,6 +1412,154 @@ def to_julian_date(self):
DatetimeArrayMixin._add_datetimelike_methods()


# -------------------------------------------------------------------
# Constructor Helpers

def maybe_infer_tz(tz, inferred_tz):
"""
If a timezone is inferred from data, check that it is compatible with
the user-provided timezone, if any.

Parameters
----------
tz : tzinfo or None
inferred_tz : tzinfo or None

Returns
-------
tz : tzinfo or None

Raises
------
TypeError : if both timezones are present but do not match
"""
if tz is None:
tz = inferred_tz
elif inferred_tz is None:
pass
elif not timezones.tz_compare(tz, inferred_tz):
raise TypeError('data is already tz-aware {inferred_tz}, unable to '
'set specified tz: {tz}'
.format(inferred_tz=inferred_tz, tz=tz))
return tz
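
A quick sketch of the three cases this helper covers, assuming the internal import path used on this branch:

import pytz
from pandas.core.arrays.datetimes import maybe_infer_tz  # internal helper defined above

utc = pytz.UTC
eastern = pytz.timezone('US/Eastern')

maybe_infer_tz(None, utc)      # no user tz: adopt the inferred one -> UTC
maybe_infer_tz(utc, None)      # nothing inferred: keep the user-specified tz -> UTC
maybe_infer_tz(utc, utc)       # both present and compatible -> UTC
maybe_infer_tz(utc, eastern)   # both present but different -> TypeError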


def dtype_conversions(data, copy, has_format=False):
"""
Convert data based on dtype conventions, issuing deprecation warnings
or errors where appropriate.

Parameters
----------
data : np.ndarray or pd.Index
copy : bool
has_format : bool, default False
Indicates if the data will be passed to a parsing function with a
`format` kwarg.

Returns
-------
data : np.ndarray or pd.Index
copy : bool

Raises
------
TypeError : if PeriodDtype data is passed
"""

if is_float_dtype(data) and not has_format:
# Note: we must cast to datetime64[ns] here in order to treat these
# as wall-times instead of UTC timestamps.
data = data.astype(_NS_DTYPE)
copy = False
# TODO: Why do we treat this differently from integer dtypes?
Member: Which difference are you pointing at here? Wall time vs. UTC?

Member Author: DatetimeIndex(data=int64data, tz=tz) treats the data as UTC times, while DatetimeIndex(data=floatdata, tz=tz) treats the data as wall times. I would have expected float data to be treated like int64 data.

Contributor: Yeah, I agree this seems a little odd; I think these should just be cast to integer.

Member Author: @jreback I think all comments except this one have been addressed (a few minutes away from green if all goes well). Changing this would be a breaking change, albeit a small one. Should we change now, deprecate now, or leave it for a later PR?

Contributor: OK for this PR; we should fix/deprecate whatever is needed later.


elif is_timedelta64_dtype(data):
warnings.warn("Passing timedelta64-dtype data to {cls} is "
"deprecated, will raise a TypeError in a future "
"version".format(cls="TimedeltaIndex/Array"),
FutureWarning, stacklevel=3)
data = data.view(_NS_DTYPE)

elif is_period_dtype(data):
# Note: without explicitly raising here, PeriodIndex
# test_setops.test_join_does_not_recur fails
raise TypeError("Passing PeriodDtype data to {cls} is invalid. "
"Use `data.to_timestamp()` instead"
.format(cls="DatetimeIndex/Array"))

elif is_extension_type(data) and not is_datetime64tz_dtype(data):
# Includes categorical
# TODO: We have no tests for these
data = np.array(data, dtype=np.object_)
copy = False

return data, copy
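
The float-vs-integer asymmetry discussed in the thread above, as a minimal sketch (the exact outputs assume 0.24-era behavior on this branch):

import numpy as np
import pandas as pd

ints = np.array([1546300800000000000], dtype=np.int64)   # 2019-01-01 00:00:00 UTC, in nanoseconds

# Integer input is treated as UTC epoch values, then converted to the requested tz
pd.DatetimeIndex(ints, tz='US/Eastern')
# DatetimeIndex(['2018-12-31 19:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

# Float input is cast to datetime64[ns] first, i.e. treated as wall time, then localized
pd.DatetimeIndex(ints.astype(np.float64), tz='US/Eastern')
# DatetimeIndex(['2019-01-01 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)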


def _objects_to_datetime64ns(data, dayfirst, yearfirst):
"""
Convert data to array of timestamps.

Parameters
----------
data : np.ndarray[object]
dayfirst : bool
yearfirst : bool

Returns
-------
result : ndarray
np.int64 dtype if returned values represent UTC timestamps
np.datetime64[ns] if returned values represent wall times
inferred_tz : tzinfo or None

Contributor: Just to verify: you'll get an np.int64-dtype result if and only if inferred_tz is not None?

Member Author: Correct.

Raises
------
ValueError : if data cannot be converted to datetimes
"""
errors = "raise"
tz = None
require_iso8601 = False

# if str-dtype, convert
data = np.array(data, copy=False, dtype=np.object_)

try:
result, tz_parsed = tslib.array_to_datetime(
data,
errors=errors,
utc=tz == 'utc',
dayfirst=dayfirst,
yearfirst=yearfirst,
require_iso8601=require_iso8601
)
except ValueError as e:
try:
values, tz = conversion.datetime_to_datetime64(data)
# If tzaware, these values represent unix timestamps, so we
# return them as i8 to distinguish from wall times
return values.view('i8'), tz
except (ValueError, TypeError):
Contributor: Why this additional level of try/except here? Wouldn't this just raise anyhow?

Member Author: Presumably to re-raise the original exception if the fallback fails. This is the existing behavior. I think @mroeschke and I are vaguely planning to revisit this before long and combine datetime_to_datetime64 into array_to_datetime, fixing the many idiosyncrasies of these calls.

Contributor: Maybe I wasn't clear. I think you can simply remove the try/except and it will work the way it does now (and same below).

Member Author: If the claim is that it will raise under the same conditions, I agree. If the claim is that it will raise the same exception, I don't; i.e., if the conversion.datetime_to_datetime64 call were not inside its own try/except, we could end up getting a TypeError in cases where we currently get a ValueError.

Contributor: Really? I don't think that is actually possible; the original exception is re-raised here.

Member Author: @jreback did this resolve the miscommunication?

Contributor: Again, if datetime_to_datetime64 raises, won't it just bubble up, which is the same thing as a re-raise here?

Member Author: That sounds like a "no" on resolving the miscommunication. Did I at least accurately summarize your suggested change?

Contributor: @jbrockmendel is correct here. Side note: we could clarify all this with Python 3 style

except (ValueError, TypeError) as e2:
    raise e2 from e

or six's https://pythonhosted.org/six/#six.raise_from, but it's probably just easier to wait until next month.

Contributor: Not resolved, but I see from @TomAugspurger's comment what I was getting at. I guess it's OK for now.
raise e

if tz_parsed is not None:
# We can take a shortcut since the datetime64 numpy array
# is in UTC
# Return i8 values to denote unix timestamps
return result.view('i8'), tz_parsed
elif is_datetime64_dtype(result):
# returning M8[ns] denotes wall-times; since tz is None
# the distinction is a thin one
return result, tz
elif is_object_dtype(result):
# e.g. an Index of datetime objects; raise and let the
# calling function salvage the result if desired
raise ValueError(result)
else: # pragma: no cover
raise TypeError(result)
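
At the public constructor level, the convention described in the docstring looks roughly like this (a sketch; the i8-plus-tz pair is an internal detail of the helper above):

import numpy as np
import pandas as pd

# Naive strings parse to wall times: datetime64[ns] values with tz=None
pd.DatetimeIndex(np.array(['2016-01-01', '2016-01-02'], dtype=object))

# Offset-aware strings come back tz-aware; internally the parsed values are
# UTC epoch nanoseconds (the i8 path) plus the inferred fixed-offset tz
pd.DatetimeIndex(np.array(['2016-01-01 00:00:00-05:00',
                           '2016-01-02 00:00:00-05:00'], dtype=object))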


def _generate_regular_range(cls, start, end, periods, freq):
"""
Generate a range of dates with the spans between dates described by
74 changes: 41 additions & 33 deletions pandas/core/indexes/datetimes.py
@@ -17,16 +17,17 @@

from pandas.core.dtypes.common import (
_INT64_DTYPE, _NS_DTYPE, ensure_int64, is_datetime64_dtype,
is_datetime64_ns_dtype, is_datetimetz, is_dtype_equal, is_float,
is_integer, is_integer_dtype, is_list_like, is_period_dtype, is_scalar,
is_string_like, pandas_dtype)
is_datetime64_ns_dtype, is_datetime64tz_dtype, is_dtype_equal, is_float,
is_integer, is_list_like, is_object_dtype, is_period_dtype, is_scalar,
is_string_dtype, is_string_like, pandas_dtype)
import pandas.core.dtypes.concat as _concat
from pandas.core.dtypes.generic import ABCSeries
from pandas.core.dtypes.missing import isna

from pandas.core.arrays import datetimelike as dtl
from pandas.core.arrays.datetimes import (
DatetimeArrayMixin as DatetimeArray, _to_m8)
DatetimeArrayMixin as DatetimeArray, _objects_to_datetime64ns, _to_m8,
dtype_conversions, maybe_infer_tz)
from pandas.core.base import _shared_docs
import pandas.core.common as com
from pandas.core.indexes.base import Index, _index_shared_docs
@@ -248,50 +249,59 @@ def __new__(cls, data=None,
name = data.name

freq, freq_infer = dtl.maybe_infer_freq(freq)
if freq is None and hasattr(data, "freq"):
# i.e. DatetimeArray/Index
# TODO: Should this be the stronger condition of `freq_infer`?
freq = data.freq
verify_integrity = False

# if dtype has an embedded tz, capture it
tz = dtl.validate_tz_from_dtype(dtype, tz)

if not isinstance(data, (np.ndarray, Index, ABCSeries, DatetimeArray)):
# other iterable of some kind
if not isinstance(data, (list, tuple)):
if not hasattr(data, "dtype"):
# e.g. list, tuple
if np.ndim(data) == 0:
# i.e. generator
data = list(data)
data = np.asarray(data, dtype='O')
data = np.asarray(data)
copy = False
elif isinstance(data, ABCSeries):
data = data._values

# data must be Index or np.ndarray here
if not (is_datetime64_dtype(data) or is_datetimetz(data) or
is_integer_dtype(data) or lib.infer_dtype(data) == 'integer'):
data = tools.to_datetime(data, dayfirst=dayfirst,
yearfirst=yearfirst)

if isinstance(data, DatetimeArray):
if tz is None:
tz = data.tz
elif data.tz is None:
data = data.tz_localize(tz, ambiguous=ambiguous)
else:
# the tz's must match
if not timezones.tz_compare(tz, data.tz):
msg = ('data is already tz-aware {0}, unable to '
'set specified tz: {1}')
raise TypeError(msg.format(data.tz, tz))
# By this point we are assured to have either a numpy array or Index

data, copy = dtype_conversions(data, copy)

if is_object_dtype(data) or is_string_dtype(data):
# TODO: We do not have tests specific to string-dtypes,
Contributor: You could just write this as

try:
    data = data.astype(np.int64)
except:
    ...

which might be more clear.

Member Author: The issue with that is that np.array(['20160405']) becomes np.array([20160405]) instead of 2016-04-05.

Contributor: OK, sure.

# also complex or categorical or other extension
copy = False
if lib.infer_dtype(data) == 'integer':
data = data.astype(np.int64)
else:
# data comes back here as either i8 to denote UTC timestamps
# or M8[ns] to denote wall times
data, inferred_tz = _objects_to_datetime64ns(
data, dayfirst=dayfirst, yearfirst=yearfirst)
tz = maybe_infer_tz(tz, inferred_tz)

if is_datetime64tz_dtype(data):
tz = maybe_infer_tz(tz, data.tz)
subarr = data._data

if freq is None:
freq = data.freq
verify_integrity = False
elif issubclass(data.dtype.type, np.datetime64):
elif is_datetime64_dtype(data):
# DatetimeIndex or ndarray[datetime64]
data = getattr(data, "_data", data)
if data.dtype != _NS_DTYPE:
data = conversion.ensure_datetime64ns(data)

if tz is not None:
# Convert tz-naive to UTC
tz = timezones.maybe_get_tz(tz)
data = conversion.tz_localize_to_utc(data.view('i8'), tz,
ambiguous=ambiguous)
subarr = data.view(_NS_DTYPE)

else:
# must be integer dtype otherwise
# assume this data are epoch timestamps
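
The caveat from the review thread above about simply casting string data to int64, as a short sketch:

import numpy as np
import pandas as pd

np.array(['20160405']).astype(np.int64)
# array([20160405]), i.e. the raw digits, not a date

pd.DatetimeIndex(np.array(['20160405'], dtype=object))
# DatetimeIndex(['2016-04-05'], dtype='datetime64[ns]', freq=None)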
@@ -319,17 +329,15 @@ def __new__(cls, data=None,
return subarr._deepcopy_if_needed(ref_to_data, copy)

@classmethod
def _simple_new(cls, values, name=None, freq=None, tz=None,
dtype=None, **kwargs):
def _simple_new(cls, values, name=None, freq=None, tz=None, dtype=None):
"""
we require that we have a dtype compat for the values
if we are passed a non-dtype compat, then coerce using the constructor
"""
# DatetimeArray._simple_new will accept either i8 or M8[ns] dtypes
assert isinstance(values, np.ndarray), type(values)

result = super(DatetimeIndex, cls)._simple_new(values, freq, tz,
**kwargs)
result = super(DatetimeIndex, cls)._simple_new(values, freq, tz)
result.name = name
result._reset_identity()
return result
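
A short sketch of the tz-compatibility check that now runs through maybe_infer_tz in the constructor (behavior as of this branch; the error message matches the one defined above):

import pandas as pd

aware = pd.date_range('2018-01-01', periods=3, tz='US/Eastern')

pd.DatetimeIndex(aware, tz='US/Eastern')   # fine: tz matches the data's tz
try:
    pd.DatetimeIndex(aware, tz='UTC')      # conflicting tz
except TypeError as err:
    print(err)  # data is already tz-aware US/Eastern, unable to set specified tz: UTC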
6 changes: 6 additions & 0 deletions pandas/core/tools/datetimes.py
@@ -183,6 +183,8 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None,
- ndarray of Timestamps if box=False
"""
from pandas import DatetimeIndex
from pandas.core.arrays.datetimes import dtype_conversions

if isinstance(arg, (list, tuple)):
arg = np.array(arg, dtype='O')

@@ -220,6 +222,10 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None,
raise TypeError('arg must be a string, datetime, list, tuple, '
'1-d array, or Series')

# warn if passing timedelta64, raise for PeriodDtype
# NB: this must come after unit transformation
arg = dtype_conversions(arg, False, has_format=format is not None)[0]

arg = ensure_object(arg)
require_iso8601 = False

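
A sketch of the two cases the new comment mentions (warn for timedelta64, raise for PeriodDtype), assuming the behavior on this branch:

import warnings
import numpy as np
import pandas as pd

# timedelta64 input: FutureWarning, values are viewed as datetime64[ns]
with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter('always')
    pd.to_datetime(np.array([0], dtype='m8[ns]'))
print([wi.category.__name__ for wi in w])   # contains 'FutureWarning'

# PeriodDtype input: TypeError pointing the user at .to_timestamp()
pi = pd.period_range('2018-01', periods=3, freq='M')
try:
    pd.to_datetime(pi)
except TypeError as err:
    print(err)
pd.to_datetime(pi.to_timestamp())           # the supported spelling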
13 changes: 13 additions & 0 deletions pandas/tests/indexes/datetimes/test_construction.py
@@ -18,6 +18,19 @@

class TestDatetimeIndex(object):

def test_dti_with_timedelta64_data_deprecation(self):
# GH#23675
data = np.array([0], dtype='m8[ns]')
with tm.assert_produces_warning(FutureWarning):
result = DatetimeIndex(data)

assert result[0] == Timestamp('1970-01-01')

with tm.assert_produces_warning(FutureWarning):
result = DatetimeIndex(pd.TimedeltaIndex(data))

assert result[0] == Timestamp('1970-01-01')

def test_construction_caching(self):

df = pd.DataFrame({'dt': pd.date_range('20130101', periods=3),
2 changes: 1 addition & 1 deletion pandas/tests/scalar/timedelta/test_timedelta.py
@@ -548,7 +548,7 @@ def test_overflow(self):

# mean
result = (s - s.min()).mean()
expected = pd.Timedelta((pd.DatetimeIndex((s - s.min())).asi8 / len(s)
expected = pd.Timedelta((pd.TimedeltaIndex((s - s.min())).asi8 / len(s)
).sum())

# the computation is converted to float so
Expand Down