Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert handling of i8values to DatetimeIndex #24708

Merged
merged 19 commits into from
Jan 11, 2019

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Jan 10, 2019

TODO:

  • decide if we're deprecating anything.
  • update the message (and update filterwarnings)
  • whatsnew

Questions

  1. What's the expected behavior of pd.Index(i8).dtype(DatetimeTZDtype)?
  2. Do we prefer this int_as_wall_time parameter, or @jbrockmendel's _from_sequence_dti from jbrockmendel@635b267

cc @jbrockmendel @jorisvandenbossche @jreback @mroeschke

xref #24559

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Jan 10, 2019
@TomAugspurger TomAugspurger added Datetime Datetime data dtype Timezones Timezone data dtype labels Jan 10, 2019
@jbrockmendel
Copy link
Member

I'll take a look at Tom's edits in a little bit. #24686 should be good to go and handles the only part of DTA.__init__ that I consider mandatory

@jbrockmendel
Copy link
Member

Travis looks like just a complaint about stacklevel

@jbrockmendel
Copy link
Member

Does this correspond to one of the options in the table in #24559?

@TomAugspurger
Copy link
Contributor Author

The options in those tables represent the desired future behavior.

This is changing the current behavior of DTI to match 0.23.4's behavior (i8 values are treated as wall times). We need to figure out

  1. If we want to deprecate the current behavior in DTI, and what the future desired behavior will be
  2. Update all of DatetimeIndex, DatetimeArray, and Timestamp to implement our desired behavior.

For the RC I only care about

  1. Reverting the handing of i8 back to 0.23.4 behavior.
  2. Implementing the deprecation warning for the future behavior in DTI only (if any change from 0.23.4).

@jorisvandenbossche
Copy link
Member

Does this correspond to one of the options in the table in #24559?

Not directly, as this is mainly about reverting the behaviour to 0.23.4.
But, whether to add a deprecation warning or not, and what to do for astype, does actually depend on the outcome of the discussion in #24559 though.

So if we add a deprecation warning, that means that we choose one of the options where DatetimeIndex(int) is unix-epoch (that are still multiple options though).

(I think that is one of the aspects we are most sure about / most agreement, but if needed, we can also only here do the revert, and defer adding a deprecation warning until we made a final discussion on which option to choose)

@jorisvandenbossche
Copy link
Member

Double post :-)

Yes, agreed on the priorities for the RC. But so the deprecation warning still depends somewhat on which desired future behaviour we want.

@TomAugspurger
Copy link
Contributor Author

Yes, agreed on the priorities for the RC. But so the deprecation warning still depends somewhat on which desired future behaviour we want.

Agreed. We'll need to sort that out. I'm off for ~5 hours. Hopefully you all figure it out in the meantime :)

@TomAugspurger
Copy link
Contributor Author

Actually, I think the relevant decisions have been made for this PR, right? Does everyone agree with the following axiom?

  • cls(i8, tz=tz) <=> cls(i8).tz_loacalize("UTC").tz_convert(tz)

In words, passing integer values and a timezone is equivalent to passing interger values, localizing to UTC, and converting to the desired tz? In other words, integer values are treated like UTC (unix epochs). All the options in
#24559 (comment), aside from Option 1 (0.23.4 IIRC), agree that ints should be unix epoch.

@TomAugspurger
Copy link
Contributor Author

On Index.astype, in 0.23.4 that errored.

In [2]: pd.Index([946684800000000000]).astype('datetime64[ns, US/Central]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/base.py in astype(self, dtype, copy)
   1291         try:
-> 1292             return Index(self.values.astype(dtype, copy=copy), name=self.name,
   1293                          dtype=dtype)

TypeError: Invalid datetime unit in metadata string "[ns, US/Central]"

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-2-e1a69a3f7ce5> in <module>
----> 1 pd.Index([946684800000000000]).astype('datetime64[ns, US/Central]')

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/base.py in astype(self, dtype, copy)
   1294         except (TypeError, ValueError):
   1295             msg = 'Cannot cast {name} to dtype {dtype}'
-> 1296             raise TypeError(msg.format(name=type(self).__name__, dtype=dtype))
   1297
   1298     def _to_safe_for_reshape(self):

TypeError: Cannot cast Int64Index to dtype datetime64[ns, US/Central]

Does anyone object to doing essentially

In [8]: pd.Index([946684800000000000]).astype('datetime64[ns]').tz_localize("UTC").tz_convert("US/Central")
Out[8]: DatetimeIndex(['1999-12-31 18:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

which is equivalent to the future behavior?

@mroeschke
Copy link
Member

No objections on my end.

So will the Index.astype behavior be fixed/introduced in this PR or after the warning is introduced?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jan 10, 2019

So will the Index.astype behavior be fixed/introduced in this PR or after the warning is introduced?

Index.astype(DatetimeTZDtype) works on master. And AFAICT, does the right thing.
master, before this PR:

In [2]: pd.Index([946684800000000000], dtype='datetime64[ns, US/Central]')
Out[2]: DatetimeIndex(['1999-12-31 18:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

In [3]: pd.Index([946684800000000000]).astype('datetime64[ns]').tz_localize("UTC").tz_convert("US/Central")
Out[3]: DatetimeIndex(['1999-12-31 18:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

I just need to change the implementation to avoid the warning.

This will introduce a temporary discrepancy between Index(i8).astype(DatetimeTZDtype) and Index(i8, dtype=DatetimeDTZDtype), as the .astype version will already be doing the future behavior. I think that's preferred.

@codecov
Copy link

codecov bot commented Jan 10, 2019

Codecov Report

Merging #24708 into master will decrease coverage by 49.31%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24708       +/-   ##
===========================================
- Coverage   92.38%   43.06%   -49.32%     
===========================================
  Files         166      166               
  Lines       52310    52318        +8     
===========================================
- Hits        48327    22532    -25795     
- Misses       3983    29786    +25803
Flag Coverage Δ
#multiple ?
#single 43.06% <66.66%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 48.46% <ø> (-47.82%) ⬇️
pandas/core/reshape/tile.py 11.23% <0%> (-83.66%) ⬇️
pandas/core/arrays/datetimes.py 66.07% <100%> (-31.65%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
... and 126 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 021cbae...1456699. Read the comment docs.

@codecov
Copy link

codecov bot commented Jan 10, 2019

Codecov Report

Merging #24708 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24708      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52321    52335      +14     
==========================================
+ Hits        48338    48352      +14     
  Misses       3983     3983
Flag Coverage Δ
#multiple 90.81% <100%> (ø) ⬆️
#single 43.06% <40%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.27% <ø> (ø) ⬆️
pandas/core/reshape/tile.py 94.94% <100%> (+0.05%) ⬆️
pandas/core/arrays/datetimes.py 97.78% <100%> (+0.02%) ⬆️
pandas/core/indexes/base.py 96.3% <100%> (ø) ⬆️
pandas/io/packers.py 88.21% <0%> (ø) ⬆️
pandas/io/sas/sas7bdat.py 91.16% <0%> (ø) ⬆️
pandas/io/pytables.py 92.31% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2b2068...8332c40. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

See 1456699 for the astype changes.

I'll push a whatsnew later tonight, unless anyone beats me to it.

We have an outstanding TODO: Should DatetimeIndex(i8, tz="UTC") warn?

@TomAugspurger
Copy link
Contributor Author

Also need the filterwarnings to match our message, so we aren't accidentally silencing errors.

@mroeschke
Copy link
Member

Should DatetimeIndex(i8, tz="UTC") warn?

I think there's a slight preference not to warn in this case.

pandas/core/arrays/datetimes.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor Author

Updating UTC handling and writing a release note now.

* message
* utc
@TomAugspurger
Copy link
Contributor Author

I don't think 90afdee will do this trick, but trying anyway.

@jbrockmendel
Copy link
Member

Just cloned, will take a look

@TomAugspurger
Copy link
Contributor Author

I'm guessing it's just strangeness with python 2's warning handling. The tests are fine on their own AFAICT.

@TomAugspurger
Copy link
Contributor Author

Got it locally. Just narrowing down now. pytest pandas/tests/indexes pandas/tests/indexes/test_base.py::TestIndex::test_constructor_dtypes_datetime -x --pdb

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jan 11, 2019

pytest -x --pdb -v 'pandas/tests/indexes/datetimes/test_construction.py::TestDatetimeIndex::test_constructor_with_int_tz[US/Pacific-datetime64[ns, US/Pacific]-box1-Index]' pandas/tests/indexes/test_base.py::TestIndex::test_constructor_dtypes_datetime

@TomAugspurger
Copy link
Contributor Author

6633344 may do it, not sure why we would need to skip that test on py2...

@TomAugspurger
Copy link
Contributor Author

Not fixed yet :/

@TomAugspurger
Copy link
Contributor Author

pytest -x --pdb pandas/tests/test_base.py::TestToIterable::test_iterable pandas/tests/indexes/test_base.py::TestIndex::test_constructor_dtypes_datetime[US/Eastern-Index-asi8-True] also fails

@jbrockmendel
Copy link
Member

I can reproduce all 7 failures on Linux with python -m pytest pandas/tests/test_[abcde]*.py pandas/tests/indexes/test_base.py --skip-slow. Currently trying to narrow down the culprits...

@jbrockmendel
Copy link
Member

Narrowed it down to just python -m pytest pandas/tests/test_base.py pandas/tests/indexes/test_base.py --skip-slow

@TomAugspurger
Copy link
Contributor Author

That seemed to do it. Removing the hopefully unnecessary skip.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the docstring for the new param

@TomAugspurger
Copy link
Contributor Author

Thanks. Merging in 1 hour if no objections.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 11, 2019

Should we maybe add some specific TODO statement in those places (eg in some astype related code) where you now did some extra if/else that can be cleaned up once deprecation is removed?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor comments

pandas/core/arrays/datetimes.py Show resolved Hide resolved
def test_constructor_with_int_tz(self, klass, box, tz, dtype):
# GH 20997, 20964
ts = Timestamp('2018-01-01', tz=tz)
result = klass(box([ts.value]), dtype=dtype)
expected = klass([ts])
assert result == expected

# This is the desired future behavior
@pytest.mark.xfail(reason="TBD", strict=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBD?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be determined, thought I suppose we've determined that the test is the desired future behavior.

@TomAugspurger
Copy link
Contributor Author

Will fix those up and add some TODOs for when the deprecation is enforced.

@TomAugspurger
Copy link
Contributor Author

All green. Merging.

@TomAugspurger TomAugspurger merged commit 7c4efe0 into pandas-dev:master Jan 11, 2019
@TomAugspurger TomAugspurger deleted the revert-i8 branch January 11, 2019 14:05
@jorisvandenbossche
Copy link
Member

@TomAugspurger Thanks!

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants