Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API/BUG: DatetimeIndex correctly localizes integer data #21216

Merged
merged 21 commits into from
Jun 14, 2018

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented May 26, 2018

closes #20997
closes #20964

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

I did a little refactoring of Datetime.__new__ that converts passed data but made it easier to fix this issue. I had to modify some methods (join, normalize, delete, insert) and tests that weren't assuming integer data weren't epoch timestamps.

@mroeschke mroeschke added API Design Timezones Timezone data dtype labels May 26, 2018
@codecov
Copy link

codecov bot commented May 26, 2018

Codecov Report

Merging #21216 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21216      +/-   ##
==========================================
- Coverage    91.9%   91.89%   -0.01%     
==========================================
  Files         153      153              
  Lines       49606    49597       -9     
==========================================
- Hits        45589    45579      -10     
- Misses       4017     4018       +1
Flag Coverage Δ
#multiple 90.29% <100%> (-0.01%) ⬇️
#single 41.88% <76%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.62% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.66% <100%> (-0.14%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5fc769...dc7e5c0. Read the comment docs.

@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone May 26, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok looks pretty good.

  • can you add a whatsnew note (0.24.0)
  • can you test with .astype as well
  • can you search thru some issue for timezone / DTI. IIRC I think this might solve some (just a guess)

tm.assert_index_equal(i, i2)
assert i.tz.zone == 'US/Eastern'

i2 = DatetimeIndex(i.tz_localize(None).asi8, dtype=i.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i actually like this construction, can you add a test that does both? (e.g. both uses and does not localize), maybe split it out and parameterize

@pytest.mark.parametrize('klass', [Index, DatetimeIndex])
@pytest.mark.parametrize('box', [np.array, list])
def test_constructor_with_int_tz(self, klass, box):
ts = Timestamp('2018-01-01', tz='US/Pacific')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number

@pytest.mark.parametrize("klass", [pd.Index, pd.DatetimeIndex])
def test_constructor_dtypes_datetime(self, tz_naive_fixture, values,
def test_constructor_dtypes_datetime(self, tz_naive_fixture, attr, utc,
klass):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a bit of commentary on what is tested here

@mroeschke
Copy link
Member Author

Thanks @jreback. Moved the whatsnew to v0.24, added some additional logic to get astype working. It doesn't appear to fix any of the other reported timezone issues.

from pandas.io.formats.format import _is_dates_only
return _is_dates_only(self.values)
return _is_dates_only(self.values) and self.tz is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this need changing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this refactor, I was getting errors where the repr didn't look correct but the i8 data was correct in the case when a tz was passed in the DatetimeIndex constructor.

_is_dates_only tries to:

return a boolean if we are only dates (and don't have a timezone)

but checks for timezones by re-passing self.values into DatetimeIndex. I am not sure the historical reason why this was the case, but checking self.tz is None fixed the repr issue (and seems to achieve what that function was trying to do in the first place)

else:
# must be integer dtype otherwise
if isinstance(data, Int64Index):
raise TypeError('cannot convert Int64Index->DatetimeIndex')
# assume this data are epoch timestamps
if data.dtype != _INT64_DTYPE:
data = data.astype(np.int64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could
data = data.astype(np.int64, copy=False)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have any tests with a non-int64 dtype, can you add something

@@ -225,6 +225,13 @@ def _check_rng(rng):
_check_rng(rng_eastern)
_check_rng(rng_utc)

def test_integer_index_astype_datetimetz_dtype(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use tz_naive_fixture here

@@ -26,25 +27,28 @@ def test_construction_caching(self):
freq='ns')})
assert df.dttz.dtype.tz.zone == 'US/Eastern'

def test_construction_with_alt(self):

@pytest.mark.parametrize('kwargs', [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize on tz_aware_fixture (doesn't include None), then parameterize as you are doing

tm.assert_index_equal(i, result)
assert result.tz.zone == 'US/Eastern'

@pytest.mark.parametrize('kwargs', [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@pytest.mark.parametrize('kwargs', [
{'tz': 'dtype.tz'},
{'dtype': 'dtype'},
{'dtype': 'dtype', 'tz': 'dtype.tz'}])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain how this test is different than the one above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am preserving the construction of tz_localize(None) that was here before but split it out in its own test as discussed here #21216 (comment)

@@ -469,6 +473,15 @@ def test_constructor_with_non_normalized_pytz(self, tz):
result = DatetimeIndex(['2010'], tz=non_norm_tz)
assert pytz.timezone(tz) is result.tz

@pytest.mark.parametrize('klass', [Index, DatetimeIndex])
@pytest.mark.parametrize('box', [np.array, list])
def test_constructor_with_int_tz(self, klass, box):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterize on tz_naive_fixture

@jreback
Copy link
Contributor

jreback commented Jun 7, 2018

looks good. @jschendel @WillAyd if you can have a look if any comments.

@jreback jreback merged commit 40d5982 into pandas-dev:master Jun 14, 2018
@jreback
Copy link
Contributor

jreback commented Jun 14, 2018

thanks @mroeschke

@@ -1175,6 +1175,10 @@ def astype(self, dtype, copy=True):
return CategoricalIndex(self.values, name=self.name, dtype=dtype,
copy=copy)
try:
if is_datetime64tz_dtype(dtype):
from pandas.core.indexes.datetimes import DatetimeIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side issue (another PR). The imports inside functions should be as simple as possible, e.g.
from pandas import DatetimeIndex (several occurrences of this)

@mroeschke mroeschke deleted the index_datetime_int_redux branch June 14, 2018 17:28
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Timezones Timezone data dtype
Projects
None yet
3 participants