
Implement _most_ of the EA interface for DTA/TDA #23643

Merged: 11 commits into pandas-dev:master on Nov 14, 2018

Conversation

@jbrockmendel (Member) commented Nov 12, 2018

@TomAugspurger as promised, this implements a handful of tests; a few more can be ported from #23415, but the EA-specific tests haven't been started.

The big missing piece is DatetimeArray._from_sequence, which I'm getting started on now in a new branch.
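For context on that missing piece: _from_sequence is the ExtensionArray hook that builds an array from a sequence of scalars. As an illustration only (it is a private hook, and this snippet is not from this PR), PeriodArray already provides it:

import pandas as pd
from pandas.core.arrays import PeriodArray

# Build a PeriodArray from scalar Periods via the EA hook; the DatetimeArray
# counterpart is the piece described above as still missing.
arr = PeriodArray._from_sequence([pd.Period("2018-01", freq="M"),
                                  pd.Period("2018-02", freq="M")])
print(arr)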

Closes #23586

@pep8speaks commented Nov 12, 2018

Hello @jbrockmendel! Thanks for updating the PR.

Comment last updated on November 12, 2018 at 20:25 UTC

pandas/core/indexes/datetimes.py (resolved review thread)
pandas/core/arrays/datetimelike.py (resolved review thread)
pandas/core/arrays/datetimelike.py (outdated review thread, resolved)
pandas/core/arrays/datetimelike.py (outdated review thread, resolved)

@classmethod
def _concat_same_type(cls, to_concat):
    # for TimedeltaArray and PeriodArray; DatetimeArray overrides
Contributor:

The problem being that DatetimeArray needs to pass through tz info?

For PeriodArray at least (I haven't checked TimedeltaArray) you should be able to implement _concat_same_type just in terms of .dtype. It's hashable and can be passed to PeriodArray.__init__.
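A minimal sketch of that suggestion (not the actual pandas implementation), assuming the stored integer ordinals are reachable via .asi8 and that the constructor accepts an integer ndarray together with the hashable dtype:

import numpy as np

def concat_same_type_via_dtype(to_concat):
    # All inputs share one PeriodDtype, so the dtype alone is enough to
    # rebuild the result; the stored ordinals can simply be concatenated.
    dtype = to_concat[0].dtype
    ordinals = np.concatenate([arr.asi8 for arr in to_concat])
    return type(to_concat[0])(ordinals, dtype=dtype)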

Member Author:

Good call

Member:

Unlike PeriodArray, DatetimeArray/TimedeltaArray do not include freq in their dtype, so I am not sure this check for freq should be done here.

Contributor:

IIRC, the PeriodArray constructor allows a redundant freq and dtype, as long as they match, so

PeriodArray(data, freq='H', dtype=PeriodDtype("H"))

should be fine.

Member Author:

Yah, we added dtype to the PeriodArray constructor specifically so that type(self)(values, freq=self.freq, dtype=self.dtype) would be valid for all three TDA/DTA/PA classes.

Member:

Sorry, that's not what I meant. The difference is the meaning of freq for PeriodArray vs DatetimeArray. For PeriodArray it is part of the dtype and defines how the stored integers are interpreted (and thus needs to match before those integers can simply be concatenated), but for DatetimeArray it is just an informative attribute describing the regularity of the array, not essential to describe it. So I would assume that _concat_same_type needs to handle arrays with different freqs for DatetimeArray.
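A small illustration of that point: both inputs below carry freq="D", but their concatenation is not an evenly spaced range, so the result cannot keep that freq.

import pandas as pd

a = pd.date_range("2018-01-01", periods=3, freq="D")
b = pd.date_range("2018-03-01", periods=3, freq="D")
# The gap between the two ranges breaks the daily regularity.
print(a.append(b).freq)  # None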

pandas/core/arrays/timedeltas.py (resolved review thread)
@TomAugspurger added this to the 0.24.0 milestone Nov 12, 2018
@TomAugspurger added the Datetime and ExtensionArray labels Nov 12, 2018
@jbrockmendel (Member Author):

Ported the last of the relevant tests from #23415.

pandas/core/arrays/datetimelike.py (outdated review thread, resolved)
class SharedTests(object):
    index_cls = None

    def test_take(self):
Member:

We should not need to add such basic tests I think, as those are covered by the base Extension tests (we should of course test datetime-specific aspects).

Is there anything in this test not tested by the base tests?

Member Author:

The class-specific tests also include checks for invalid fill_values:

arr.take([0, 1], allow_fill=True,
         fill_value=pd.Timestamp.now().time)
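A rough standalone rendering of that check (illustrative, not the actual test from this PR), written against released pandas where pd.array returns a DatetimeArray; a datetime.time is not a valid fill value, so take is expected to raise:

import pandas as pd
import pytest

arr = pd.array(pd.date_range("2018-01-01", periods=3))
# An invalid fill_value should raise rather than be silently coerced; the
# exact exception type has varied across versions, hence the tuple.
with pytest.raises((TypeError, ValueError)):
    arr.take([0, -1], allow_fill=True,
             fill_value=pd.Timestamp("2018-01-01").time())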

def test_concat_same_type(self):
Member:

As I think I also commented on one of the previous PRs that started doing this, I don't think we should test _concat_same_type directly here. It is already tested by the base extension tests and by all the tests that actually use it under the hood.

Member Author:

Yah, most of these tests were salvaged from one of those older PRs. I don't see much downside to having the tests, but am pretty happy to pawn this decision/PR off on Tom.

Contributor (@TomAugspurger, Nov 13, 2018):

We still want the fail cases (different dtypes) here.

At this point, we should be able to simplify core/dtypes/concat.py::concat_datetimetz and DatetimeIndex._concat_same_dtype, right? I'll take a look.
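A hypothetical sketch of that simplification (the helper name and shape here are illustrative, not the pandas code): once the array owns _concat_same_type, the Index-level concat can reduce to concatenating the backing arrays and re-wrapping the result.

import pandas as pd

def concat_same_dtype(indexes, name=None):
    arrays = [idx.array for idx in indexes]               # backing DatetimeArrays
    combined = type(arrays[0])._concat_same_type(arrays)  # EA interface method
    return pd.DatetimeIndex(combined, name=name)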


@classmethod
def _from_factorized(cls, values, original):
    return cls(values, dtype=original.dtype, freq=original.freq)
Member:

Does it make sense to pass original.freq here? It seems a bit strange, as this creates a new array which does not necessarily have the same order as the original one.

Although in practice, if you have a freq, that means you started with a regular and unique array, so the factorization is more or less a no-op and the result will still have the same freq? (But I might be missing corner cases.)

Member:

Thinking of a possible corner case, which is currently actually broken: a sorted factorize of a DatetimeIndex with a negative freq:

In [57]: idx = pd.date_range("2012-01-01", periods=3)

In [58]: idx
Out[58]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')

In [59]: pd.factorize(idx)
Out[59]: 
(array([0, 1, 2]),
 DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D'))

In [60]: pd.factorize(idx[::-1])
Out[60]: 
(array([0, 1, 2]),
 DatetimeIndex(['2012-01-03', '2012-01-02', '2012-01-01'], dtype='datetime64[ns]', freq='-1D'))

In [61]: pd.factorize(idx[::-1], sort=True)
Out[61]: 
(array([2, 1, 0]),
 DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='-1D'))

@jbrockmendel (Member Author):

@TomAugspurger are you still good with taking this PR over? (BTW, the Travis failures look like unrelated timeouts.)

@TomAugspurger (Contributor):

Sure, will push changes here.

@TomAugspurger (Contributor) commented Nov 13, 2018 via email

@jorisvandenbossche (Member):

> And I suppose we can't even assume that concatenating multiple DatetimeArrays with the same freq will end up with a DatetimeIndex with the same freq.

Indeed, as that would only be the case if they are nicely consecutive ranges.

_concat._concat_datetimetz -> DatetimeIndex._concat_same_dtype -> DatetimeArray._concat_same_type
@TomAugspurger (Contributor):

Pushed some changes:

  1. DatetimelikeArray.copy now copies the underlying data (see the quick check after this list)
  2. Changed the concatenation of DatetimeIndex, hopefully for the better.
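A quick check of point 1 (illustrative, using released pandas where pd.array returns a DatetimeArray): mutating the copy must not touch the original, which requires copy() to duplicate the underlying data.

import pandas as pd

arr = pd.array(pd.date_range("2018-01-01", periods=3))
copied = arr.copy()
copied[0] = pd.Timestamp("2000-01-01")
# The original is unchanged, so the copy does not share its backing data.
assert arr[0] == pd.Timestamp("2018-01-01")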

@codecov (bot) commented Nov 14, 2018

Codecov Report

Merging #23643 into master will increase coverage by <.01%.
The diff coverage is 98.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23643      +/-   ##
==========================================
+ Coverage   92.24%   92.24%   +<.01%     
==========================================
  Files         161      161              
  Lines       51318    51340      +22     
==========================================
+ Hits        47339    47361      +22     
  Misses       3979     3979
Flag | Coverage Δ
#multiple | 90.64% <98.48%> (ø) ⬆️
#single | 42.34% <45.45%> (+0.03%) ⬆️

Impacted Files | Coverage Δ
pandas/core/arrays/datetimelike.py | 96.14% <100%> (+0.21%) ⬆️
pandas/core/arrays/period.py | 98.44% <100%> (-0.05%) ⬇️
pandas/core/arrays/timedeltas.py | 95.63% <100%> (+0.55%) ⬆️
pandas/core/indexes/datetimes.py | 96.11% <100%> (-0.01%) ⬇️
pandas/core/dtypes/concat.py | 96.66% <100%> (+0.4%) ⬆️
pandas/core/arrays/datetimes.py | 98.47% <100%> (+0.03%) ⬆️
pandas/core/indexes/datetimelike.py | 97.74% <85.71%> (-0.28%) ⬇️
pandas/tseries/offsets.py | 96.98% <0%> (-0.09%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a197837...3cb072e.

pandas/core/arrays/timedeltas.py (resolved review thread)
return _concat._concat_datetimetz(to_concat, name)
# TODO(DatetimeArray)
# - remove the .asi8 here
# - remove the _maybe_box_as_values
Contributor:

Looks like you can remove part of the comment.

@jreback (Contributor) commented Nov 14, 2018

Just a really minor comment; OK to merge.

@TomAugspurger (Contributor):

Going to leave the stale comment for now, if that's OK. That whole if/else should be going away soon.

@jreback merged commit fb4405d into pandas-dev:master Nov 14, 2018
@jreback (Contributor) commented Nov 14, 2018

OK! Thanks @jbrockmendel and @TomAugspurger.

@jbrockmendel deleted the dlike_ea4 branch November 14, 2018 16:25
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019