Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: preserve freq in DTI/TDI.factorize #38120

Merged
merged 20 commits into from
Nov 30, 2020

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Nov 27, 2020

cc @simonjayhawkins
closes #35563

this is an old branch I resurrected and rebased, so has some excess cruft that would need to be removed if we want to move forward.

LMK if there are any tests you'd like to see added based on the other issue (or just go ahead and push them if you like)

@simonjayhawkins
Copy link
Member

thanks @jbrockmendel I'll look in detail tomorrow and add tests as required.

@simonjayhawkins simonjayhawkins added this to the 1.1.5 milestone Nov 27, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we need to add special casing since extension array interface doesn't support the sort kwarg, xref #33838

pandas/tests/indexes/datetimes/test_datetime.py Outdated Show resolved Hide resolved
pandas/tests/indexes/timedeltas/test_timedelta.py Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Nov 28, 2020

this is going to be hard to backport

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can get this passing on the backport as well ok to merge, but pls put up a separate patch for that to see if its even possible.

if is_extension_array_dtype(dtype):
values = dtype.construct_array_type()._from_sequence(values)
cls = dtype.construct_array_type()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsif here (why are using isintance when we should be using is_extension_array_dtype)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the elif part of the comment, I think that Brock maybe keeping the early return separate for readability

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsif here

sure

why are using isintance when we should be using is_extension_array_dtype

  1. perf, is_foo_dtype is slow
  2. to catch DTA/TDA

(... i think. most of this is a resurrected branch from a ways back)

@jreback jreback added API - Consistency Internal Consistency of API/Behavior Datetime Datetime data dtype Frequency DateOffsets MultiIndex labels Nov 28, 2020
@simonjayhawkins
Copy link
Member

if you can get this passing on the backport as well ok to merge, but pls put up a separate patch for that to see if its even possible.

sure

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking as will need a release note before merge, i'll add if #38137 successful

@@ -1645,6 +1645,24 @@ def _with_freq(self, freq):
arr._freq = freq
return arr

# --------------------------------------------------------------

def factorize(self, na_sentinel=-1, sort: bool = False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_values_for_factorize and _from_factorized in DatetimeLikeArrayMixin are now only used by PeriodArray? should these be moved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values_for_factorize is still used (incorrectly) in core.util.hashing, so for now this is still needed here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@simonjayhawkins
Copy link
Member

this is going to be hard to backport

maybe doable, in #38137, the ci has not completed yet but looks like tests are passing (except for np-dev which is unrelated and failing on other PRs)

@jreback
Copy link
Contributor

jreback commented Nov 29, 2020

yeah ok with just making this work here, can fixup later / cleanup

@simonjayhawkins
Copy link
Member

@jreback it would be good to get this into master before 1.2.0rc0 if it's ready. not critical but would only need one backport. backporting to 1.1.5 does not block 1.2.0rc0

if more discussion needed, fine. no need to be under any pressure to merge.

@jreback
Copy link
Contributor

jreback commented Nov 30, 2020

@simonjayhawkins no i think this is fine. I am sure that @jbrockmendel will have a cleanup on top of this, but good for now. thanks.

@jreback jreback merged commit c29c176 into pandas-dev:master Nov 30, 2020
@lumberbot-app

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior Datetime Datetime data dtype Frequency DateOffsets MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: freq set to NONE when resampling pd.Multiindex (introduced in v1.1.0)
3 participants