Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DatetimeArray constructors #24567

Closed
TomAugspurger opened this issue Jan 2, 2019 · 3 comments
Closed

DatetimeArray constructors #24567

TomAugspurger opened this issue Jan 2, 2019 · 3 comments
Labels
API Design Closing Candidate May be closeable, needs more eyeballs Constructors Series/DataFrame/Index/pd.array Constructors Datetime Datetime data dtype Enhancement Needs Discussion Requires discussion from core team before further action Period Period data type Refactor Internal refactoring of code Timedelta Timedelta data type

Comments

@TomAugspurger
Copy link
Contributor

The constructor(s) for DatetimeArray are a bit messy right now, so let's step back a bit to lay out what we want out of them.

What do we want out of our init? I'd like the following constraints:

  1. data is never copied unless explicitly requested with copy=True. The values in data are never coerced. This means no lists (copy), and no ndarrays of values that can be coerced to datetime64[ns] (no object-dtype strings, Timestamps, etc.). We do allow unboxing data from a Series / Index / DatetimeArray, and we do allow viewing i8 data as M8[ns].
  2. The signature matches across all DTA classes: values, dtype, freq, copy
  3. It's fast. There are two wrinkles here
    a.) I didn't (and many users probably don't) appreciate the performance impact of passing freq= to DTI / DTA. (ballpark: 5x slower for creating). Everything else is relatively cheap to check, the most expensive thing is probably timezone normalization which I think is unavoidable.
    b.) Frequency inference. Right now it's disallowed. Should we allow it? Is this expensive?

If possible, I'd prefer to avoid defining DatetimeArray.__new__, for two main reasons

  1. Maintainability: defining __new__ complicates pickle, which makes for relatively difficult debugging sessions in the future
  2. Aesthetics: Python already has a way for initializing classes (__init__), so all else equal I'd prefer to use that instead of __new__ + _simple_new

Some concretish TODOs:

  1. Investigate validation-checking code between DatetimeArray.__init__ and sequence_to_dt64ns (checking user-provided freq / dtype / tz vs. those properties on DatetimeArray values)
  2. Implement freq validation (blocked by
    Bad freq invalidation in DatetimeIndex.where #24555 and maybe
    Refactor DatetimeArray._generate_range #24562)
  3. Standardize DatetimeArray._simple_new and the __init__. Right now _simple_new takes _simple_new(cls, values, freq=None, tz=None). Changing that tz to dtype should lets use share more code between TDA/DTA/PeriodArray.
@TomAugspurger TomAugspurger added Refactor Internal refactoring of code Datetime Datetime data dtype Timedelta Timedelta data type Period Period data type labels Jan 2, 2019
@jbrockmendel
Copy link
Member

Some items that might be worth including in the list:

  • consistency between DatetimeArray/DatetimeIndex/Timestamp, TimedeltaArray/TimedeltaIndex/Timedelta, PeriodArray/PeriodIndex/Period
  • public constructors should not allow users to construct invalid objects

@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Jul 23, 2019
@mroeschke mroeschke added Needs Discussion Requires discussion from core team before further action API Design Enhancement labels Apr 1, 2020
@jbrockmendel
Copy link
Member

Is this still needed/actionable?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label May 18, 2021
@mroeschke
Copy link
Member

Seems like discussion never took off here and DatetimeArray has changed in the meantime. If actionable, probably best if this is revived in a new issue with smaller discussion points. Going to close for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Closing Candidate May be closeable, needs more eyeballs Constructors Series/DataFrame/Index/pd.array Constructors Datetime Datetime data dtype Enhancement Needs Discussion Requires discussion from core team before further action Period Period data type Refactor Internal refactoring of code Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

3 participants