Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Make 'D' & offsets.Day always operate as calendar day instead of 24 Hour #22864

Closed
mroeschke opened this issue Sep 27, 2018 · 15 comments
Closed

Comments

@mroeschke
Copy link
Member

From the dev-chat #22274 (comment) and a rehash of #20633

Conclusion:

  • Make the offset alias 'D' and offsets.Day always operate as a calendar day.

Technical implication:

  • Day will need to subclass DateOffset instead of Tick

Operations that will change behavior (may be missing some):

  1. Day arithmetic with Timestamp/DTI/datetime Series/DataFrame
  2. DatetimeIndex.shift
  3. Any use of 'D' with Timedelta/TDI/timedelta Series/DataFrame
  4. Tick arithmetic with Day

Deprecation Procedures (could use some input here)

  1. None
  2. None
  3. DeprecationWarning that users should use '24 H' instead of 'D' and allow users to use 'D' for v0.24.0
  4. None

cc @pandas-dev/pandas-core

@jorisvandenbossche
Copy link
Member

Didn't look yet into the WIP PR, but some thoughts:

  • Since we need to deprecate the current Day behaviour (for certain things like passing it to Timedelta, or arithmetic in case of DST), it may be easier to still keep the current implementation (but need to think a bit more in detail how it would go in practice)
  • We need an "update" path for users that want the new behaviour. Maybe a temporary keyword to the Day constructor to indicate it can be used as absolute? (ugly for sure, as we afterwards need to deprecate that again)

@jorisvandenbossche
Copy link
Member

So to actually better answer to your top post ..

Regarding deprecation: I think we need a deprecation for each of the 4 points you mention.

Eg case 1 (Day arithmetic with Timestamp) will break people's code if we simply change it. Of course, it will only change the result in a few cases (only when you have tz-ware data, and when there is a DST), so we might think about only raising a warning then.

@mroeschke
Copy link
Member Author

Just so I understand @jorisvandenbossche:

  • in v0.24.0: Everything works identically, but issue DeprecationWarnings for all operations that will change.
  • in some later version: Switch over to the new Day offset.

So in my PR, I can just prep the new Day offset class that is slated to replace the old one.

@jbrockmendel jbrockmendel added the Frequency DateOffsets label Sep 29, 2018
@jorisvandenbossche
Copy link
Member

Yes, that is how I would do it (but we can still discuss of course).
Only, ideally, users will have a way to already get the future behaviour without warning, so that might be the tricky issue

@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Oct 1, 2018
@jorisvandenbossche jorisvandenbossche added the Blocker Blocking issue or pull request for an upcoming release label Oct 1, 2018
@mroeschke
Copy link
Member Author

@jreback @jorisvandenbossche

We can discuss this during the next dev chat, but I want to confirm the operations that warrant FutureWarnings with 'D' and Day converting to fully 1 calendar day:

Timedelta Related Deprecations

  • timedelta_range(..., freq='D')
  • to_timedelta(..., unit='D')
  • Timedelta(..., unit='D')
  • Timedelta/TimedeltaIndex/Series/DataFrame[timedelta64[ns]] __add__/__sub__/__div__ Day
  • Series/DataFrame[timedelta64[ns]].resample('D')
  • TimedeltaIndex.round/ceil/floor('D')
  • Series/DataFrame[timedelta64[ns]].astype('timedelta64[D]')?

Datetime Related Deprecations

  • Timestamp/DatetimeIndex/Series/DataFrame[datetime64[ns, tz]] __add__/__sub__ Day
  • DatetimeIndex.shift('D') with tz

Tick Related Deprecations

  • Day +/- [Hour/Minute/Second/...]

I've already tried to implement some of these warnings (like warning for timedelta_range(..., freq='D')), but it's already generating ~1000 warnings in the test suite.

@mroeschke
Copy link
Member Author

Conversely, if we keep the current behavior in master. We would need to add FutureWarnings warnings for

  • date_range(..., 'D')
  • Series/DataFrame[datetime64[ns, tz]].resample('D')

@mroeschke
Copy link
Member Author

Post dev discussion on 11/8/2018. My understanding of the conclusion:

  1. We still want 'D' to "play nicely" with timedeltas and timestamps so it can still be used as a flexible entry point.
  • With timestamps, 'D' means "calendar day" (23/24/25 hours depending on DST)
  • With timedeltas, 'D' means 24 hours
  1. Arithmetic with the Day offset when it operates like 24 hours should be deprecated affecting the following:
  • Arithmetic (addition, subtraction, division) with timedeltas would raise an error in the future
  • Arithmetic (addition, subtraction) with timestamps would act like calendar day in the future
  • Arithmetic with other Ticks (Hour, Minute, Second) would raise and error in the future.
  1. The migration path of Day offset would be from Tick (24 hours) to DateOffset ("calendar day")

Question here:

In [1]: tdi = pd.timedelta_range('1day', freq='D', periods=3)

In [2]: tdi
Out[2]: TimedeltaIndex(['1 days', '2 days', '3 days'], dtype='timedelta64[ns]', freq='D')

# If in the future `Day` = "calendar day", this wouldn't make sense.
In [3]: tdi.freq
Out[3]: <Day>

Given the above example, it may make sense to have both a CalendarDay and Day offset?

cc @jorisvandenbossche @jreback

@jbrockmendel
Copy link
Member

Given the above example, it may make sense to have both a CalendarDay and Day offset?

If we're talking reasonably long-term, I think we should get rid of the Tick classes entirely and use Timedelta in their place. Then only CalendarDay would be needed, and there wouldn't be any ambiguity.

@mroeschke
Copy link
Member Author

I'd be on board with that, but I suspect it might be a lot of effort? Just so I'm on the same page, would Day subclass Timedelta instead? Timedelta (or whatever mechanism) would need to still need to take on a lot of frequency mechanisms as well

@mroeschke
Copy link
Member Author

Moving this off the 0.24.0 milestone, this can be address in a future release. #24330 just removes the existing CalendarDay behavior.

@mroeschke mroeschke removed this from the 0.24.0 milestone Dec 18, 2018
@mroeschke mroeschke added API Design and removed Blocker Blocking issue or pull request for an upcoming release labels Dec 19, 2018
@jbrockmendel
Copy link
Member

@mroeschke ive totally lost track, where did we land on this?

@mroeschke
Copy link
Member Author

I think we wanted offsets.Day to mean calendar day instead of 24 hours. I just lost steam in addressing all the deprecations mentioned in the top post.

@jbrockmendel you mentioned:

If we're talking reasonably long-term, I think we should get rid of the Tick classes entirely and use Timedelta in their place

which would make this transition a bit easier I think.

@jbrockmendel
Copy link
Member

@mroeschke this came up on the call today. any chance youll have time for this in the not-too-distant future?

@mroeschke
Copy link
Member Author

Sure I can start chipping away at this again. I remember propagating all the warnings correctly will be the biggest challenge.

@mroeschke
Copy link
Member Author

Closing in favor of #41943

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants