Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: freq setter for DatetimeIndex/TimedeltaIndex/PeriodIndex is poorly supported #20678

Closed
jschendel opened this issue Apr 13, 2018 · 6 comments · Fixed by #20772
Closed

BUG: freq setter for DatetimeIndex/TimedeltaIndex/PeriodIndex is poorly supported #20678

jschendel opened this issue Apr 13, 2018 · 6 comments · Fixed by #20772
Labels
Datetime Datetime data dtype Frequency DateOffsets
Milestone

Comments

@jschendel
Copy link
Member

1. Setting to a frequency string alias fails for all datetimelike indexes (same error for all three):

In [1]: import pandas as pd

In [2]: dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'])

In [3]: dti
Out[3]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq=None)

In [4]: dti.freq = 'D'

In [5]: dti
Out[5]: ---------------------------------------------------------------------------
AttributeError: 'str' object has no attribute 'freqstr'

2. DatetimeIndex and TimedeltaIndex allow setting to invalid frequencies:

In [6]: dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'])

In [7]: dti.freq = pd.offsets.Day(2)

In [8]: dti
Out[8]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq='2D')

Note that trying to construct this from scratch fails, as expected:

In [9]: dti = pd.DatetimeIndex(['20170101', '20170102', '20170103'], freq='2D')
---------------------------------------------------------------------------
ValueError: Inferred frequency D from passed dates does not conform to passed frequency 2D

3. PeriodIndex gives nonsensical output when setting with an offset:

In [10]: pi = pd.PeriodIndex(['2018Q1', '2018Q2', '2018Q3'], freq='Q')

In [11]: pi
Out[11]: PeriodIndex(['2018Q1', '2018Q2', '2018Q3'], dtype='period[Q-DEC]', freq='Q-DEC')

In [12]: pi.freq = pd.offsets.Day()

In [13]: pi
Out[13]: PeriodIndex(['1970-07-12', '1970-07-13', '1970-07-14'], dtype='period[Q-DEC]', freq='D')

In addition to the nonsensical dates, note the incompatibility in [13] between dtype and freq.

Expected Output

I don't know that setting freq for a PeriodIndex should be supported: there's some ambiguity in regards aligning to the start/end of a period, and there's also the PeriodIndex.asfreq method that gives more control over this.

If setting freq is disallowed for PeriodIndex, should setting also be disallowed for DatetimeIndex and TimedeltaIndex in order to remain consistent? If so, should there then be an analogous asfreq for these two?

My thoughts on the three issues:

  1. String aliases should be coerced as appropriate (be it via setting or a asfreq method).

  2. This should raise with a similar message as the constructor (be it via setting or a asfreq method).

  3. This should raise, potentially with a message indicating to use asfreq. Or if we choose to allow setting, this should be consistent with the default asfreq options.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: fa231e8
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+740.gfa231e8
pytest: 3.1.2
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.0
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche jorisvandenbossche added the Datetime Datetime data dtype label Apr 13, 2018
@jorisvandenbossche
Copy link
Member

Thanks, I thought we already had a similar issue, but can't find it.

Additionally, this even fails:

dti.freq = dti.inferred_freq

because inferred_freq returns a string (#5082)

  1. String aliases should be coerced as appropriate (be it via setting or a asfreq method).
    This should raise with a similar message as the constructor (be it via setting or a asfreq method).

+1

Regarding adding a asfreq to DatetimeIndex, or improving the setting via freq: such a potential DatetimeIndex.asfreq is somewhat different as PeriodIndex.asfreq. For periods it really converts the underlying integer values, while for datetimes, it is just setting an attribute (and checking the passed freq matches the values).
Therefore, I would rather go improving dti.freq = ... to parse the passed string and raise a informative error message if needed (when the freq does not match the data).

@jreback
Copy link
Contributor

jreback commented Apr 13, 2018

I don't think there is an actual case for setting this (though don't completely know). Should deprecate setting and raise and exception. (and of course Indexes are immutable), though we allow setting of 'name' (e.g. meta-data), but this is not just descriptive, but can affect ops.

Rather using .asfreq() is more appropriate here

@jschendel
Copy link
Member Author

Rather using .asfreq() is more appropriate here

Would creating a new .setfreq (or set_freq) method for DatetimeIndex and TimedeltaIndex potentially be more appropriate here? As @jorisvandenbossche pointed out, the behavior of a potential asfreq on DatetimeIndex and TimedeltaIndex would be a bit different than PeriodIndex. Not sure how the potential for confusion between methods balances out with the API bloat of adding a new method though.

@jorisvandenbossche
Copy link
Member

I don't think there is an actual case for setting this (though don't completely know)

I don't really know as well. But in principle, you can create a regular DTI from values, and then it has no freq but it has an inferred freq. And in that case you can set the freq manually (just as you can do with the constructor), to avoid that it is always inferred?

Would creating a new .setfreq (or set_freq) method for DatetimeIndex and TimedeltaIndex potentially be more appropriate here?

Given the limited use case, I don't think it is worth adding a new method for that (+ that I generally don't like set_ methods if we can just have the actual property settable, and the method does not give any additional functionality).
But also as I said, asfreq is different here, so I would not re-use that.
I really don't see the problem in just fixing up setting dti.freq = .. directly to setting the freq correctly if possible or by raising an informative error message.

@TomAugspurger
Copy link
Contributor

What's the status on this? 0.23 or pushing?

@jschendel
Copy link
Member Author

jschendel commented Apr 27, 2018

@TomAugspurger : FWIW I plan to push changes that address the review comments later tonight. Not sure how much discussion/additional changes will be required beyond that though. I'll have time this weekend to do work related to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Frequency DateOffsets
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants