Bad freq invalidation in DatetimeIndex.where #24555

Open
TomAugspurger opened this issue Jan 2, 2019 · 5 comments
Labels
Bug, Datetime (Datetime data dtype), freq retention (User expects "freq" attribute to be preserved), Frequency (DateOffsets)

Comments

@TomAugspurger
Contributor

What's the expected output here?

In [16]: i = pd.date_range('20130101', periods=3, tz='US/Eastern')

In [17]: i2 = pd.Index([pd.NaT, pd.NaT] + i[2:].tolist())

In [18]: i.where(pd.notna(i2), i2)
Out[18]: DatetimeIndex(['NaT', 'NaT', '2013-01-03 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq='D')

The returned DatetimeIndex (call it result) doesn't pass freq validation.

In [23]: result._eadata._validate_frequency(result, result.freq)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    863                                           periods=len(index), freq=freq,
--> 864                                           **kwargs)
    865             if not np.array_equal(index.asi8, on_freq.asi8):

~/sandbox/pandas-alt/pandas/core/arrays/datetimes.py in _generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, closed)
    299         if start is NaT or end is NaT:
--> 300             raise ValueError("Neither `start` nor `end` can be NaT")
    301

ValueError: Neither `start` nor `end` can be NaT

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-23-24fa3f452eb0> in <module>
----> 1 result._eadata._validate_frequency(result, result.freq)

~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    877             raise ValueError('Inferred frequency {infer} from passed values '
    878                              'does not conform to passed frequency {passed}'
--> 879                              .format(infer=inferred, passed=freq.freqstr))
    880
    881     # monotonicity/uniqueness properties are called via frequencies.infer_freq,

ValueError: Inferred frequency None from passed values does not conform to passed frequency D

Should the freq be None?
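For readers who want to try the check without the private `_validate_frequency` method, here is a minimal sketch using only public API. The helper name `conforms_to_freq` is hypothetical and only approximates what the traceback shows: rebuild a range from the first element and compare it with the index.

```python
import pandas as pd


def conforms_to_freq(index, freq):
    """Hypothetical helper: does `index` match a range rebuilt from its first element?"""
    if len(index) == 0:
        return True
    if index.hasnans:
        # A regular range cannot start at NaT or contain missing entries.
        return False
    rebuilt = pd.date_range(start=index[0], periods=len(index), freq=freq)
    # Guard on length first so the elementwise comparison cannot raise.
    return len(rebuilt) == len(index) and bool((index == rebuilt).all())


i = pd.date_range("20130101", periods=3, tz="US/Eastern")
masked = pd.DatetimeIndex([pd.NaT, pd.NaT, i[2]])

print(conforms_to_freq(i, "D"))       # regular daily range
print(conforms_to_freq(masked, "D"))  # two leading NaT values
```

Under this check the masked index cannot conform to 'D', which is the argument for the freq being None.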

TomAugspurger added the Datetime (Datetime data dtype) and Frequency (DateOffsets) labels Jan 2, 2019
@TomAugspurger
Contributor Author

TomAugspurger commented Jan 2, 2019

Another one.

In [16]: idx = pd.date_range('2014-01-02', '2014-04-30', freq='M', tz='UTC')

In [17]: result = idx.tz_convert("US/Eastern")

In [18]: result
Out[18]:
DatetimeIndex(['2014-01-30 19:00:00-05:00', '2014-02-27 19:00:00-05:00',
               '2014-03-30 20:00:00-04:00', '2014-04-29 20:00:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq='M')

In [19]: result._eadata._validate_frequency(result, result.freq)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    913             if not np.array_equal(index.asi8, on_freq.asi8):
--> 914                 raise ValueError
    915         except ValueError as e:

ValueError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-19-24fa3f452eb0> in <module>
----> 1 result._eadata._validate_frequency(result, result.freq)

~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    925             raise ValueError('Inferred frequency {infer} from passed values '
    926                              'does not conform to passed frequency {passed}'
--> 927                              .format(infer=inferred, passed=freq.freqstr))
    928
    929     # monotonicity/uniqueness properties are called via frequencies.infer_freq,

ValueError: Inferred frequency None from passed values does not conform to passed frequency M

Though perhaps there's a bug in the freq validation around DST boundaries? Maybe not. Here's the same range generated directly in US/Eastern:

In [36]: pd.date_range('2014-01-02', '2014-04-30', freq='M', tz='US/Eastern')
Out[36]:
DatetimeIndex(['2014-01-31 00:00:00-05:00', '2014-02-28 00:00:00-05:00',
               '2014-03-31 00:00:00-04:00', '2014-04-30 00:00:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq='M')

So should tz_convert invalidate the freq?
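A short sketch of why the converted index cannot be month-end anchored in its new timezone. It uses the public `pd.offsets.MonthEnd` object instead of the 'M' alias (the alias spelling has changed across pandas versions) and deliberately makes no claim about what `converted.freq` reports on any given version.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
utc = pd.date_range("2014-01-02", "2014-04-30", freq=month_end, tz="UTC")
converted = utc.tz_convert("US/Eastern")
native = pd.date_range("2014-01-02", "2014-04-30", freq=month_end, tz="US/Eastern")

# The conversion keeps the same instants in time...
same_instants = converted.tz_convert("UTC").equals(utc)

# ...but the local wall-clock dates fall the evening before each month end.
converted_on_offset = [month_end.is_on_offset(ts) for ts in converted]
native_on_offset = [month_end.is_on_offset(ts) for ts in native]

print(same_instants)
print(converted_on_offset)
print(native_on_offset)
```

The converted timestamps are still evenly anchored in UTC, so whether the freq is "valid" depends on whether validation is done in wall-clock time or in UTC.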

@TomAugspurger
Contributor Author

One more. In this case we seem to generate an array from bdate_range that doesn't have a valid freq (not sure if the bug is in the generation or the freq validation, probably the validation).

START = pd.Timestamp(2009, 3, 13)
END1 = pd.Timestamp(2009, 3, 18)
END2 = pd.Timestamp(2009, 3, 19)

freq = 'CBH'
a = pd.bdate_range(START, END1, freq=freq, weekmask='Mon Wed Fri',
                   holidays=['2009-03-14'])
b = pd.bdate_range(START, END2, freq=freq, weekmask='Mon Wed Fri',
                   holidays=['2009-03-14'])

a._eadata._validate_frequency(a, a.freq)
b._eadata._validate_frequency(b, b.freq)

a validates fine, but b doesn't:

In [44]: b._eadata._validate_frequency(b, b.freq)
    ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    913             if not np.array_equal(index.asi8, on_freq.asi8):
--> 914                 raise ValueError
    915         except ValueError as e:

ValueError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-44-2b6f5f040d09> in <module>
----> 1 b._eadata._validate_frequency(b, b.freq)

~/sandbox/pandas-alt/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
    925             raise ValueError('Inferred frequency {infer} from passed values '
    926                              'does not conform to passed frequency {passed}'
--> 927                              .format(infer=inferred, passed=freq.freqstr))
    928
    929     # monotonicity/uniqueness properties are called via frequencies.infer_freq,

ValueError: Inferred frequency None from passed values does not conform to passed frequency CBH

In the freq validation for b we generate an on_freq with the wrong(?) number of periods:

ipdb> len(on_freq)
16
ipdb> len(index)
24
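The length comparison can be attempted without ipdb. This is a sketch under two assumptions: passing a `CustomBusinessHour` offset object to `pd.date_range` is equivalent to the `bdate_range(..., freq='CBH', ...)` calls in this comment, and rebuilding from the first element with `periods=len(index)` mirrors how `on_freq` is generated. The printed lengths may differ between pandas versions, so no expected output is given.

```python
import pandas as pd

# Same calendar as in the report: Mon/Wed/Fri business hours, one holiday.
cbh = pd.offsets.CustomBusinessHour(weekmask="Mon Wed Fri", holidays=["2009-03-14"])

start = pd.Timestamp(2009, 3, 13)
a = pd.date_range(start, pd.Timestamp(2009, 3, 18), freq=cbh)
b = pd.date_range(start, pd.Timestamp(2009, 3, 19), freq=cbh)

for name, index in [("a", a), ("b", b)]:
    # Rebuild a range of the same length from the first element.
    rebuilt = pd.date_range(start=index[0], periods=len(index), freq=cbh)
    print(name, len(index), len(rebuilt), len(index) == len(rebuilt))
```

If the last column prints False for b, the mismatch is in range generation by `periods` and not in the index itself.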

@TomAugspurger
Contributor Author

Do we have a policy on when an operation that might invalidate a freq should infer vs. just set it to None? For example, in DatetimeIndex.where we could either do _shallow_copy(freq=None) or _shallow_copy_with_infer.
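To make the two options concrete, here is a public-API analogue. It does not call the internal `_shallow_copy` or `_shallow_copy_with_infer` methods; it only contrasts constructing a `DatetimeIndex` with no freq against constructing it with `freq="infer"`.

```python
import pandas as pd

regular = ["2013-01-01", "2013-01-03", "2013-01-05"]
irregular = ["2013-01-01", "2013-01-02", "2013-01-05"]

# Option 1: always drop the freq. Cheap, and never wrong.
dropped = pd.DatetimeIndex(regular)

# Option 2: infer the freq from the values. This costs a pass over the data,
# but recovers a freq when the values happen to be regular.
inferred = pd.DatetimeIndex(regular, freq="infer")
not_inferable = pd.DatetimeIndex(irregular, freq="infer")

print(dropped.freq)
print(inferred.freqstr)
print(not_inferable.freq)
```

Inference falls back to None when the values are irregular, so both options are safe. The difference is cost against how often a usable freq is kept.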

@TomAugspurger
Contributor Author

I think that a fix for these issues (invalidating in places where needed, maybe fixing some bugs in the current freq validation) and a fix for #24562 will open up freq validation in DatetimeArray.__init__.

@jbrockmendel
Member

jbrockmendel commented Mar 12, 2020

I think [the OP example, not the others] was fixed by a semi-recent PR that implemented DTI/TDI.where and always sets the resulting freq to None.

mroeschke added the Bug label Apr 1, 2020
jbrockmendel added the freq retention (User expects "freq" attribute to be preserved) label Aug 24, 2023