BUG: OutOfBoundsTimedelta not raised for out-of-bounds Timedelta #36615

Closed
2 of 3 tasks
spencerkclark opened this issue Sep 25, 2020 · 4 comments · Fixed by #42235
Labels: Bug; Error Reporting (Incorrect or improved errors from pandas); good first issue; Needs Tests (Unit test(s) needed to prevent regressions); Testing (pandas testing functions or related to the test suite); Timedelta (Timedelta data type)

@spencerkclark
Contributor

spencerkclark commented Sep 25, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: pd.Timedelta(np.timedelta64(200000, "D"))
Out[3]: Timedelta('-13504 days +00:25:26.290448384')
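
The negative result in Out[3] is consistent with a signed 64-bit overflow when 200000 days is converted to nanoseconds. The sketch below reproduces the arithmetic in plain Python; it assumes the value is stored as an int64 count of nanoseconds, and `wrap_int64` is a hypothetical helper written for illustration, not a pandas function.

```python
NS_PER_DAY = 86_400 * 10**9   # nanoseconds in one day
INT64_MAX = 2**63 - 1         # largest value a signed 64-bit integer can hold


def wrap_int64(value: int) -> int:
    """Reinterpret an arbitrary Python int as a two's-complement int64 (illustrative helper)."""
    return (value + 2**63) % 2**64 - 2**63


requested_ns = 200_000 * NS_PER_DAY        # 200000 days expressed in nanoseconds
overflows = requested_ns > INT64_MAX       # True: the value does not fit in int64

wrapped_ns = wrap_int64(requested_ns)      # what remains after silent wraparound
days, rem_ns = divmod(wrapped_ns, NS_PER_DAY)

# Break the sub-day remainder into h:m:s.ns for comparison with Out[3].
rem_s, frac_ns = divmod(rem_ns, 10**9)
hours, rem_s = divmod(rem_s, 3600)
minutes, seconds = divmod(rem_s, 60)

print(overflows)                                           # True
print(days, f"{hours:02d}:{minutes:02d}:{seconds:02d}.{frac_ns:09d}")
# -13504 00:25:26.290448384
```

The wrapped value matches the `-13504 days +00:25:26.290448384` shown above, which suggests the conversion overflowed silently instead of raising.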

Problem description

Pandas provides a useful error message if one tries to create a Timestamp using a datetime that is out of bounds for nanosecond-precision times:

In [4]: pd.Timestamp(np.datetime64("0001-01-01", "us"))
---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-11-1bf9f0c9d954> in <module>
----> 1 pd.Timestamp(np.datetime64("0001-01-01", "us"))

~/Software/pandas/pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

~/Software/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

~/Software/pandas/pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.get_datetime64_nanos()

~/Software/pandas/pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

I was excited to see an exception along these lines, OutOfBoundsTimedelta, added for out-of-bounds timedeltas in #34448; however, it does not seem to be very easy to trigger. Is the idea to eventually check this in more places?

Expected Output

I might expect this to raise the newly added OutOfBoundsTimedelta error.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2a7d332
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.5.0
Version : Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.2
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 19.2.2
setuptools : 49.6.0.post20200917
Cython : 0.29.19
pytest : 5.0.1
hypothesis : 5.6.0
sphinx : 3.0.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.2.1
fsspec : 0.8.2
fastparquet : None
gcsfs : 0.6.2+6.geca6dce
matplotlib : 3.3.0rc1.post439+g7e9530338
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.16.1
xlrd : None
xlwt : None
numba : 0.51.2

@spencerkclark spencerkclark added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 25, 2020
@mzeitlin11 mzeitlin11 added Error Reporting Incorrect or improved errors from pandas Timedelta Timedelta data type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 23, 2020
@mzeitlin11 mzeitlin11 added this to the Contributions Welcome milestone Dec 23, 2020
@spencerkclark
Contributor Author

I think this was resolved by #40008 -- thanks @jbrockmendel! One minor side-effect of the changes there is that the following now raises an error:

>>> import numpy as np; import pandas as pd
>>> pd.Series(np.array(["NaT"], dtype="M8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/spencer/software/pandas/pandas/core/series.py", line 442, in __init__
    data = sanitize_array(data, index, dtype, copy)
  File "/Users/spencer/software/pandas/pandas/core/construction.py", line 545, in sanitize_array
    subarr = _try_cast(data, dtype, copy, raise_cast_failure)
  File "/Users/spencer/software/pandas/pandas/core/construction.py", line 702, in _try_cast
    return sanitize_to_nanoseconds(arr, copy=copy)
  File "/Users/spencer/software/pandas/pandas/core/dtypes/cast.py", line 1710, in sanitize_to_nanoseconds
    values = conversion.ensure_datetime64ns(values)
  File "pandas/_libs/tslibs/conversion.pyx", line 245, in pandas._libs.tslibs.conversion.ensure_datetime64ns
ValueError: datetime64/timedelta64 must have a unit specified

In the past it would automatically cast the "NaT" value to nanosecond precision. We were unintentionally relying on this behavior in one of the xarray tests. It would be straightforward to work around -- I think all that would be required is that we also specify units in the array constructor, e.g. pd.Series(np.array(["NaT"], dtype="M8[ns]")) -- but I was just curious if this side-effect was intended.
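
For reference, the difference between the two array constructors can be seen at the NumPy level without involving pandas. This is a minimal sketch assuming only NumPy is installed; it inspects the dtype metadata and does not reproduce the pandas error itself.

```python
import numpy as np

# A bare "M8" dtype carries no time unit; NumPy reports it as "generic".
unitless = np.array(["NaT"], dtype="M8")
unitless_unit, _ = np.datetime_data(unitless.dtype)

# Adding "[ns]" makes the unit explicit, which is the workaround mentioned above.
with_unit = np.array(["NaT"], dtype="M8[ns]")
explicit_unit, _ = np.datetime_data(with_unit.dtype)

print(unitless_unit)   # generic
print(explicit_unit)   # ns
```

The unit-less array is the kind of input that triggers the ValueError in the traceback above, while the `M8[ns]` array already has the nanosecond unit that pandas would otherwise need to convert to.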

xref: pydata/xarray#5366

@jreback
Contributor

jreback commented Jun 12, 2021

units are explicit in M8 types so this is correct

@spencerkclark
Contributor Author

Sounds good -- we'll fix this in xarray then -- thanks @jreback.

@jreback
Contributor

jreback commented Jun 12, 2021

actually a test for this would be great

@jreback jreback added good first issue Testing pandas testing functions or related to the test suite labels Jun 12, 2021
@jbrockmendel jbrockmendel added the Needs Tests Unit test(s) needed to prevent regressions label Jun 12, 2021
feefladder pushed a commit to feefladder/pandas that referenced this issue Jun 25, 2021
feefladder pushed a commit to feefladder/pandas that referenced this issue Jun 25, 2021
@jreback jreback modified the milestones: Contributions Welcome, 1.4 Jun 25, 2021