
Limitation: Unable to store Year 4000 timesteps in a pandas DataFrame. #16

Closed
jshannon-usbr opened this issue Feb 11, 2020 · 4 comments


@jshannon-usbr

  • pyhecdss version: 0.2.9
  • Python version: 3.7.5
  • Operating System: Windows 10

Description

Querying land use data from the CalSimHydro input DSS file CS3_LandUseDU.dss reveals a timestamp limitation in pandas. CalSimHydro and IDC use the Year 4000 to indicate a repeating time series, but the bounds of pandas.Timestamp (nanosecond resolution, 1677-09-21 to 2262-04-11) do not include that year. FYI, Python's datetime standard library can handle the Year 4000:

>>> import datetime
>>> datetime.datetime(4000, 1, 31).isoformat()
'4000-01-31T00:00:00'
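The pandas limit comes from storing each timestamp as a 64-bit integer count of nanoseconds since the Unix epoch. A minimal standard-library sketch of that bounds check, assuming naive datetimes; the helper `fits_in_pandas_ns_range` is hypothetical and not part of pandas or pyhecdss:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1
# pandas reserves the most negative int64 value for NaT, so the smallest valid value is one above it
INT64_MIN_VALID = -(2**63) + 1


def fits_in_pandas_ns_range(dt: datetime) -> bool:
    """Return True if a naive datetime fits in int64 nanoseconds since the epoch."""
    micros = (dt - EPOCH) // timedelta(microseconds=1)  # exact integer microseconds
    nanos = micros * 1000
    return INT64_MIN_VALID <= nanos <= INT64_MAX


print(datetime(4000, 1, 31).isoformat())              # 4000-01-31T00:00:00
print(fits_in_pandas_ns_range(datetime(2262, 4, 11)))  # True
print(fits_in_pandas_ns_range(datetime(4000, 1, 31)))  # False
```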

What I Did

>>> import pyhecdss
>>> d = pyhecdss.DSSFile('CS3_LandUseDU.dss')
>>> cat = d.read_catalog()
>>> plist = d.get_pathnames(cat)
>>> v1 = plist[0]; v1
'/CALSIM/02_NA_AL/LANDUSE/01JAN4000/1MON/EXISTING/'
>>> df, _, _ = d.read_rts(v1)
Traceback (most recent call last):
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pandas\core\arrays\datetimes.py", line 1979, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pyhecdss\pyhecdss.py", line 351, in read_rts
    endDateStr, interval)
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pyhecdss\pyhecdss.py", line 274, in _pad_to_end_of_block
    return (pd.to_datetime(endDateStr) + buffer).strftime('%d%b%Y').upper()
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pandas\util\_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pandas\core\tools\datetimes.py", line 796, in to_datetime
    result = convert_listlike(np.array([arg]), box, format)[0]
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pandas\core\tools\datetimes.py", line 463, in _convert_listlike_datetimes
    allow_object=True,
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pandas\core\arrays\datetimes.py", line 1984, in objects_to_datetime64ns
    raise e
  File "C:\Users\jshannon\AppData\Roaming\Continuum\anaconda3\envs\test_DWR\lib\site-packages\pandas\core\arrays\datetimes.py", line 1975, in objects_to_datetime64ns
    require_iso8601=require_iso8601,
  File "pandas/_libs/tslib.pyx", line 465, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 683, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 679, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 633, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/conversion.pyx", line 399, in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject
  File "pandas/_libs/tslibs/np_datetime.pyx", line 118, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4000-01-01 00:00:00
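The traceback shows the failure in `pd.to_datetime(endDateStr)` inside pyhecdss's `_pad_to_end_of_block`, which parses a date string, adds a buffer, and formats it back as `%d%b%Y`. A minimal sketch of that same round trip using only `datetime`, which accepts year 4000. The helper `pad_date_str` and its fixed number of padding days are hypothetical (not the fix pyhecdss shipped), and an English locale is assumed for the `%b` month abbreviations:

```python
from datetime import datetime, timedelta


def pad_date_str(date_str: str, days: int) -> str:
    """Parse a DDMONYYYY date (e.g. 01JAN4000), add `days`, and format it back in uppercase.

    Uses datetime instead of pd.to_datetime, so years past 2262 are accepted.
    """
    parsed = datetime.strptime(date_str, "%d%b%Y")  # %b matching is case-insensitive
    return (parsed + timedelta(days=days)).strftime("%d%b%Y").upper()


print(pad_date_str("01JAN4000", 31))  # 01FEB4000
```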
@dwr-psandhu
Collaborator

This looks like an open issue in pandas: pandas-dev/pandas#28104.

In the meantime, I will look for a workaround. One option is to return a pandas DataFrame indexed by Periods, as suggested in the pandas user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#representing-out-of-bounds-spans

@jshannon-usbr If you have a solution, I am open to a pull request.
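The Period-index workaround suggested above can be sketched as follows. `Period` values carry their own calendar ordinal rather than nanoseconds, so year 4000 fits. This is a minimal illustration with placeholder values, not pyhecdss's implementation; the column name simply reuses the pathname from this issue:

```python
import pandas as pd

# Monthly periods for year 4000; not limited to the nanosecond Timestamp range
index = pd.period_range(start="4000-01", periods=12, freq="M")

# Placeholder values, for illustration only
df = pd.DataFrame(
    {"/CALSIM/02_NA_AL/LANDUSE/01JAN4000/1MON/EXISTING/": [float(i) for i in range(12)]},
    index=index,
)

# Label-based slicing with Period objects still works on the PeriodIndex
spring = df.loc[pd.Period("4000-03", freq="M"):pd.Period("4000-05", freq="M")]
print(spring)
```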

@dwr-psandhu
Collaborator

NumPy's datetime64 supports configurable time units: https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-units
pandas has a related open request, pandas-dev/pandas#7307, but it has stalled.
Based on this, a workaround would be to read data into a pandas DataFrame as usual when the years fall within the range pandas supports, and to also provide a function that reads data into a DataFrame with the times stored as NumPy datetimes. That would lose pandas' time-based slicing and range selection, but users could regain them by shifting the times back into the range pandas supports.

@dwr-psandhu
Collaborator

@jshannon-usbr Please update to pyhecdss > 0.4.0 and test again. The fix currently passes the tests and reads the DSS file in question.

@jshannon415

@dwr-psandhu This is @jshannon-usbr responding from my personal account. I was finally able to test this scenario on my end and can confirm it works. For users who run into this situation, here is the console session with the fix in place:

>>> import pyhecdss
>>> d = pyhecdss.DSSFile('CalSimHydro/inputDSS/CS3_LandUseDU.dss')

    -----DSS---ZOPEN:  Existing File Opened,  File: CalSimHydro/inputDSS/CS3_LandUseDU.dss
                       Unit:   71;  DSS Versions - Software: 6-VE, File: 6-KC
>>> cat = d.read_catalog()
>>> plist = d.get_pathnames(cat)
>>> v1 = plist[0]; v1
'/CALSIM/02_NA_AL/LANDUSE/01JAN4000/1MON/EXISTING/'
>>> df, _, _ = d.read_rts(v1)
 -----DSS*** ZRRTS:  CAUTION - Data block not found in file.  Unit:   71
 Pathname: /CALSIM/02_NA_AL/LANDUSE/01JAN3990/1MON/EXISTING/
 -----DSS--- ZREAD Unit   71; Vers.    1:  /CALSIM/02_NA_AL/LANDUSE/01JAN4000/1MON/EXISTING/
[redacted]\site-packages\pyhecdss\pyhecdss.py:317: RuntimeWarning: Some data or data blocks are missing [istat=3]
  "Some data or data blocks are missing [istat=" + str(istat) + "]", RuntimeWarning)
>>> df
         /CALSIM/02_NA_AL/LANDUSE/01JAN4000/1MON/EXISTING/
4000-01                                         167.927399
4000-02                                         167.927399
4000-03                                         167.927399
4000-04                                         167.927399
4000-05                                         167.927399
4000-06                                         167.927399
4000-07                                         167.927399
4000-08                                         167.927399
4000-09                                         167.927399
4000-10                                         167.927399
4000-11                                         167.927399
4000-12                                         167.927399
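The `4000-01` style labels in this output suggest a monthly period index. Since CalSimHydro and IDC use Year 4000 to mark a repeating series, a user may want to lay that 12-month pattern over real simulation years. A minimal sketch, assuming a 12-value Series on a monthly `PeriodIndex`; `repeat_annual_pattern` is a hypothetical helper, not part of pyhecdss, and the values are placeholders:

```python
import pandas as pd


def repeat_annual_pattern(pattern: pd.Series, start: str, end: str) -> pd.Series:
    """Tile a 12-month pattern (monthly PeriodIndex) across the months from start to end."""
    target = pd.period_range(start=start, end=end, freq="M")
    # Key the pattern by calendar month (1-12) so the Year 4000 label is ignored
    by_month = pd.Series(pattern.to_numpy(), index=pattern.index.month)
    return pd.Series(by_month.reindex(target.month).to_numpy(), index=target)


# Placeholder year-4000 pattern: each month's value equals its month number
pattern = pd.Series(
    [float(m) for m in range(1, 13)],
    index=pd.period_range("4000-01", periods=12, freq="M"),
)
tiled = repeat_annual_pattern(pattern, "2001-01", "2002-12")
print(tiled.head())
```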

3 participants