-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use cftime for temporal averaging operations #302
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #302 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 14 14
Lines 1169 1170 +1
=========================================
+ Hits 1169 1170 +1 ☔ View full report in Codecov by Sentry. |
0d1352c
to
197281d
Compare
Thanks for this PR. I already started taking a look, but will be able to review more closely next week. We should have this merged before #300. |
0581d14
to
d3e1dd4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# NOTE: With `decode_times=True`, the "calendar" and "units" attributes are | ||
# stored in `.encoding`. | ||
time_cf.encoding["calendar"] = "standard" | ||
time_cf.encoding["units"] = "days since 2000-01-01" | ||
|
||
|
||
# NOTE: With `decode_times=False`, the "calendar" and "units" attributes are | ||
# stored in `.attrs`. | ||
time_non_cf = xr.DataArray( | ||
data=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], | ||
dims=["time"], | ||
attrs={ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just bringing this to attention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense. This is a separate issue when I open a dataset with xarray.open_mfdataset
and decode_times=True
, I don't see anything populated in ds.time.encoding
(but I do see calendar and units information stored here when using xarray.open_dataset
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I found an open xarray issue related to xr.open_mfdataset()
dropping encoding attributes.
pydata/xarray#2436
pydata/xarray#2436 (comment)
Somebody mentions this about the issue here roocs/clisops#88 (comment)
However, there is another problem related to xarray.open_mfdataset:
The encoding dictionary gets lost somewhere during the merging operation of the datasets of the respective files (pydata/xarray#2436).This leads to problems for example with cf-xarray when trying to detect coordinates or bounds, but also leads to problems related to the time axis encoding apparently (as seen in the linked issue). I managed at least to avoid the problems for cf-xarray bounds and coordinates detection by using the decode functionality of xarray only after the datasets have been read in (leaving however the unnecessary time dimension in place ...):
ds = xarray.open_mfdataset("/path/to/files/*.nc")
ds = xarray.decode_cf(ds, decode_coords="all")
I think we might need to use the preprocess
function to save the .encoding
dict, or some other workaround from one of the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can open up a separate issue for this. I'm glad you're finding all of these nonobvious behaviors/bugs in xarray and xCDAT.
cftime.datetime(2001, 2, 1), | ||
cftime.datetime(2001, 3, 1), | ||
cftime.DatetimeGregorian( | ||
2000, 1, 1, 0, 0, 0, 0, has_year_zero=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the cftime.datetime
objects in our tests are now updated to cftime.<OBJECT_TYPE>
.
xcdat/dataset.py
Outdated
time_attrs = time.attrs | ||
|
||
# NOTE: When opening datasets with `decode_times=False`, the "calendar" and | ||
# "units" attributes are stored in `.encoding` (unlike `decode_times=True` | ||
# which stores them in `.attrs`). Since xCDAT manually decodes non-CF | ||
# compliant time coordinates by first setting `decode_times=False`, the | ||
# "calendar" and "units" attrs are popped from the `.attrs` dict and stored | ||
# in the `.encoding` dict to mimic xarray's behavior. | ||
calendar = time_attrs.pop("calendar", None) | ||
units_attr = time_attrs.pop("units", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In decode_non_cf_time()
, the "calendar" and "units" attributes were being saved in both the .attrs
and .encoding
dicts.
However, they should only be in the .encoding
dict after decoding time coordinates. To achieve this behavior, we pop these attrs from the .attrs
dict and later add them to .encoding
.
try: | ||
self.calendar = self._dataset[self._dim].encoding["calendar"] | ||
self.date_type = get_date_type(self.calendar) | ||
except KeyError: | ||
raise KeyError( | ||
"This dataset's time coordinates do not have a 'calendar' encoding " | ||
"attribute set, which might indicate that the time coordinates were not " | ||
"decoded to datetime objects. Ensure that the time coordinates are " | ||
"decoded before performing temporal averaging operations." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now consider the "calendar" attribute and the associated cftime
date type object for temporal averaging operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "were not decoded to datetime objects" be "were not decoded to cftime.datetime objects"? Or are they sufficiently interchangeable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporal averaging accepts both datetime.datetime
and cftime.datetime
objects, but the end result will always be cftime.datetime
.
I think it is sufficiently interchangeable to reference both as just datetime
.
@tomvothecoder - I had a couple minor questions, but this looks good to me. |
- Update format of `decode_non_cf_time()` conditional statement - Update `time_cf` with encoding dict attributes (xarray behavior) - Update `test_dataset.py` and `test_temporal.py` with specific `cftime` date_type objects - Add `xesmf` to `dev.yml` - Remove `Jupyter` setting from VS Code workspace
c4ecd06
to
5665c82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
This is an attempted fix for #301. The PR uses cftime.datetime objects in the
_convert_df_to_dt
function (instead of datetime / cfdatetime objects depending on the dataset attributes).Checklist
If applicable: