Need better user control of _FillValue attribute in NetCDF files #1598
There are at least two ways to fix this:
There is also the philosophical problem of fill values for coordinate variables. To be true to reality, one would really want to add an interpolated value that fills whatever gap or bad value exists, but that seems to be out of scope for xarray. I'm fine with a flag that controls only the coordinate data. For the rest of the variables, we avoid NaN in _FillValue; we use 1E35. So there, too, you could give the user a choice of default fill value. It seems Pythonic to give the user flexibility, and as soon as you satisfy us, another use case with conflicting requirements will come along. So you could use a flag and make it the user's choice rather than xarray's concern. It also depends on where in the process one cleans up one's data: reduce first and then QA/QC, or QA/QC first and then reduce. We do both; it depends on the instrument.
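A minimal sketch of the per-variable choice described above, assuming a recent xarray in which an `encoding` entry of `{"_FillValue": None}` tells `Dataset.to_netcdf` not to write the attribute. The helper name `build_fill_encoding` and its 1e35 default are illustrative assumptions, not part of xarray.

```python
from typing import Dict, Iterable, Optional


def build_fill_encoding(
    coord_names: Iterable[str],
    data_var_names: Iterable[str],
    data_fill: Optional[float] = 1e35,
) -> Dict[str, Dict[str, Optional[float]]]:
    """Hypothetical helper: build an encoding dict for Dataset.to_netcdf.

    Coordinates get _FillValue=None (no attribute written, as CF asks);
    data variables get the caller's chosen fill value instead of NaN.
    """
    encoding: Dict[str, Dict[str, Optional[float]]] = {}
    for name in coord_names:
        encoding[name] = {"_FillValue": None}
    for name in data_var_names:
        encoding[name] = {"_FillValue": data_fill}
    return encoding
```

With an xarray `Dataset` named `ds`, usage would look like `ds.to_netcdf("out.nc", encoding=build_fill_encoding(ds.coords, ds.data_vars))`, since iterating `ds.coords` and `ds.data_vars` yields variable names. A float fill such as 1e35 only suits floating-point data variables; integer variables would need their own value.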
Indeed, this is prohibited by CF conventions -- but xarray (like pandas) takes a more flexible approach here, allowing for missing values for all variables. You can already specify an explicit choice for (There is no need to worry about
I actually think we should use None as the _FillValue sentinel value. We do (sort of) support boolean arrays (#849). (Joe Hamman, Thu, Sep 28, 2017)
Agreed, None is probably better. There is no such thing as a "null" dtype.
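To make the None-as-sentinel idea concrete, here is a hypothetical model of how an encoding lookup could resolve under that proposal. It assumes three cases: the key is absent (use a dtype default, NaN for floats, as xarray does for the float variables discussed in this thread), the value is None (write no attribute), or any other value (write it as given). `resolve_fill_value` and `_MISSING` are illustrative names, not xarray internals.

```python
import math
from typing import Any, Mapping, Tuple

_MISSING = object()  # marks "key absent", distinct from an explicit None


def resolve_fill_value(encoding: Mapping[str, Any], dtype_kind: str) -> Tuple[bool, Any]:
    """Hypothetical model of the proposed semantics.

    Returns (write_attribute, value):
      * "_FillValue" absent  -> default: NaN for floats, nothing otherwise
      * "_FillValue" is None -> do not write the attribute at all
      * any other value      -> write it as given
    """
    value = encoding.get("_FillValue", _MISSING)
    if value is _MISSING:
        if dtype_kind == "f":  # numpy dtype.kind for floating point
            return True, math.nan
        return False, None
    if value is None:
        return False, None
    return True, value
```

The design point behind preferring None: in Python `False == 0`, so using False as the "no fill value" flag would be ambiguous for boolean or integer arrays where 0 (or False) is a legitimate fill value. None has no dtype, so it cannot collide with a real value.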
Correct me if you're talking about something different, but xarray already supports setting
Which, for the relevant dimension, yields in ncinfo:
If I comment out that line in my processing routine, I get the following:
I agree that changing from
@dnowacki-usgs, you've made a good point. At least for the netCDF4 backend, this seems to work out of the box with None/False. Can someone check that this works for the scipy/h5netcdf backends?
@jhamman In brief, it's weird.
So, this is some peculiar behavior. Setting Code below:
netCDF4
scipy
h5netcdf
It sounds like we should control this in xarray to ensure consistent behavior.
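One way to run the cross-backend check asked for earlier in the thread, as a sketch assuming xarray plus the netCDF4, scipy, and h5netcdf packages are installed and that the installed xarray honors `{"_FillValue": None}` in `encoding`. Reopening with `decode_cf=False` leaves `_FillValue` in each variable's `attrs`, so the report reflects what was actually written. `coords_with_fill_value` and `check_backends` are hypothetical helpers, and the file names are placeholders.

```python
from typing import Any, Dict, Iterable, List, Mapping, Sequence


def coords_with_fill_value(
    attrs_by_var: Mapping[str, Mapping[str, Any]], coord_names: Iterable[str]
) -> List[str]:
    """Return the coordinate names whose raw attributes still carry _FillValue."""
    return sorted(
        name for name in coord_names
        if "_FillValue" in attrs_by_var.get(name, {})
    )


def check_backends(
    ds: Any, engines: Sequence[str] = ("netcdf4", "scipy", "h5netcdf")
) -> Dict[str, List[str]]:
    """Write ds with each engine, asking for no coordinate _FillValue, and report leaks."""
    import xarray as xr  # imported here so the pure helper above works without xarray

    encoding = {name: {"_FillValue": None} for name in ds.coords}
    results: Dict[str, List[str]] = {}
    for engine in engines:
        path = f"fillvalue_check_{engine}.nc"  # placeholder file name
        ds.to_netcdf(path, engine=engine, encoding=encoding)
        # decode_cf=False keeps _FillValue in attrs instead of moving it to encoding
        with xr.open_dataset(path, engine=engine, decode_cf=False) as raw:
            attrs = {name: dict(var.attrs) for name, var in raw.variables.items()}
        results[engine] = coords_with_fill_value(attrs, ds.coords)
    return results
```

Calling `print(check_backends(ds))` on a small test dataset should show an empty list for every engine if all three backends behave consistently; any coordinate name listed is one where `_FillValue` was written anyway.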
First condition: unset _FillValue attribute for all independent variables (coordinates and their bounds) as per CF convention but contrary to xarray default; see pydata/xarray#1598. Second condition: 'NaN' not a valid _FillValue in NCL for any variable; see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html
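The two conditions in the commit message above can be sketched as follows, assuming each coordinate links to its bounds variable through the CF `bounds` attribute. `independent_variables` and `check_ncl_fill_value` are hypothetical helpers; they only decide which variables should carry no `_FillValue` and validate the fill value used for the rest.

```python
import math
from typing import Any, Iterable, Mapping, Set


def independent_variables(
    attrs_by_var: Mapping[str, Mapping[str, Any]], coord_names: Iterable[str]
) -> Set[str]:
    """Coordinates plus the bounds variables they name via the CF "bounds" attribute."""
    names = set(coord_names)
    for name in list(names):  # copy so we can add while iterating
        bounds = attrs_by_var.get(name, {}).get("bounds")
        if bounds:
            names.add(bounds)
    return names


def check_ncl_fill_value(value: float) -> float:
    """Reject NaN, which NCL does not accept as a _FillValue for any variable."""
    if math.isnan(value):
        raise ValueError("NaN is not a valid _FillValue for NCL")
    return value
```

With a Dataset `ds`, `attrs_by_var` could be built as `{name: var.attrs for name, var in ds.variables.items()}`; every name in the returned set would then get `{"_FillValue": None}` in the encoding, and every other floating-point variable a value that passes `check_ncl_fill_value`.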
This issue is under discussion here: #1165
It is not desirable for us to have _FillValue = NaN for dimensions and coordinate variables.
While using xarray, we carefully kept _FillValue off these variables and dimensions when creating the un-resampled file, but the attribute then appeared during the to_netcdf operation. This happens even though mask_and_scale=False is passed to xr.open_dataset.
I would expect downstream code to have trouble with coordinates that don't make logical sense (a time or position of NaN, for instance). We would prefer NOT to instantiate coordinate variable data with any fill value. Keeping NaNs out of coordinate variables, dimensions, and minima and maxima is part of our QA/QC process to avoid downstream issues.
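A small sketch of the QA/QC check described above, assuming numeric coordinate values supplied as plain sequences (for example `{name: ds[name].values.tolist() for name in ds.coords}` for numeric coordinates); decoded time coordinates would need a NaT check instead. `coords_containing_nan` is a hypothetical helper.

```python
import math
from typing import Dict, Iterable, List, Mapping


def coords_containing_nan(coords: Mapping[str, Iterable[float]]) -> Dict[str, List[int]]:
    """Map each coordinate name to the indices of its NaN values, if it has any."""
    report: Dict[str, List[int]] = {}
    for name, values in coords.items():
        # math.isnan accepts ints and floats, so integer coordinates pass cleanly
        bad = [i for i, v in enumerate(values) if math.isnan(v)]
        if bad:
            report[name] = bad
    return report
```

An empty result means no coordinate contains NaN, which is the condition this workflow wants to hold both before and after writing with to_netcdf.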