add get_fill_value Variable method and fill_value='default' option #1375

Merged: 20 commits, Oct 22, 2024
42 changes: 38 additions & 4 deletions src/netCDF4/_netCDF4.pyx
@@ -4035,11 +4035,15 @@ behavior is similar to Fortran or Matlab, but different than numpy.
Ignored if `significant_digts` not specified. If 'BitRound' is used, then
`significant_digits` is interpreted as binary (not decimal) digits.

-**`fill_value`**: If specified, the default netCDF `_FillValue` (the
+**`fill_value`**: If specified, the default netCDF fill value (the
 value that the variable gets filled with before any data is written to it)
-is replaced with this value. If fill_value is set to `False`, then
-the variable is not pre-filled. The default netCDF fill values can be found
-in the dictionary `netCDF4.default_fillvals`.
+is replaced with this value, and the `_FillValue` attribute is set.
+If fill_value is set to `False`, then the variable is not pre-filled.
+The default netCDF fill values can be found in the dictionary `netCDF4.default_fillvals`.
+If not set, the default fill value will be used but no `_FillValue` attribute will be created
+(this is the default behavior of the netcdf-c library). `Variable.get_fill_value`
+can be used to retrieve the fill value, even if the `_FillValue` attribute is
+not set.

**`chunk_cache`**: If specified, sets the chunk cache size for this variable.
Persists as long as Dataset is open. Use `set_var_chunk_cache` to
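
The three `fill_value` cases described in the revised docstring (not set, `False`, or an explicit value) can be summarized with a small pure-Python model. This is only a sketch of the documented behavior, not the library's code: `plan_fill` and `DEFAULT_FILLVALS` are hypothetical names, and the table stands in for two entries of `netCDF4.default_fillvals`.

```python
from typing import Any, Dict

# Hypothetical stand-in for two entries of netCDF4.default_fillvals.
DEFAULT_FILLVALS: Dict[str, Any] = {
    "i4": -2147483647,
    "f8": 9.969209968386869e36,
}


def plan_fill(dtype_code: str, fill_value: Any = None) -> Dict[str, Any]:
    """Model what the docstring says happens when a variable is created.

    Returns a dict with:
      prefill           -- whether the variable is pre-filled
      value             -- the value used for pre-filling (None if no pre-fill)
      attribute_written -- whether a _FillValue attribute is created
    """
    if fill_value is None:
        # Not set: the default fill value is used, but no attribute is written.
        return {
            "prefill": True,
            "value": DEFAULT_FILLVALS[dtype_code],
            "attribute_written": False,
        }
    if fill_value is False:
        # Explicit False: pre-filling is turned off.
        return {"prefill": False, "value": None, "attribute_written": False}
    # Explicit value: it replaces the default and _FillValue is written.
    return {"prefill": True, "value": fill_value, "attribute_written": True}


if __name__ == "__main__":
    for arg in (None, False, -999.0):
        print(repr(arg), plan_fill("f8", arg))
```

The identity check `fill_value is False` is deliberate in this sketch, so that a numeric fill value of `0` is treated as a real value rather than as "no pre-fill".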
@@ -4638,6 +4642,36 @@ behavior is similar to Fortran or Matlab, but different than numpy.
        return the group that this `Variable` is a member of."""
        return self._grp

    def get_fill_value(self):
        """
        **`get_fill_value(self)`**

        return the fill value associated with this `Variable` (None if data is not
        pre-filled). Works even if default fill value was used, and `_FillValue` attribute
        does not exist."""
        cdef int ierr, no_fill
        with nogil:
            ierr = nc_inq_var_fill(self._grpid,self._varid,&no_fill,NULL)
        _ensure_nc_success(ierr)
        if no_fill == 1: # no filling for this variable
            return None
        else:
            try:
                fillval = self._FillValue
                return fillval
Comment on lines +4668 to +4673
Contributor


I'm not sure when no_fill would be one -- but if there is a _FillValue attribute, maybe it should be returned anyway? e.g. look for that first?

The other question is what to do if the _FillValue attribute doesn't match what nc_inq_var_fill returns?

That would be a malformed file, but maybe helpful to warn the user somehow?

Collaborator Author

@jswhit jswhit Oct 22, 2024


If no_fill=1, there is no pre-filling of data in the variable (so _FillValue is not used). I'm not sure what happens if pre-filling is turned off and _FillValue is set, but in that case I think the user would expect to get information on what actually happens when you create a variable and don't write data to it.

Contributor

@ChrisBarker-NOAA ChrisBarker-NOAA Oct 22, 2024


Yeah -- that's the challenge -- the fill value has a well-defined meaning and purpose, but it's commonly used (abused?) to mean a missing or invalid value. So someone could, in theory, write a file without the fill value set, and then use the attribute to mean missing data. So ????

But I suppose the pathological cases are not our problem :-) -- the point of this new method is to get the actual, under-the-hood fill value.

In which case, looking for the _FillValue attribute is unnecessary -- unless we want to check that it matches, which might be a good idea!

            except AttributeError:
                # _FillValue attribute not set, see if we can retrieve _FillValue.
                # for primitive data types.
                if self._isprimitive:
                    #return numpy.array(default_fillvals[self.dtype.str[1:]],self.dtype)
                    fillval = numpy.empty((),self.dtype)
                    ierr=nc_inq_var_fill(self._grpid,self._varid,&no_fill,PyArray_DATA(fillval))
                    _ensure_nc_success(ierr)
                    return fillval
                else:
                    # no default filling for non-primitive data types.
Contributor


Maybe this is where no_fill would be 1 -- so I'd think we would want to return the _FillValue attribute if it exists.

                    return None

    def ncattrs(self):
        """
        **`ncattrs(self)`**
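
The lookup order in `get_fill_value`, together with the consistency check raised in the review thread, can be modeled in plain Python. This is a sketch under stated assumptions rather than the merged implementation: `resolve_fill_value` and all of its parameters are hypothetical, `library_fill` stands for the value that `nc_inq_var_fill` would report, and the optional `check` flag illustrates the reviewer's suggestion, which the diff above does not implement.

```python
import warnings
from typing import Any, Mapping, Optional


def resolve_fill_value(
    no_fill: bool,
    attrs: Mapping[str, Any],
    library_fill: Any,
    is_primitive: bool = True,
    check: bool = False,
) -> Optional[Any]:
    """Model of the lookup order used by get_fill_value in the diff.

    no_fill      -- True if pre-filling is turned off for the variable
    attrs        -- the variable's attributes (may contain "_FillValue")
    library_fill -- assumed value the C library reports as the fill value
    is_primitive -- False for compound/VLEN-like types (no default filling)
    check        -- hypothetical extra: warn if attribute and library disagree
    """
    if no_fill:
        # No pre-filling at all, so there is no fill value to report.
        return None
    if "_FillValue" in attrs:
        value = attrs["_FillValue"]
        # Note: NaN fill values would need special handling in this comparison.
        if check and is_primitive and value != library_fill:
            warnings.warn(
                "_FillValue attribute does not match the library fill value"
            )
        return value
    # Attribute missing: fall back to the library value for primitive types.
    return library_fill if is_primitive else None


if __name__ == "__main__":
    print(resolve_fill_value(False, {}, 9.969209968386869e36))
    print(resolve_fill_value(True, {"_FillValue": -1.0}, -1.0))
```

In this model, as in the diff, a `_FillValue` attribute is ignored when pre-filling is off, which matches the author's reply that the method should report what actually happens to unwritten data.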
38 changes: 38 additions & 0 deletions test/test_get_fill_value.py
@@ -0,0 +1,38 @@
import unittest, os, tempfile
import netCDF4
from numpy.testing import assert_array_equal
import numpy as np

fill_val = np.array(9.9e31)

# test Variable.get_fill_value

class TestGetFillValue(unittest.TestCase):
    def setUp(self):
        self.testfile = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name
        f = netCDF4.Dataset(self.testfile, 'w')
        dim = f.createDimension('x',10)
        for dt in netCDF4.default_fillvals.keys():
            if not dt.startswith('c'):
                v = f.createVariable(dt+'_var',dt,dim)
        v = f.createVariable('float_var',np.float64,dim,fill_value=fill_val)
        f.close()

    def tearDown(self):
        os.remove(self.testfile)

    def runTest(self):
        f = netCDF4.Dataset(self.testfile, "r")
        # no _FillValue set, test that default fill value returned
        for dt in netCDF4.default_fillvals.keys():
            if not dt.startswith('c'):
                fillval = np.array(netCDF4.default_fillvals[dt])
                if dt == 'S1': fillval = fillval.astype(dt)
                v = f[dt+'_var']
                assert_array_equal(fillval, v.get_fill_value())
        # _FillValue attribute is set.
        v = f['float_var']
        assert_array_equal(fill_val, v.get_fill_value())

if __name__ == '__main__':
    unittest.main()
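
Outside the test suite, the same behavior can be explored with a short script. The following is a hedged usage sketch, assuming a netCDF4-python build that already includes `Variable.get_fill_value` from this PR; the helper name `fill_value_report`, the file name, and the variable names are made up for illustration. The import is guarded so the script does nothing harmful when the library is missing or too old.

```python
import os
import tempfile

try:
    import netCDF4
except ImportError:  # the library is optional for this sketch
    netCDF4 = None


def fill_value_report():
    """Create three variables and report, per variable,
    (has a _FillValue attribute, result of get_fill_value()).

    Returns None if netCDF4 is unavailable or lacks get_fill_value.
    """
    if netCDF4 is None or not hasattr(netCDF4.Variable, "get_fill_value"):
        return None
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fill_demo.nc")  # hypothetical file name
        with netCDF4.Dataset(path, "w") as ds:
            ds.createDimension("x", 10)
            # fill_value not given: default fill, no _FillValue attribute.
            ds.createVariable("default_fill", "f8", ("x",))
            # fill_value=False: pre-filling is turned off.
            ds.createVariable("no_fill", "f8", ("x",), fill_value=False)
            # explicit fill_value: _FillValue attribute is written.
            ds.createVariable("custom_fill", "f8", ("x",), fill_value=-999.0)
            return {
                name: ("_FillValue" in var.ncattrs(), var.get_fill_value())
                for name, var in ds.variables.items()
            }


if __name__ == "__main__":
    print(fill_value_report())
```

The report is built while the dataset is still open for writing, so it reflects the settings made at variable creation time.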