Add units and description to output netcdf files #232
Conversation
Thanks for starting on this @micahkim23. I have a few initial inline style comments as well as one general question to discuss with @spencerahill.

I think it might make things easier to test if you made these methods independent of the `Calc` class (i.e. methods that don't take `self` as the first argument), passing `units`, `description`, and `dtype_out_vert` as additional arguments:

```python
def add_units_and_description_attrs(data, units, description, dtype_out_vert):
    ...
```

@spencerahill perhaps we could consider moving these methods to the `utils/io.py` module? Or maybe create a new `utils/metadata.py` module? They seem general and simple enough that they don't need to be part of `Calc`.
aospy/calc.py (Outdated)

```python
@@ -767,3 +767,19 @@ def load(self, dtype_out_time, dtype_out_vert=False, region=False,
        if plot_units:
            data = self.var.to_plot_units(data, dtype_vert=dtype_out_vert)
        return data

    def addAttrs(self, data):
        if (isinstance(data, xr.DataArray)):
```
Remove the outer set of parentheses (e.g. only `isinstance(data, xr.DataArray)` is needed).
aospy/calc.py (Outdated)

```python
        for name, da in data.data_vars.items():
            self.addAttrsDataArray(da)

    def addAttrsDataArray(self, data):
```
Similar to the above comment regarding function naming, I recommend calling this one `add_units_and_description_attrs_da`.
aospy/calc.py (Outdated)

```python
        else:
            data.attrs['units'] = self.var.units
        if self.var.description != '':
            data.attrs['description'] = self.var.description
```
nit: make sure to add a newline at the end of the file (that's what the red "no" symbol at the end of this line on GitHub signifies).
I'm actually inclined to go ahead and write the description to the file even if it's blank. That makes the logic simpler and also makes the netCDF files outputted by aospy more homogeneous. I.e. once this is merged, users can always rely on there being a 'description' attr.
aospy/calc.py (Outdated)

```python
    def addAttrsDataArray(self, data):
        if self.var.units != '':
            if self.var.dtype_out_vert == 'vert_int':
                data.attrs['units'] = '(vertical integral of {0}): {0} kg m^-2)'.format(self.var.units)
```
I think this line is longer than 79 characters; one way to break it up would be to define the string first in a separate variable, then format and assign to `data.attrs['units']` on a separate line:

```python
units = '(Vertical integral of {0}): {0} kg m^-2'
data.attrs['units'] = units.format(self.var.units)
```
Similar to my comment below re: always adding a description even if empty, I think we should do the same thing for units.
One question, though, is what units to use for vertical integrals when the original units are empty. As written, we don't include them. My initial thought is, in that case, to use e.g. `(Vertical integral of quantity with unspecified units)`.
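One way that fallback could be implemented, sketched as a standalone helper (the function name is hypothetical and the exact wording of the strings is an assumption based on the suggestions in this thread):

```python
def format_units_attr(units, dtype_out_vert):
    """Return a units string to write to the output file; never empty."""
    if dtype_out_vert == 'vert_int':
        if units:
            return '(vertical integral of {0}): {0} kg m^-2'.format(units)
        # Original units empty: still record that this is a vertical
        # integral, per the suggestion above.
        return '(vertical integral of quantity with unspecified units)'
    return units
```

With this shape, every output file gets a 'units' attr regardless of whether the `Var` defined one.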
aospy/calc.py (Outdated)

```python
@@ -767,3 +767,19 @@ def load(self, dtype_out_time, dtype_out_vert=False, region=False,
        if plot_units:
            data = self.var.to_plot_units(data, dtype_vert=dtype_out_vert)
        return data

    def addAttrs(self, data):
```
Note that PEP8 suggests using function names that are all lowercase, with words separated by underscores. For this function I would recommend using something more descriptive like `add_units_and_description_attrs`.
In terms of testing, should I test the individual functions themselves and also test that the output netcdf files are getting the attributes?

@micahkim23 exactly, I would do both. If you go the route of writing the functions to be independent of `Calc`, they can be tested directly. For the output files, see the existing tests in aospy/test/test_calc_basic.py (lines 32 to 39 in 1a39157): after the `assert isfile` statements I would open up the `calc.path_out['av']` file and check that the attributes were added to it.

It might also be good to do the same thing for a regional calculation like the following one (since the logic for saving those is slightly different): aospy/test/test_calc_basic.py (lines 86 to 94 in 1a39157).
I've added a few more things in addition to @spencerkclark's, all of which I agree with.
Also, I'd like to highlight this functionality in our docs beyond a What's New entry. Specifically, let's modify our Examples page to demonstrate it, e.g. in this section.
aospy/calc.py (Outdated)

```diff
@@ -601,8 +602,7 @@ def _save_files(self, data, dtype_out_time):
             reg_data = xr.open_dataset(path)
         except (EOFError, RuntimeError, IOError):
             reg_data = xr.Dataset()
-        # Add the new data to the dictionary or Dataset.
-        # Same method works for both.
+        # Add the new data to the Dataset.
```
Actually, this logic is now self-explanatory, so I'd just omit the comment altogether.
I agree that they shouldn't be part of `Calc`.

I still have to finish testing the functions and editing the Examples page.

My thinking was just that

@micahkim23 this still means that these methods should be defined outside of the `Calc` class.
aospy/test/test_calc_basic.py (Outdated)

```python
@@ -37,6 +39,10 @@ def test_annual_mean(self):
        calc.compute()
        assert isfile(calc.path_out['av'])
        assert isfile(calc.path_tar_out)
        data = xr.open_mfdataset(calc.path_out['av'], decode_times=False)
        for name, da in data.data_vars.items():
            assert 'units' in da.attrs
```
Instead of testing just that the attribute was added, I recommend testing that the value of the attribute is what you expect. This simultaneously tests both that the attribute is in the dictionary and that the proper value was encoded.

I think you can pick off the expected values of the units and description attributes from the `Var` object associated with the test:

```python
expected_units = calc.var.units
expected_description = calc.var.description
```
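The resulting checks might then read along these lines (a sketch only; a plain dict stands in for a DataArray's `.attrs` mapping so the example runs standalone):

```python
def check_attr_values(attrs, expected_units, expected_description):
    # Compare values, not just key presence: a missing attribute raises
    # KeyError and a wrongly encoded one fails the equality assertion,
    # so both problems are caught in a single pass.
    assert attrs['units'] == expected_units
    assert attrs['description'] == expected_description


attrs = {'units': 'K', 'description': 'air temperature'}
check_attr_values(attrs, 'K', 'air temperature')
```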
aospy/test/test_calc_basic.py (Outdated)

```python
@@ -37,6 +39,10 @@ def test_annual_mean(self):
        calc.compute()
        assert isfile(calc.path_out['av'])
        assert isfile(calc.path_tar_out)
        data = xr.open_mfdataset(calc.path_out['av'], decode_times=False)
```
I think `xr.open_dataset` should be all that is needed here. I suspect `decode_times=False` is not needed either.
@micahkim23 we are almost there! This should be my last round of review.

A couple cleanup things; also, your new tests sparked me to come up with a way to clean up all of the tests in `TestCalcBasic`.
aospy/calc.py (Outdated)

```python
@@ -767,3 +768,23 @@ def load(self, dtype_out_time, dtype_out_vert=False, region=False,
        if plot_units:
            data = self.var.to_plot_units(data, dtype_vert=dtype_out_vert)
        return data

def _add_units_and_description_attrs(data, units, description, dtype_out_vert):
```
Now that I see this, the function name seems clunky. How about `_add_metadata_as_attrs` instead? That is less of a mouthful and also leaves the door open to us adding additional attrs within the function in the future, without renaming.
aospy/calc.py (Outdated)

```python
                                     dtype_out_vert)

def _add_units_and_description_attrs_da(data, units, description,
                                        dtype_out_vert):
```
Both of these helper functions should have a one-line docstring. (Since they're private API and pretty self explanatory, a full one listing parameters, return value, etc., isn't necessary.)
aospy/calc.py (Outdated)

```python
def _add_units_and_description_attrs_da(data, units, description,
                                        dtype_out_vert):
    units = units
```
This line doesn't do anything; it's assigning something to itself. So it can be removed.
aospy/test/test_calc_basic.py (Outdated)

```python
@@ -143,5 +146,55 @@ def setUp(self):
        'dtype_out_vert': 'vert_int'
    }

class TestCalcAttrs(unittest.TestCase):
```
Sorry if I didn't mention this before, but we are transitioning away from the class-based, `unittest.TestCase` style of tests in favor of `pytest`'s more function-based tests. The latter requires less boilerplate and provides additional functionality.

So all you have to do here is delete this `TestCalcAttrs` class and turn its `test_units_description_attrs` method into a module-level function.

When you do that, I think it will flow better if you place it after your helper functions, rather than before.
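For illustration, the conversion might look roughly like this (a sketch with stand-in logic in the body; the point is only the shape — pytest collects module-level functions whose names start with `test_`, with no class or `setUp` boilerplate):

```python
# Before (class-based):
#
#     class TestCalcAttrs(unittest.TestCase):
#         def test_units_description_attrs(self):
#             ...
#
# After: a plain module-level function that pytest discovers directly.
def test_units_description_attrs():
    attrs = {}
    # Stand-in for the real attr-setting helper under test.
    attrs['units'] = 'K'
    attrs['description'] = 'air temperature'
    assert attrs['units'] == 'K'
    assert attrs['description'] == 'air temperature'


# The function is also just a callable, so it runs without a test runner.
test_units_description_attrs()
```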
aospy/test/test_calc_basic.py (Outdated)

```python
_add_units_and_description_attrs(ds, units, description, dtype_out_vert)
assert expected_units == da.attrs['units']
assert expected_description == da.attrs['description']
for name, d_arr in ds.data_vars.items():
```
The convention is to use the name `arr` for `xr.DataArray`s.
aospy/test/test_calc_basic.py (Outdated)

```python
    assert expected_units == d_arr.attrs['units']
    assert expected_description == d_arr.attrs['description']

def _test_output_attrs(calc, file):
```
Similar to the above comment, I would move this above the `TestCalcBasic` definition, since it's used within that class.
aospy/test/test_calc_basic.py (Outdated)

```python
@@ -37,6 +39,7 @@ def test_annual_mean(self):
        calc.compute()
        assert isfile(calc.path_out['av'])
        assert isfile(calc.path_tar_out)
        _test_output_attrs(calc, 'av')
```
I would like to add this test to all of the cases. In fact, we should bundle it along with the two `assert` statements that appear in all of these. E.g. define at the module level:

```python
def _test_files_and_attrs(calc, dtype_out):
    assert isfile(calc.path_out[dtype_out])
    assert isfile(calc.path_tar_out)
    _test_output_attrs(calc, dtype_out)
```

And then replace the two assert statements (and `_test_output_attrs` where you added it) in each of these tests with this single function call.
Woops, two more things:
Thanks @micahkim23 -- just a couple more things from me as well!
aospy/test/test_calc_basic.py (Outdated)

```python
@@ -143,5 +146,55 @@ def setUp(self):
        'dtype_out_vert': 'vert_int'
    }

class TestCalcAttrs(unittest.TestCase):
    def test_units_description_attrs(self):
```
Building off of @spencerahill's comment regarding using `pytest`, this is a classic use case for `pytest.mark.parametrize`, so let's make sure to use that too. See here for an example use in our code base, aospy/test/test_automate.py (lines 144 to 150 in 2ef0b9a):

```python
@pytest.mark.parametrize(
    ('type_', 'expected'),
    [(Var, [condensation_rain, convection_rain, precip, ps, sphum]),
     (Proj, [example_proj])])
def test_get_all_objs_of_type(obj_lib, type_, expected):
    actual = _get_all_objs_of_type(type_, obj_lib)
    assert set(expected) == set(actual)
```
The benefit of using `pytest.mark.parametrize` is that it splits up this big group of tests (where, if one case fails, the entire test function fails and the remaining cases don't get run) into individual tests, each of which is run separately. This gives us a more complete picture for debugging if something goes wrong.
that's cool. I'll use that.
aospy/calc.py (Outdated)

```python
@@ -587,6 +587,9 @@ def compute(self, write_to_tar=True):
        reduced = self._apply_all_time_reductions(full, monthly, eddy)
        logging.info("Writing desired gridded outputs to disk.")
        for dtype_time, data in reduced.items():
            _add_units_and_description_attrs(data, self.var.units,
```
Maybe it's just me, but for some reason the fact that this function doesn't return anything (and just mutates the input xarray object) feels odd. It would feel more readable to me if this line were something along the lines of:

```python
data = _add_units_and_description_attrs(data, self.var.units,
                                        self.var.description,
                                        self.dtype_out_vert)
```

So I might consider re-writing the functions such that this would be possible (i.e. add `return` statements where appropriate).
Weird, for the Travis build, the error is:

It has problems with …
Indeed this is a puzzling issue. Things seem to work fine in the Python 3.4 environment, which uses an earlier version of netCDF4. @micahkim23 which version of netCDF4 are you using?
The Windows test failures for Python 3.4 on AppVeyor seem different:

To address those I might try closing the Dataset opened in the new tests, i.e. adding `data.close()`.
Good call. Even better and more idiomatic would be to use a context manager, i.e.

```python
with xr.open_dataset(calc.path_out[dtype_out]) as data:
    ...
```

The file is then automatically closed once the context manager block is finished, even in the case of crashes.
@micahkim23 FYI I swear there aren't usually this many annoying problems with the CI! Usually they just work.
Micah and I confirmed offline that it is indeed the netCDF4 version causing the failures. So a viable solution that I can live with is to pin it in all of the environments. @spencerkclark does that sound alright to you?
We also confirmed that the …
Hmm... I tried reproducing the issue locally, but I wasn't able to. What platform are you trying this on? I've tried on both my Mac and GFDL's Linux systems. Locally, is there any kind of debugging you can do to determine the root of the issue?

In looking at prior issues citing this error message (e.g. Unidata/netcdf4-python#143) it seems a common thread is how the filepath is encoded. There was a recent PR in netcdf4-python that touched this part of their code base, Unidata/netcdf4-python#693 (and as far as I can tell, version 1.3.1 is the only version on conda-forge that contains this change). Maybe what we're seeing here is related to those changes? (I guess I'm a little hesitant to pin …)
This was on my Mac. Note that the errors are occurring only on the new tests by @micahkim23. See this branch: https://github.com/spencerahill/aospy/tree/micahkim23-attrs, and in particular its Travis log. We pinned netcdf to 1.2.7 only in the 3.6 environment, and the tests pass there but not in 3.5, where it's still 1.3.1.

Good point. Also, notice the …

Is that related?

This is a totally valid point. Worth digging into a little more in order to avoid pinning if possible.
FYI re: the …

Just to confirm, so you got the tests to fail on your Mac using …?
Huh, that's odd. Yes, 1.3.1:
@spencerahill many thanks for that detail! Based on my experience, I think now it's actually libnetcdf that's the problem. I was still on …

It occurred to me that, in lieu of finding the root of the issue, we could (just for these tests) open the files using …
In doing a little more digging, I don't think it has anything to do with the filenames. I have issues opening the files the test suite creates using …

With …
Given their dependency setup, it appears that xarray's CI has not been hit by …
@spencerkclark nice work! Yes, I'd say you've pinpointed libnetcdf as the problem. I will pass word to xarray and netCDF4. In terms of the CI, do you think we should pin the libnetcdf version, or live with the test failures for now, or...?
Thanks for doing that; we'll see what they suspect the problem might be.

Hopefully the issues you've posted will spark discussion upstream, so I'd be fine pinning libnetcdf for now.
Sounds good to me. So @micahkim23, please walk back our pinning of netCDF4 and instead pin libnetcdf, in all our test environments.
@spencerahill @micahkim23 based on discussion upstream, I think a (perhaps better?) alternative fix would be to save all our files using the 'netcdf4' engine going forward (rather than the 'scipy' engine). That seems to be where the issues stem from. Due to #69 (i.e. the …)

So the suggestion is to change aospy/calc.py line 612 (in 2ef0b9a) to:

```python
data_out.to_netcdf(path, engine='netcdf4', format='NETCDF3_64BIT')
```

Maybe we should have been doing that all along...
@spencerkclark thanks for engaging with the netcdf folks on this. I agree with your suggestion. Much better than pinning the libnetcdf version for our tests. Once we do this, scipy is no longer a dependency, so when we implement this we should also remove scipy from all our environments, and I'll go ahead and submit a PR on this shortly.
@micahkim23 this means that you can now disregard my previous comment about pinning libnetcdf (or netCDF4... no pinning required).
in …

Woops! Nice catch. One final request on this: can you add a sentence immediately after that …

@spencerkclark, if you wouldn't mind doing one last review also, then I'll merge.
@micahkim23 one final minor thing and then I think we're good to merge! Thanks for your patience as we sorted out what was going on with the CI.
docs/examples.rst (Outdated)

```
Notice that the variable's name and description have been copied
to the resulting Dataset (and hence also to the netCDF file saved
to disk). This enables you to better understand what the physical
quantity is, even if you don't have the original Var definition
```
Make `Var` a literal here (i.e. surround it by double backticks).
docs/examples.rst (Outdated)

```diff
@@ -404,7 +404,7 @@ and the results of each output type
 Notice that the variable's name and description have been copied
 to the resulting Dataset (and hence also to the netCDF file saved
 to disk). This enables you to better understand what the physical
-quantity is, even if you don't have the original Var definition
+quantity is, even if you don't have the original `Var` definition
```
Double backticks :) i.e. ``Var``
@spencerahill feel free to merge when you're ready!

In it goes! Thanks much @micahkim23
Closes #201