
Add units and description to output netcdf files #232

Merged Nov 20, 2017 (10 commits)

Conversation

micahkim23
Collaborator

Closes #201

Collaborator

@spencerkclark spencerkclark left a comment


Thanks for starting on this @micahkim23. I have a few initial inline style comments as well as one general question to discuss with @spencerahill.

I think it might make things easier to test if you made these methods independent of the Calc class (i.e. methods that didn't take self as the first argument), passing units, description, and dtype_out_vert as additional arguments:

def add_units_and_description_attrs(data, units, description, dtype_out_vert):
    ...

@spencerahill perhaps we could consider moving these methods to the utils/io.py module? Or maybe create a new utils/metadata.py module? They seem general and simple enough that they don't need to be part of Calc.
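To make the suggestion concrete, here is a rough standalone sketch of the two helpers. It is one possible reading, not the merged implementation: the Dataset/DataArray dispatch is duck-typed on `.data_vars` so the snippet runs without xarray installed (the real version would use `isinstance` checks against `xr.Dataset`/`xr.DataArray`), and the always-write behavior plus the unspecified-units fallback follow suggestions made elsewhere in this thread:

```python
def add_units_and_description_attrs(data, units, description, dtype_out_vert):
    """Attach units and description attrs; recurse over Dataset variables."""
    if hasattr(data, 'data_vars'):  # Dataset-like: one pass per variable
        for arr in data.data_vars.values():
            add_units_and_description_attrs_da(arr, units, description,
                                               dtype_out_vert)
    else:  # DataArray-like
        add_units_and_description_attrs_da(data, units, description,
                                           dtype_out_vert)
    return data


def add_units_and_description_attrs_da(data, units, description,
                                       dtype_out_vert):
    """Attach units and description attrs to a single DataArray-like object."""
    if dtype_out_vert == 'vert_int':
        if units:
            units = '(vertical integral of {0}): {0} kg m^-2'.format(units)
        else:
            units = '(vertical integral of quantity with unspecified units)'
    data.attrs['units'] = units
    data.attrs['description'] = description
    return data
```

Because neither function touches `self`, they can be tested with nothing but dummy objects carrying an `attrs` dict.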

aospy/calc.py Outdated
@@ -767,3 +767,19 @@ def load(self, dtype_out_time, dtype_out_vert=False, region=False,
if plot_units:
    data = self.var.to_plot_units(data, dtype_vert=dtype_out_vert)
return data

def addAttrs(self, data):
    if (isinstance(data, xr.DataArray)):
Collaborator

Remove the outer set of parentheses (e.g. only isinstance(data, xr.DataArray) is needed).

aospy/calc.py Outdated
for name, da in data.data_vars.items():
    self.addAttrsDataArray(da)

def addAttrsDataArray(self, data):
Collaborator

Similar to the above comment regarding function naming, I recommend calling this one add_units_and_description_attrs_da.

aospy/calc.py Outdated
else:
    data.attrs['units'] = self.var.units
if self.var.description != '':
    data.attrs['description'] = self.var.description
Collaborator

nit: make sure to add a newline at the end of the file (that's what the red "no" symbol at the end of this line on GitHub signifies).

Owner

I'm actually inclined to go ahead and write the description to the file even if it's blank. That makes the logic simpler and also makes the netCDF files outputted by aospy more homogeneous. I.e. once this is merged, users can always rely on there being a 'description' attr.

aospy/calc.py Outdated
def addAttrsDataArray(self, data):
    if self.var.units != '':
        if self.var.dtype_out_vert == 'vert_int':
            data.attrs['units'] = '(vertical integral of {0}): {0} kg m^-2)'.format(self.var.units)
Collaborator

I think this line is longer than 79 characters; one way to break it up would be to define the string first in a separate variable, then format and assign to data.attrs['units'] in a separate line:

units = '(Vertical integral of {0}): {0} kg m^-2'
data.attrs['units'] = units.format(self.var.units)

Owner

Similar to my comment below re: always adding a description even if empty, I think we should do the same thing for units.

One question, though, is what units to use for vertical integrals when the original units are empty. As written, we don't include them. My initial thought is to, in that case, do e.g. (Vertical integral of quantity with unspecified units)

aospy/calc.py Outdated
@@ -767,3 +767,19 @@ def load(self, dtype_out_time, dtype_out_vert=False, region=False,
if plot_units:
    data = self.var.to_plot_units(data, dtype_vert=dtype_out_vert)
return data

def addAttrs(self, data):
Collaborator

Note that PEP8 suggests using function names that are all lowercase, with words separated by underscores. For this function I would recommend using something more descriptive like add_units_and_description_attrs.

@micahkim23
Collaborator Author

In terms of testing, should I test the individual functions themselves and also test that the output netcdf files are getting the attrs added?

@spencerkclark
Collaborator

@micahkim23 exactly, I would do both. If you go the route of writing the functions to be independent of Calc, test the functions themselves thoroughly (i.e. go through all the options of dtype_out_vert etc.), and then just add a check within one of the existing tests of Calc computations (you don't need to do it for all of them) where you open a file that is created and check to make sure the appropriate attributes are added.

By existing tests of Calc I'm referring to one like this one:

def test_annual_mean(self):
    calc_int = CalcInterface(intvl_out='ann',
                             dtype_out_time='av',
                             **self.test_params)
    calc = Calc(calc_int)
    calc.compute()
    assert isfile(calc.path_out['av'])
    assert isfile(calc.path_tar_out)

After the assert isfile statements I would open up the calc.path_out['av'] file and check the attributes were added to it.

It might also be good to do the same thing for a regional calculation like the following one (since the logic for saving those is slightly different):

def test_simple_reg_av(self):
    calc_int = CalcInterface(intvl_out='ann',
                             dtype_out_time='reg.av',
                             region=[globe],
                             **self.test_params)
    calc = Calc(calc_int)
    calc.compute()
    assert isfile(calc.path_out['reg.av'])
    assert isfile(calc.path_tar_out)

Owner

@spencerahill spencerahill left a comment


I've added a few more things in addition to @spencerkclark's, all of which I agree with.

Also, I'd like to highlight this functionality in our docs beyond a What's New entry. Specifically, let's modify our Examples page to demonstrate it, e.g. in this section.

aospy/calc.py Outdated
@@ -601,8 +602,7 @@ def _save_files(self, data, dtype_out_time):
reg_data = xr.open_dataset(path)
except (EOFError, RuntimeError, IOError):
reg_data = xr.Dataset()
# Add the new data to the dictionary or Dataset.
# Same method works for both.
# Add the new data to the Dataset.
Owner

Actually, this logic is now self-explanatory, so I'd just omit the comment altogether.


@spencerahill
Owner

@spencerahill perhaps we could consider moving these methods to the utils/io.py module? Or maybe create a new utils/metadata.py module? They seem general and simple enough that they don't need to be part of Calc.

I agree that they shouldn't be Calc methods. But I don't think they make sense to be public API. For that reason, I would just define them as module-level functions within calc.py, all named with leading underscores to indicate that they should be considered private API. But I could be convinced otherwise.

@micahkim23
Collaborator Author

I still have to finish testing the functions and editing the Examples page.

@spencerkclark
Collaborator

spencerkclark commented Nov 13, 2017

I agree that they shouldn't be Calc methods. But I don't think they make sense to be public API. For that reason, I would just define them as module-level functions within calc.py, all named with leading underscores to indicate that they should be considered private API.

My thinking was just that calc.py is pretty cluttered as it is, so it might be worth splitting these private API functions off somewhere else, but I'm totally fine going this route. We can refactor things later if we feel inclined.

For that reason, I would just define them as module-level functions within calc.py, all named with leading underscores to indicate that they should be considered private API.

@micahkim23 this still means that these methods should be defined outside of the Calc class. You'll see that this makes them easier to test. Creating test Calc objects requires a lot of boilerplate code. Here you'll just need to make some dummy DataArrays and Datasets.

@@ -37,6 +39,10 @@ def test_annual_mean(self):
    calc.compute()
    assert isfile(calc.path_out['av'])
    assert isfile(calc.path_tar_out)
    data = xr.open_mfdataset(calc.path_out['av'], decode_times=False)
    for name, da in data.data_vars.items():
        assert 'units' in da.attrs
@spencerkclark (Collaborator) Nov 13, 2017

Instead of testing just that the attribute was added, I recommend testing that the value of the attribute is what you expect. This will test both that the attribute is in the dictionary, and make sure the proper value was encoded simultaneously.

I think you can pick off the expected values of the units and description attributes from the Var object associated with the test:

expected_units = calc.var.units
expected_description = calc.var.description

@@ -37,6 +39,10 @@ def test_annual_mean(self):
    calc.compute()
    assert isfile(calc.path_out['av'])
    assert isfile(calc.path_tar_out)
    data = xr.open_mfdataset(calc.path_out['av'], decode_times=False)
Collaborator

I think xr.open_dataset should be all that is needed here. I suspect decode_times=False is not needed either.

Owner

@spencerahill spencerahill left a comment


@micahkim23 we are almost there! This should be my last round of review.

A couple of cleanup things; also, your new tests sparked me to come up with a way to clean up all of the tests in TestCalcBasic.

aospy/calc.py Outdated
@@ -767,3 +768,23 @@ def load(self, dtype_out_time, dtype_out_vert=False, region=False,
if plot_units:
    data = self.var.to_plot_units(data, dtype_vert=dtype_out_vert)
return data

def _add_units_and_description_attrs(data, units, description, dtype_out_vert):
Owner

Now that I see this, the function name seems clunky. How about `_add_metadata_as_attrs` instead? That is less of a mouthful and also leaves the door open to us adding additional attrs within the function in the future, without renaming.

aospy/calc.py Outdated
dtype_out_vert)

def _add_units_and_description_attrs_da(data, units, description,
                                        dtype_out_vert):
Owner

Both of these helper functions should have a one-line docstring. (Since they're private API and pretty self explanatory, a full one listing parameters, return value, etc., isn't necessary.)

aospy/calc.py Outdated

def _add_units_and_description_attrs_da(data, units, description,
                                        dtype_out_vert):
    units = units
Owner

This line doesn't do anything; it's assigning something to itself. So it can be removed.

@@ -143,5 +146,55 @@ def setUp(self):
'dtype_out_vert': 'vert_int'
}

class TestCalcAttrs(unittest.TestCase):
Owner

Sorry if I didn't mention this before, but we are transitioning away from the class-based, unittest.TestCase style of tests in favor of py.test's more function-based tests. The latter requires less boilerplate and provides additional functionality.

So all you have to do here is delete this TestCalcAttrs class and turn its test_units_description_attrs method into a module-level function.

When you do that, I think it will flow better if you place it after your helper functions, rather than before.
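The conversion is mechanical; a minimal before/after sketch (the assertion bodies here are placeholders, not the real test logic):

```python
import unittest


# Before: class-based unittest style
class TestCalcAttrs(unittest.TestCase):
    def test_units_description_attrs(self):
        self.assertIn('units', {'units': 'K'})  # placeholder assertion


# After: a plain module-level function, which py.test collects directly
def test_units_description_attrs():
    assert 'units' in {'units': 'K'}  # placeholder assertion
```

py.test discovers any module-level function named `test_*`, so no class, `setUp`, or `self` plumbing is required.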

_add_units_and_description_attrs(ds, units, description, dtype_out_vert)
assert expected_units == da.attrs['units']
assert expected_description == da.attrs['description']
for name, d_arr in ds.data_vars.items():
Owner

The convention is to use the name arr for xr.DataArrays.

    assert expected_units == d_arr.attrs['units']
    assert expected_description == d_arr.attrs['description']

def _test_output_attrs(calc, file):
Owner

Similar to above comment, I would move this to above the TestCalcBasic definition, since it's used within that class.

@@ -37,6 +39,7 @@ def test_annual_mean(self):
    calc.compute()
    assert isfile(calc.path_out['av'])
    assert isfile(calc.path_tar_out)
    _test_output_attrs(calc, 'av')
Owner

I would like to add this test to all of the cases. In fact, we should bundle this along with the two assert statements that are in all of these. E.g. define at the module level

def _test_files_and_attrs(calc, dtype_out):
    assert isfile(calc.path_out[dtype_out])
    assert isfile(calc.path_tar_out)
    _test_output_attrs(calc, dtype_out)

And then replace the two assert statements (and `_test_output_attrs` where you added it) in each of these tests with this single function call.

@spencerahill
Owner

Woops, two more things:

  1. If you didn't already, please merge the latest commits from develop (namely Fix AppVeyor 3.4 failing builds #233) so that we don't get the Appveyor 3.4 error.
  2. I just noticed all the other test failures (both Travis and Appveyor); they suggest there's something awry with your tests. Let us know if you get stuck on that. I suspect those will show up on your local machine too, not just in CI, so that should enable you to debug quickly.

Collaborator

@spencerkclark spencerkclark left a comment


Thanks @micahkim23 -- just a couple more things from me as well!

@@ -143,5 +146,55 @@ def setUp(self):
'dtype_out_vert': 'vert_int'
}

class TestCalcAttrs(unittest.TestCase):
    def test_units_description_attrs(self):
Collaborator

Building off of @spencerahill's comment regarding using pytest, this is a classic use case for pytest.mark.parametrize, so let's make sure to use that too. See here for an example use in our code-base:

@pytest.mark.parametrize(
    ('type_', 'expected'),
    [(Var, [condensation_rain, convection_rain, precip, ps, sphum]),
     (Proj, [example_proj])])
def test_get_all_objs_of_type(obj_lib, type_, expected):
    actual = _get_all_objs_of_type(type_, obj_lib)
    assert set(expected) == set(actual)

The benefit of using pytest.mark.parametrize is that it splits up this big group of tests (which if one fails the entire test function fails, and the remaining cases don't get run) into individual tests, each of which gets run separately. This gives us a more complete picture for debugging if something goes wrong.

Collaborator Author

That's cool. I'll use that.

aospy/calc.py Outdated
@@ -587,6 +587,9 @@ def compute(self, write_to_tar=True):
    reduced = self._apply_all_time_reductions(full, monthly, eddy)
    logging.info("Writing desired gridded outputs to disk.")
    for dtype_time, data in reduced.items():
        _add_units_and_description_attrs(data, self.var.units,
Collaborator

Maybe it's just me, but for some reason the fact that this function doesn't return anything (and just mutates the input xarray object) feels odd. It would feel more readable to me if this line were something along the lines of:

data = _add_units_and_description_attrs(data, self.var.units, self.var.description, self.dtype_out_vert)

So I might consider re-writing the functions such that this would be possible (i.e. add return statements where appropriate).

@micahkim23
Collaborator Author

Weird: for the Travis build, the error is:

IOError: [Errno -36] NetCDF: Invalid argument: '/home/travis/build/spencerahill/aospy/aospy/test/data/objects/test-files/example_proj/example_model/example_run/sphum/sphum.ann.reg.ts.vert_int.from_monthly_ts_sigma.example_model.example_run.0006.nc'

It has problems with xr.open_dataset

@spencerkclark
Collaborator

Indeed this is a puzzling issue. Things seem to work fine in the Python 3.4 environment, which uses an earlier version of netCDF4 (1.2.7, as opposed to the most current 1.3.1, which is used in the other environments).

@micahkim23 which version of netCDF4 do you have installed locally on your computer? If it is not 1.3.1, could you try upgrading to the latest version to see if you can reproduce the issues on Travis on your own machine?

$ conda upgrade -c conda-forge netcdf4

@spencerkclark
Collaborator

The Windows test failures for Python 3.4 on AppVeyor seem different:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\projects\\aospy\\aospy\\test\\data\\objects\\test-files\\example_proj\\example_model\\example_run\\condensation_rain\\condensation_rain.ann.av.from_monthly_ts.example_model.example_run.0004-0006.nc'

To address those I might try closing the Dataset opened in _test_output_attrs at the end of the method:

data.close()

@spencerahill
Owner

spencerahill commented Nov 15, 2017

To address those I might try closing the Dataset opened in _test_output_attrs at the end of the method

Good call. Even better and more idiomatic would be to use a context manager, i.e. a with statement:

with xr.open_dataset(calc.path_out[dtype_out]) as data:
    ...

The file is then automatically closed once the context manager block is finished, even in the case of crashes.

@spencerahill
Owner

@micahkim23 FYI I swear there aren't usually this many annoying problems with the CI! Usually they just work.

@spencerahill
Owner

Micah and I confirmed offline that it is indeed the netCDF 1.3.1 version that is causing the problem; pinning to 1.2.7 worked (at least in the 3.6 environment).

So a viable solution that I can live with is to pin it in all of the environments. @spencerkclark does that sound alright to you?

@spencerahill
Owner

We also confirmed that the with statement version resolved the Appveyor fix (see https://ci.appveyor.com/project/spencerahill/aospy/build/1.0.384).

@spencerkclark
Collaborator

Micah and I confirmed offline that it is indeed the netCDF 1.3.1 version that is causing the problem; pinning to 1.2.7 worked (at least in the 3.6 environment).

Hmm...I tried reproducing the issue locally, but I wasn't able to. What platform are you trying this on? I've tried on both my Mac and GFDL's linux systems.

Locally, is there any kind of debugging you can do to determine the root of the issue? In looking at prior issues citing this error message (e.g. Unidata/netcdf4-python#143) it seems a common thread is how the filepath is encoded. There was a recent PR in netcdf4-python that touched this part of their code base Unidata/netcdf4-python#693 (and as far as I can tell, version 1.3.1 is the only version on conda-forge that contains this change). Maybe what we're seeing here is related to those changes?

(I guess I'm a little hesitant to pin netCDF4 in our test environments across the board, because it is a pretty fundamental library to aospy).

@spencerahill
Owner

What platform are you trying this on?

This was on my Mac. Note that the errors are occurring only on the new tests by @micahkim23. See this branch: https://github.com/spencerahill/aospy/tree/micahkim23-attrs, and in particular its Travis log. We pinned netcdf to 1.2.7 only in the 3.6 environment, and the tests pass there but not in 3.5 where it's still 1.3.1.

a common thread is how the filepath is encoded.

Good point. Also, notice the 'b' prefix to the strings in all of these failures, which indicates a bytes literal:

b'/home/travis/build/spencerahill/aospy/aospy/test/data/objects/test-files/example_proj/example_model/example_run/condensation_rain/condensation_rain.ann.ts.from_monthly_ts.example_model.example_run.0004-0006.nc'

Is that related?
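As an aside on that b'' prefix: it marks a Python bytes object rather than a str, and the stdlib converts between the two with `os.fsencode`/`os.fsdecode` (the path below is a placeholder, not the failing test file):

```python
import os

path = '/tmp/example.nc'   # placeholder path
bpath = os.fsencode(path)  # str -> bytes via the filesystem encoding

assert isinstance(bpath, bytes)
assert repr(bpath).startswith("b'")  # the b'' prefix seen in the tracebacks
assert os.fsdecode(bpath) == path    # round-trips back to str
```

So a bytes path in the error message only indicates that some layer encoded the filename before handing it to the C library; whether that encoding step is the actual culprit is a separate question.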

I guess I'm a little hesitant to pin netCDF4 in our test environments across the board, because it is a pretty fundamental library to aospy

This is a totally valid point. Worth digging into a little more in order to avoid pinning if possible.

@spencerahill
Owner

FYI re: the with statement, I just noticed that open_dataset has an autoclose option, which is off by default. So we could probably use that instead in the future.

@spencerkclark
Collaborator

This was on my Mac.

Just to confirm, so you got the tests to fail on your Mac using netCDF4 version 1.3.1? I was not able to do so.

@spencerahill
Owner

Huh, that's odd. Yes, 1.3.1:

(py36) [develop]shill@shillbook:~/Dropbox/py/aospy/aospy/test$ conda upgrade -c conda-forge netcdf4
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /Users/shill/Dropbox/miniconda3/envs/py36:

The following packages will be UPDATED:

    h5py:      2.7.1-py36_1 conda-forge --> 2.7.1-py36_2 conda-forge
    hdf5:      1.8.18-1     conda-forge --> 1.10.1-1     conda-forge
    libnetcdf: 4.4.1.1-6    conda-forge --> 4.5.0-3      conda-forge
    libpng:    1.6.28-1     conda-forge --> 1.6.28-2     conda-forge
    libtiff:   4.0.7-0      conda-forge --> 4.0.7-1      conda-forge
    netcdf4:   1.3.0-py36_0 conda-forge --> 1.3.1-py36_2 conda-forge
    pillow:    4.3.0-py36_0 conda-forge --> 4.3.0-py36_1 conda-forge
    python:    3.6.3-0      conda-forge --> 3.6.3-1      conda-forge
    zlib:      1.2.8-3                  --> 1.2.11-0     conda-forge

Proceed ([y]/n)? y

zlib-1.2.11-0. 100% |##############################################################################################################################| Time: 0:00:01  96.84 kB/s
hdf5-1.10.1-1. 100% |##############################################################################################################################| Time: 0:00:20 262.05 kB/s
libpng-1.6.28- 100% |##############################################################################################################################| Time: 0:00:02 138.73 kB/s
libtiff-4.0.7- 100% |##############################################################################################################################| Time: 0:00:02 224.10 kB/s
python-3.6.3-1 100% |##############################################################################################################################| Time: 0:00:29 425.46 kB/s
libnetcdf-4.5. 100% |##############################################################################################################################| Time: 0:00:06 295.91 kB/s
pillow-4.3.0-p 100% |##############################################################################################################################| Time: 0:00:00 647.48 kB/s
h5py-2.7.1-py3 100% |##############################################################################################################################| Time: 0:00:02 359.99 kB/s
netcdf4-1.3.1- 100% |##############################################################################################################################| Time: 0:00:02 374.42 kB/s

(py36) [develop]shill@shillbook:~/Dropbox/py/aospy/aospy/test$ which pytest
/Users/shill/Dropbox/miniconda3/envs/py36/bin/pytest

(py36) [develop]shill@shillbook:~/Dropbox/py/aospy$ git checkout -b micahkim23-attrs develop
Switched to a new branch 'micahkim23-attrs'
(py36) [micahkim23-attrs]shill@shillbook:~/Dropbox/py/aospy$ git pull https://github.com/micahkim23/aospy.git attrs
remote: Counting objects: 30, done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 30 (delta 19), reused 25 (delta 14), pack-reused 0
Unpacking objects: 100% (30/30), done.
From https://github.com/micahkim23/aospy
 * branch            attrs      -> FETCH_HEAD
Updating 2ef0b9a..4fabe9f
Fast-forward
 aospy/calc.py                 | 29 ++++++++++++++++++++++++++---
 aospy/test/test_calc_basic.py | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------
 docs/whats-new.rst            |  3 +++
 3 files changed, 93 insertions(+), 23 deletions(-)

(py36) [micahkim23-attrs]shill@shillbook:~/Dropbox/py/aospy/ci$ cd aospy/test/
(py36) [micahkim23-attrs]shill@shillbook:~/Dropbox/py/aospy/aospy/test$ which pytest
/Users/shill/Dropbox/miniconda3/envs/py36/bin/pytest
(py36) [micahkim23-attrs]shill@shillbook:~/Dropbox/py/aospy/aospy/test$ pytest test_calc_basic.py
============================================================================ test session starts =============================================================================
platform darwin -- Python 3.6.3, pytest-3.2.3, py-1.5.1, pluggy-0.4.0
rootdir: /Users/shill/Dropbox/py/aospy, inifile: setup.cfg
collected 39 items

test_calc_basic.py FFFFFFFFFFFFFFFFFFFFFFFFFFF............

================================================================================== FAILURES ==================================================================================
_______________________________________________________________________ TestCalcBasic.test_annual_mean _______________________________________________________________________

self = <aospy.test.test_calc_basic.TestCalcBasic testMethod=test_annual_mean>

    def test_annual_mean(self):
        calc_int = CalcInterface(intvl_out='ann',
                                 dtype_out_time='av',
                                 **self.test_params)
        calc = Calc(calc_int)
        calc.compute()
>       _test_files_and_attrs(calc, 'av')

test_calc_basic.py:63:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_calc_basic.py:37: in _test_files_and_attrs
    _test_output_attrs(calc, dtype_out)
test_calc_basic.py:19: in _test_output_attrs
    with xr.open_dataset(calc.path_out[dtype_out]) as data:
../../../../miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py:282: in open_dataset
    autoclose=autoclose)
../../../../miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py:210: in __init__
    self.ds = opener()
../../../../miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py:185: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
netCDF4/_netCDF4.pyx:2015: in netCDF4._netCDF4.Dataset.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: [Errno -36] NetCDF: Invalid argument: b'/Users/shill/Dropbox/py/aospy/aospy/test/data/objects/test-files/example_proj/example_model/example_run/condensation_rain/condensation_rain.ann.av.from_monthly_ts.example_model.example_run.0004-0006.nc'

netCDF4/_netCDF4.pyx:1636: OSError

...

@spencerkclark
Collaborator

@spencerahill many thanks for that detail! Based on my experience, I now think it's actually libnetcdf that's the problem. I was still on libnetcdf version 4.4.1.1 (because of some other package requirements in my environment). When I removed those requirements (and subsequently was able to upgrade libnetcdf to 4.5.0) I was able to reproduce the errors.

It occurred to me that in lieu of finding the root of the issue, we could (just for these tests) open the files using engine='scipy' in our call to xr.open_dataset; this seems to fix the tests in my environment.

@spencerkclark
Collaborator

In doing a little more digging, I don't think it has anything to do with the filenames. I have issues opening the files the test suite creates with ncdump on the command line (I get the same error):

$ ncdump -h condensation_rain/condensation_rain.ann.reg.av.from_monthly_ts.example_model.example_run.0004-0006.nc
ncdump: condensation_rain.ann.reg.av.from_monthly_ts.example_model.example_run.0004-0006.nc: NetCDF: Invalid argument

With libnetcdf version 4.4.1.1 things work fine:

$ ncdump -h condensation_rain.ann.reg.av.from_monthly_ts.example_model.example_run.0004-0006.nc
netcdf condensation_rain.ann.reg.av.from_monthly_ts.example_model.example_run.0004-0006 {
variables:
	double raw_data_start_date ;
		raw_data_start_date:units = "days since 1675-01-01" ;
		raw_data_start_date:long_name = "time axis boundaries" ;
		raw_data_start_date:calendar = "NOLEAP" ;
		raw_data_start_date:_FillValue = NaN ;
	int subset_end_date ;
		subset_end_date:units = "days since 1680-12-31 00:00:00" ;
		subset_end_date:calendar = "proleptic_gregorian" ;
	double raw_data_end_date ;
		raw_data_end_date:units = "days since 1675-01-01" ;
		raw_data_end_date:long_name = "time axis boundaries" ;
		raw_data_end_date:calendar = "NOLEAP" ;
		raw_data_end_date:_FillValue = NaN ;
	double sahel ;
		sahel:units = "" ;
		sahel:_FillValue = NaN ;
		sahel:description = "condensation rain" ;
	int subset_start_date ;
		subset_start_date:units = "days since 1678-01-01 00:00:00" ;
		subset_start_date:calendar = "proleptic_gregorian" ;

// global attributes:
		:coordinates = "raw_data_start_date subset_end_date raw_data_end_date subset_start_date" ;
}

@spencerkclark
Collaborator

Judging by their dependency setup and a glance through their build logs, it appears that xarray's CI has not been hit by libnetcdf version 4.5.0 yet. Sure enough, when I run their test suite under the latest libnetcdf, I get some failures (so I don't think we are the only ones):

__________ GenericNetCDFDataTest.test_cross_engine_read_write_netcdf3 __________

self = <xarray.tests.test_backends.GenericNetCDFDataTest testMethod=test_cross_engine_read_write_netcdf3>

    def test_cross_engine_read_write_netcdf3(self):
        data = create_test_data()
        valid_engines = set()
        if has_netCDF4:
            valid_engines.add('netcdf4')
        if has_scipy:
            valid_engines.add('scipy')

        for write_engine in valid_engines:
            for format in ['NETCDF3_CLASSIC', 'NETCDF3_64BIT']:
                with create_tmp_file() as tmp_file:
                    data.to_netcdf(tmp_file, format=format,
                                   engine=write_engine)
                    for read_engine in valid_engines:
                        with open_dataset(tmp_file,
>                                         engine=read_engine) as actual:

tests/test_backends.py:1016:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
backends/api.py:282: in open_dataset
    autoclose=autoclose)
backends/netCDF4_.py:210: in __init__
    self.ds = opener()
backends/netCDF4_.py:185: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
netCDF4/_netCDF4.pyx:2015: in netCDF4._netCDF4.Dataset.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   IOError: [Errno -36] NetCDF: Invalid argument: '/var/folders/qn/0m4n89p48xl7t6006s6fdp6h0000gn/T/tmpPjiV4A/temp-91.nc'

netCDF4/_netCDF4.pyx:1636: IOError
___ GenericNetCDFDataTestAutocloseTrue.test_cross_engine_read_write_netcdf3 ____

self = <xarray.tests.test_backends.GenericNetCDFDataTestAutocloseTrue testMethod=test_cross_engine_read_write_netcdf3>

    def test_cross_engine_read_write_netcdf3(self):
        data = create_test_data()
        valid_engines = set()
        if has_netCDF4:
            valid_engines.add('netcdf4')
        if has_scipy:
            valid_engines.add('scipy')

        for write_engine in valid_engines:
            for format in ['NETCDF3_CLASSIC', 'NETCDF3_64BIT']:
                with create_tmp_file() as tmp_file:
                    data.to_netcdf(tmp_file, format=format,
                                   engine=write_engine)
                    for read_engine in valid_engines:
                        with open_dataset(tmp_file,
>                                         engine=read_engine) as actual:

tests/test_backends.py:1016:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
backends/api.py:282: in open_dataset
    autoclose=autoclose)
backends/netCDF4_.py:210: in __init__
    self.ds = opener()
backends/netCDF4_.py:185: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
netCDF4/_netCDF4.pyx:2015: in netCDF4._netCDF4.Dataset.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   IOError: [Errno -36] NetCDF: Invalid argument: '/var/folders/qn/0m4n89p48xl7t6006s6fdp6h0000gn/T/tmpCAfrfg/temp-96.nc'

netCDF4/_netCDF4.pyx:1636: IOError
== 2 failed, 1904 passed, 536 skipped, 6 xfailed, 15 xpassed in 94.18 seconds ==

@spencerahill
Owner

@spencerkclark nice work! Yes, I'd say you've pinpointed libnetcdf as the problem. I will pass word to xarray and netCDF4.

In terms of the CI, do you think we should pin the libnetcdf version, or live with the test failures for now, or ...?

@spencerkclark
Collaborator

I will pass word to xarray and netCDF4.

Thanks for doing that; we'll see what they suspect the problem might be.

In terms of the CI, do you think we should pin the libnetcdf version, or live with the test failures for now, or ...?

Hopefully the issues you've posted will spark discussion upstream, so I'd be fine pinning libnetcdf to version 4.4 so our test suite passes. We can unpin it once those issues are diagnosed/resolved.

@spencerahill
Owner

I'd be fine pinning libnetcdf to version 4.4 so our test suite passes. We can unpin it once those issues are diagnosed/resolved.

Sounds good to me. So @micahkim23, please walk back our pinning of netCDF4 and instead pin libnetcdf, in all our test environments.

@spencerkclark
Collaborator

@spencerahill @micahkim23 based on discussion upstream, I think a (perhaps better?) alternative fix would be to save all our files using the 'netcdf4' engine going forward (rather than the 'scipy' engine). That seems to be where the issues stem from.

Due to #69 (i.e. the ncview tool installed on many clusters is old and doesn't support newer netCDF formats), we would probably want to specify one of the older formats (e.g. format='NETCDF3_64BIT' when saving as well), but that's not a big deal (because that's what the 'scipy' engine was doing anyway).

So the suggestion is to change:

data_out.to_netcdf(path, engine='scipy')

To:

data_out.to_netcdf(path, engine='netcdf4', format='NETCDF3_64BIT') 

Maybe we should have been doing that all along...

@spencerahill
Owner

@spencerkclark thanks for engaging with the netcdf folks on this. I agree with your suggestion. Much better than pinning the libnetcdf version for our tests.

Once we do this, that means scipy is no longer a dependency. So when we implement this, we should also remove scipy from all our environments and setup.py.

I'll go ahead and submit a PR on this shortly

@spencerahill
Owner

spencerahill commented Nov 17, 2017

@micahkim23 this means that you can now disregard my previous comment about pinning libnetcdf (or netCDF4; no pinning is required).

@micahkim23
Collaborator Author

micahkim23 commented Nov 20, 2017

In examples.rst there was a typo, so the example wasn't rendering correctly in the docs. calcs[0].data_out should display an example that includes units and description in the DataArray object.

@spencerahill
Owner

in examples.rst there was a typo, so the example wasn't rendering correctly in the docs.

Woops! Nice catch.

One final request on this: can you add a sentence immediately after that ipython block in examples.rst making note of this metadata? I.e. something like "Notice that the variable's name and description have been copied to the resulting Dataset (and hence also to the netCDF file saved to disk). This enables you to better understand what the physical quantity is, even if you don't have the original Var definition on hand."

@spencerkclark, if you wouldn't mind doing one last review also, then I'll merge.

Collaborator

@spencerkclark spencerkclark left a comment

@micahkim23 one final minor thing and then I think we're good to merge! Thanks for your patience as we sorted out what was going on with the CI.

Notice that the variable's name and description have been copied
to the resulting Dataset (and hence also to the netCDF file saved
to disk). This enables you to better understand what the physical
quantity is, even if you don't have the original Var definition
Collaborator

Make Var a literal here (i.e. surround it with double backticks).

@spencerkclark spencerkclark changed the title WIP add units and description to output netcdf files Add units and description to output netcdf files Nov 20, 2017
@@ -404,7 +404,7 @@ and the results of each output type
Notice that the variable's name and description have been copied
to the resulting Dataset (and hence also to the netCDF file saved
to disk). This enables you to better understand what the physical
quantity is, even if you don't have the original Var definition
quantity is, even if you don't have the original `Var` definition
Collaborator

Double backticks :)

 ``Var``

@spencerkclark
Collaborator

@spencerahill feel free to merge when you're ready!

@spencerahill
Owner

In it goes! Thanks much @micahkim23

@spencerahill spencerahill merged commit 82af0ba into spencerahill:develop Nov 20, 2017
3 participants