-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax logic for comparing grid attributes between Run and Model objs #199
Relax logic for comparing grid attributes between Run and Model objs #199
Conversation
Fixes spencerahill#187 Previously an error would be raised if a grid attribute found in a Run object did not exist in the Run's corresponding Model. This allows aospy to use grid attributes found only in Run objects.
Gotta love a one-line fix! Any idea what's going on in the tests with the Notebook? Otherwise this looks good to me. |
@spencerahill I need to run now, but it seems that the notebook exposed a valid use-case that was problematic. My commit fixes things, but I don't think it is the ideal solution. I'll try and lay out the problem in more detail tomorrow so we can discuss the options. |
Sounds good. No rush. |
The notebook sets up some ...
from aospy.data_loader import DictDataLoader
file_map = {'monthly': os.path.join(rootdir, '000[4-6]0101.precip_monthly.nc')}
data_loader = DictDataLoader(file_map)
from aospy import Run
example_run = Run(
name='example_run',
description='Control simulation of the idealized moist model',
data_loader=data_loader
)
from aospy import Model
example_model = Model(
name='example_model',
grid_file_paths=(
os.path.join(rootdir, '00040101.precip_monthly.nc'),
os.path.join(rootdir, 'im.landmask.nc')
),
runs=[example_run] # only one Run in our case, but could be more
)
... As of this PR each
The issue stems from how concatenation proceeds using Lines 185 to 190 in ce7c784
When the three files are concatenated together using
Note that three files are specified in the Lines 282 to 295 in ce7c784
For some reason,
So when assignment is attempted in line 290, it fails, because the arrays have different shapes. The workaround I've used now is to replace This workaround is OK, but it does not address the underlying issue, which is that
My impression is that the ideal solution would be if we could provide an argument like |
Thanks for digging into this.
I stumbled upon this myself in #181 (although I'm not sure I made a written note of it). This sure seems like a bug to me: the allclose documentation only gives examples of same-shaped arrays, and there's an open issue at Numpy about basically the same thing. But it's >2 years old with no movement, so we might consider giving it a ping. Here's the MWE I came up with; including version to show it still exists on latest numpy release: In [2]: np.version.version
Out[2]: '1.13.1'
In [5]: np.allclose(np.ones([5, 1]), np.ones([1]))
Out[5]: True And there's some further odd behavior: In [7]: np.allclose(np.ones([5, 1]), np.ones([5, 2]))
Out[7]: True
In [8]: np.allclose(np.ones([5, 1]), np.ones([2]))
Out[8]: True
In [9]: np.allclose(np.ones([5, 1]), np.ones([]))
Out[9]: True I would expect all of these to raise a In [12]: np.allclose(np.ones([5, 1]), np.ones([12, 34]))
...
ValueError: operands could not be broadcast together with shapes (5,1) (12,34)
I think that's a good decision even if there weren't this bug; we should strive to use xarray versions whenever possible (which we mostly already do anyways).
Is there a way to use the |
Since |
Probably worth at least asking at the xarray repo or mailing list before rolling our own. I suspect others have/will come across the same problem. |
It turns out that if variables are set as coordinates before concatenation in xr.open_mfdataset then they are not broadcast along the concat_dim. This solves the underlying issue (so we can optionally revert to the orginal logic in calc.py)
@spencerahill I did a little more investigating -- it turns out we can obtain the desired behavior if we set grid attributes as coordinates before concatenating in Concatenation with two variables (original behavior):for i in range(3):
da = xr.DataArray(np.ones((2, 3)),
dims=['x', 't'],
coords=[np.arange(2), np.arange(3 * i, 3 * (i + 1))],
name='a')
coord = xr.DataArray(2. * np.ones(2), dims=['x'], coords=[np.arange(2)], name='area')
ds = da.to_dataset()
ds['area'] = coord
ds.to_netcdf('example-case-{:d}.nc'.format(i))
Concatenation with a non-index coordinate (desired / current behavior):for i in range(3):
da = xr.DataArray(np.ones((2, 3)),
dims=['x', 't'],
coords=[np.arange(2), np.arange(3 * i, 3 * (i + 1))],
name='a')
coord = xr.DataArray(2. * np.ones(2), dims=['x'], coords=[np.arange(2)], name='area')
da['area'] = coord
da.to_netcdf('example-case-{:d}.nc'.format(i))
Setting the time attributes as coordinates before calling |
aospy/data_loader.py
Outdated
@@ -91,7 +92,11 @@ def set_grid_attrs_as_coords(ds): | |||
Dataset | |||
Dataset with grid attributes set as coordinates | |||
""" | |||
int_names = GRID_ATTRS.keys() | |||
if not set_time_vars: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight preference to use the affirmative, i.e. if set_time_vars:
as the main if
statement, with the else
corresponding to not set_time_Vars
Thanks @spencerkclark, nice find of a pretty simple workaround. Just confirming, the only difference between the two examples is: ds = da.to_dataset()
ds['area'] = coord
ds.to_netcdf('example-case-{:d}.nc'.format(i)) versus da['area'] = coord
da.to_netcdf('example-case-{:d}.nc'.format(i)) I don't fully understand why this results in the behavior that it does...is this intuitive to you? (I.e. should xarray consider tweaking or clarifying?)
It actually looks pretty tight already to me, although as I mentioned I'm not seeing it 100% clearly. Need some tests of course; those might actually help me understand also. |
@spencerahill yes, that is exactly the difference; it seems that
No, this isn't super intuitive to me either; however, it is consistent with the default behavior of |
Also FYI it turns out that the |
Yes, that sounds like a good approach. |
That thread makes sense; thanks for double checking with the numpy folks on that.
Remarkably it seems someone else is keen on implementing the same thing: pydata/xarray#1580. |
Haha go figure! If that makes it into xarray 0.10, we have the option of waiting for that and then bumping our required xarray version to 0.10. Conversely, we can continue with this PR using your workaround. What do you think? |
I'll go ahead and continue with this PR. If the change does make into xarray 0.10 then it's a simple change. (This way we'll be prepared even if it doesn't). |
Great, in that case just ping me when you're ready for another review |
@spencerahill great, this should be ready for another review when you get a chance. |
Looks good! Thanks @spencerkclark |
pydata/xarray#1580 has now been merged, although xarray v0.10 in which it will appear has not yet been released. So if/when we come across something like this again, we should use the new xarray functionality and go ahead and bump the minimum requirement of xarray to >=0.10. |
Fixes #187
The way I tested this was by adding a grid attribute (
'zsurf'
) of all zeros to the files used in some testRun
objects:00040101.precip_monthly.nc
00050101.precip_monthly.nc
00060101.precip_monthly.nc
but leaving it off the files used in constructing their
Model
object:00060101.sphum_monthly.nc
Tests do not pass without the one line change I add in
calc.py
.