
Create abstract VertCoord class that all vertical coordinates inherit from #80

Open
spencerahill opened this issue Jul 5, 2016 · 7 comments

@spencerahill
Owner

(copied from offline convo w/ me and @spencerkclark in May)

Maybe it makes sense to create an abstract class or interface for vertical coordinates. It would have attributes and/or methods for layer interface values and layer center values. Then users could specify their own coordinates, and aospy could use them so long as they adhere to the interface. There are surely coordinate systems in use beyond the pressure and hybrid sigma-pressure ones we use, especially on the ocean (or, more broadly, non-atmosphere) side. And this would contribute to our broader effort to refactor the codebase to be more modular.

infinite-diff already tries to do something similar. And Iris has something of this sort implemented.
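
A minimal sketch of what such an abstract class might look like (all names here, e.g. VertCoord, pressure_interfaces, and HybridSigmaPressure, are hypothetical and not existing aospy API):

```python
import abc

import numpy as np


class VertCoord(abc.ABC):
    """Abstract base class for vertical coordinate systems."""

    @abc.abstractmethod
    def pressure_interfaces(self, ps):
        """Pressure at layer interfaces, given surface pressure ``ps``."""

    @abc.abstractmethod
    def pressure_centers(self, ps):
        """Pressure at layer centers, given surface pressure ``ps``."""


class HybridSigmaPressure(VertCoord):
    """Hybrid sigma-pressure levels: p_half = pk + bk * ps at interfaces."""

    def __init__(self, pk, bk):
        self.pk = np.asarray(pk, dtype=float)
        self.bk = np.asarray(bk, dtype=float)

    def pressure_interfaces(self, ps):
        return self.pk + self.bk * ps

    def pressure_centers(self, ps):
        # Simple midpoint average of adjacent interface pressures.
        p_half = self.pressure_interfaces(ps)
        return 0.5 * (p_half[:-1] + p_half[1:])
```

Any user-defined coordinate system would then just subclass VertCoord and implement the two methods, and aospy could treat it uniformly.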

@spencerahill
Owner Author

The need for something like this, or more generally a re-vamping of how we handle vertical coordinates, is now more acute, due to new use cases from different models, each with their own idiosyncrasies. See #184 for the WRF model and #191 for the CAM model.

Another use case is sigma coordinates, i.e. pressure divided by surface pressure (confusingly, we use the 'sigma' label to refer to hybrid sigma-pressure coordinates). @spencerkclark, it just occurred to me that the idealized models use this, including the test data we package with aospy! How are we handling these coordinates right now?

We also need to provide a public API for users to define and supply their own vertical coordinate systems, so that they don't find themselves unable to use aospy if their data uses a vertical coordinate system that aospy doesn't support.

What would this look like? The only things that I think are ultimately needed for each coordinate type are methods for computing the pressure at level interfaces and level centers. (Although that presupposes that the vertical coordinate is ultimately pressure-based, which isn't the case for, e.g., ocean models or atmospheric models that use a z coordinate.)

@spencerkclark
Collaborator

How are we handling these coordinates right now?

The FMS idealized models still output bk and pk variables; it just happens that pk is zero for all levels (so in that sense the pressure can be treated as being in hybrid coordinates since sigma coordinates are just a limiting case). I agree this is confusing though.
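
To spell out the limiting-case point, the FMS-style interface pressures follow p_half(k) = pk(k) + bk(k) * ps; with pk identically zero this reduces to pure sigma, p_half(k) = bk(k) * ps. A small illustrative helper (the function name is made up for this sketch):

```python
import numpy as np


def phalf_from_hybrid(pk, bk, ps):
    """Interface pressures in hybrid sigma-pressure coordinates.

    p_half(k) = pk(k) + bk(k) * ps.  With pk == 0 at every level this
    reduces to pure sigma coordinates, p_half(k) = bk(k) * ps.
    """
    return np.asarray(pk, dtype=float) + np.asarray(bk, dtype=float) * ps
```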

@spencerahill
Owner Author

The FMS idealized models still output bk and pk variables; it just happens that pk is zero for all levels (so in that sense the pressure can be treated as being in hybrid coordinates since sigma coordinates are just a limiting case).

I see, that's a nice design. Unfortunately, it doesn't hold more generally. The idealized models at Caltech (despite being FMS-based, albeit on a much older version) that I'm using these days don't use bk or pk; they simply have a 'sigma' coordinate:

ncdump -h day2000h00.nc
netcdf day2000h00 {
dimensions:
	lat = 64 ;
	sigma = 30 ;
	theta = 100 ;
	zon_waven = 43 ;
	legendre = 43 ;
	rhum_bin = 41 ;
	times = 1 ;
variables:
...
	double sigma(sigma) ;
		sigma:long_name = "sigma level" ;
		sigma:units = "none" ;
...
	double u(sigma, lat) ;
		u:long_name = "Zonal Wind" ;
		u:units = "m/s" ;
...

So we do need to ultimately handle pure sigma.
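
For pure sigma, the conversion is just p = sigma * ps, broadcast over whatever spatiotemporal shape ps has. A possible sketch (function name is illustrative only):

```python
import numpy as np


def pressure_from_sigma(sigma, ps):
    """Full-level pressures on a pure sigma coordinate: p = sigma * ps.

    ``sigma`` is the 1-d level coordinate (as in the ncdump output above);
    ``ps`` can be a scalar or an array of surface pressures, which is
    broadcast against the level dimension.
    """
    sigma = np.asarray(sigma, dtype=float)
    ps = np.asarray(ps, dtype=float)
    # Prepend the level axis so sigma broadcasts against ps's dimensions.
    return sigma.reshape(sigma.shape + (1,) * ps.ndim) * ps
```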

@spencerkclark
Collaborator

Yeah, I totally agree that what's in aospy now is not close to general enough.

The only things that I think are ultimately needed for each coordinate type are methods for computing the pressure at level interfaces and level centers.

Could a starting point simply be the paradigm that's already used for all other kinds of computed variables (i.e. user-defined Var objects that point to functions that take model-native variables as inputs)? Then maybe in the main script one would be asked to specify Var objects associated with the vertical coordinate midpoints and interfaces respectively (instead of a string-based identifier for the system)?
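
In rough pseudocode, the paradigm being proposed might look like the following. This is a minimal mock, not the actual aospy.Var API; the class and function names here are illustrative only:

```python
class Var:
    """Mock of a Var-like object: either model-native (func is None)
    or computed from other variables via ``func``."""

    def __init__(self, name, variables=(), func=None):
        self.name = name
        self.variables = variables
        self.func = func

    def compute(self, data):
        if self.func is None:
            return data[self.name]  # model-native: just load it
        inputs = [data[name] for name in self.variables]
        return self.func(*inputs)


def p_midpoints_sigma(sigma, ps):
    # Pure sigma coordinate: pressure at level midpoints.
    return sigma * ps


# What a user might pass in the main script in place of dtype_in_vert:
vert_coord_midpoints = Var(
    name='p_midpoints_sigma',
    variables=('sigma', 'ps'),
    func=p_midpoints_sigma,
)
```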

@spencerahill
Owner Author

Sorry, I'm not sure I understand what you mean. I'll provide a little more context below; hopefully that helps (me as much as you).

The original motivation for our current approach, which is to pre-compute pressure fields as necessary before passing them to a variable's computation function, was to enable users to write more generic functions, i.e. ones that don't require knowledge of the coordinate type. So for potential temperature, only one function would need to be defined, with one of its arguments being a (potentially spatiotemporally varying) pressure field.
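
For example, a coordinate-agnostic potential temperature function would take the pre-computed pressure field directly (the constants and function name below are standard values hard-coded for this sketch, not aospy code):

```python
import numpy as np

P0 = 1.0e5    # reference pressure [Pa]
R_D = 287.04  # gas constant for dry air [J kg^-1 K^-1]
C_P = 1003.5  # specific heat of dry air at constant pressure [J kg^-1 K^-1]


def potential_temp(temp, p):
    """theta = T * (p0 / p)**(R_d / c_p).

    Because the (possibly spatiotemporally varying) pressure field is an
    explicit argument, the same function works regardless of the model's
    native vertical coordinate.
    """
    return temp * (P0 / np.asarray(p, dtype=float)) ** (R_D / C_P)
```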

One motivation for specifying the vertical data type was that I sometimes had the same quantities both pressure-interpolated and on the model-native coordinates, and I needed to compare how accurate things were in either coordinate system. Also, for the GFDL netCDF directory structure and file naming, the vertical data type determines the file name and path: "atmos" for interpolated vs. "atmos_level" for model-native.

But I admit that's not a super-common use case; I would be receptive to making that optional and having aospy look for one or the other by default.

@spencerkclark
Collaborator

Sorry, I realize what I wrote was vague.

I was thinking something along the lines of replacing dtype_in_vert in the main script with two arguments:

  • vert_coord_interfaces
  • vert_coord_midpoints

Each of these would accept aospy.Var objects, which would point to functions that take model-native variables as input (e.g. ps, bk, and pk; ps and sigma; or level; etc.). One could give them a name attribute specifying the type of vertical coordinate (e.g. 'p_interfaces_sigma'/'p_midpoints_sigma', or 'p_interfaces_hybrid'/'p_midpoints_hybrid'). These name attributes could then be used in file-naming conventions, to distinguish between results computed using variables on different vertical coordinate systems, and in loading data, to construct file paths.

We already have the machinery to load the data necessary for, and compute, Var objects within DataLoader/Calc, so this kind of setup feels fairly natural. But I see your point that there is a slight wrinkle when writing functions for other variables (e.g. potential temperature) that depend on the values of the vertical coordinate. While somewhat like #3, the situation isn't quite the same. Something like potential temperature is indeed computed from an argument (pressure) that is itself computed, but if we want to keep that function generic across coordinate systems, in our object libraries we would need to pass it some special kind of variable that doesn't point to a specific Var object. Instead, it would automatically be overridden by whatever Var object was specified in the user's main script for vert_coord_interfaces or vert_coord_midpoints.

Currently, if one wants to use pressure in the computation of another variable, one needs to define in one's object library a Var object named p with no specified calculation function/signature (effectively, it is treated as a model-native variable, even though it often isn't). Then, within calc.py, if a variable's name attribute is p, it goes through a special computation pathway that implicitly depends on dtype_in_vert. So in a sense this special treatment of a Var object already exists, but in re-reading the code, the logic for this within calc.py is pretty messy, and I think it is overdue for a re-write. (As a side note, I don't know how well the current behavior is documented in our examples or elsewhere in the documentation.)

Does that make a little more sense? It would be nice if we could avoid writing a new object type that the user would need to think about (though I'm not necessarily against writing a new class internally to handle these things).

@spencerahill
Owner Author

Sorry for taking a while on this. You raise some good points. I still need to think through all of this; I don't see one obvious best way forward yet. So, mostly just more questions for now.

these name attributes could be used in file-naming conventions to distinguish between results computed using variables on different vertical coordinate systems and also in loading data to construct file paths.

Would we require them to have these names, or could a user potentially name them something different? For the data loading (cf. the GFDL example), how would one translate from these names to constructing the needed path?

Currently the way things work if one wants to use pressure in a computation of another variable, is that in one's object library, one needs to use a Var object for something named p with no specified calculation function/signature (effectively it is treated as a model-native variable, even though it often isn't). Then within calc.py if the variable name attribute is p it goes through a special pathway of computation which implicitly depends on dtype_in_vert. So in a sense this special treatment of a Var object is already done

This is a good point. Even if we stay with the existing dtype_in_vert approach, there's definitely room for improvement here.

the logic for this right now within calc.py is pretty messy and I think is overdue for a re-write.

Agreed

As a side note I don't know how well that current behavior is documented in our examples or elsewhere in the documentation

Good point, we should add material on this to the docs once we converge on a solution
