Clean up main.py #2
I am definitely on board with this.
What's the motivation here? Performance? Even with all of the parameter combinations we're iterating over in its current form, is the number of rows even large enough to meaningfully test that? I ask because my intuition is to do the opposite: create the minimal number of aospy objects and parameter combinations necessary to test all of the desired functionality.
I agree. aospy.plotting tries to do this, but in a very confusing, hard-to-maintain way (not to mention poorly documented) that I wouldn't necessarily want to emulate here. Basically it uses nested lists to signify at what level within the plotting hierarchy (figure, axes within the figure, plot within one axes set, data needed for a plot) to apply the given parameter. Perhaps the best entry point is to ask: how would the user specify a set of calculations like this, wherein it is not just a permutation of all the specified parameter options? I think we could create some simple container classes that could be used to somehow specify how a parameter option is meant to be interpreted. E.g.
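For concreteness, here is one hypothetical sketch of such container classes (the names `Permute`, `Pair`, and `expand` are invented for illustration, not anything in aospy), where each container tags how its options should be combined with the others:

```python
import itertools

class Permute:
    """Take the Cartesian product of this parameter's options with all others."""
    def __init__(self, *options):
        self.options = list(options)

class Pair:
    """Zip this parameter's options elementwise with other Pair parameters."""
    def __init__(self, *options):
        self.options = list(options)

def expand(params):
    """Expand a dict of fixed values, Permute, and Pair entries into calc specs."""
    fixed = {k: v for k, v in params.items()
             if not isinstance(v, (Permute, Pair))}
    permuted = {k: v for k, v in params.items() if isinstance(v, Permute)}
    paired = {k: v for k, v in params.items() if isinstance(v, Pair)}
    # All Pair parameters are assumed to have the same number of options.
    pair_len = len(next(iter(paired.values())).options) if paired else 1
    specs = []
    for combo in itertools.product(*(v.options for v in permuted.values())):
        for i in range(pair_len):
            spec = dict(fixed)
            spec.update(zip(permuted, combo))
            spec.update({k: v.options[i] for k, v in paired.items()})
            specs.append(spec)
    return specs

# 2 Permute options x 2 paired positions -> 4 specs, not 2 x 2 x 2 = 8.
specs = expand({'model': 'am2',
                'var': Permute('mse', 'olr'),
                'intvl_out': Pair('jas', 'djf'),
                'dtype_out_time': Pair('avg', 'std')})
```

The point of the design is that the user states intent per parameter, and the expansion logic lives in one place rather than in nested loops in the main script.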
Testing performance wasn't really what I had in mind here. What I was more interested in was doing the same calculation(s) for a number of Runs within a Model. My bad for phrasing this as a "fair amount of iteration."
For sure, even now I probably have too many rows in the database for what we really need for testing purposes. I think before I add any more functionality I'll work to clean up the main script and set up some simple formal tests.
This is a big question from an API standpoint with many sub-questions (e.g. spencerahill/aospy#9; spencerahill/aospy#30; what about inter-Run calculations?). At the moment it's hard for me to say what could work best. I think you're on to something with using some container classes, but once I go through the process of distilling the current main script to its minimal elements, maybe I'll have a clearer perspective.
Sounds good. Strictly speaking, in terms of testing that iteration is working, even just two of each object type is sufficient (though that might not be enough to test the more involved queries you've mentioned). Sorry I can't help more right now with the main script cleanup -- frankly you're stuck cleaning up the mess I created there (and it is definitely a mess).
This too could potentially work through the container classes, e.g. (And it should be generalized beyond runs: data should be allowed to come from different Models and potentially even Projs.)
Starting a little further back, how tied to the [...] I want to say that we could do this in such a way that it wouldn't change much from the user's perspective, but it could change some of these back-end processes (and possibly reduce the amount of code); I'd need to try it first, though. Is there any reason you think this would be a bad idea? I might give it a shot in a separate branch.
Not at all -- they feel crude to me, and I've kinda always wanted to do something better there.
I actually think there is a danger here of too tightly coupling the DB and the rest of the functionality. I think the aospy classes are already doing too much (Calc especially) and need to be made more modular. For example, from a quick glance, it looks like you've included some interaction with the DB within the `__init__` method of Model, which causes Model to be tightly coupled to the DB. What if someone wants to create a standalone Model, e.g. for exploratory or testing purposes, without dealing with a DB? Also, if we want to potentially support different data management backends based on user preference down the line, this is also problematic.

So that is to say, we need to come up with a clean interface between the aospy objects and their database counterparts. At the very least, the DB-related code that is now inside the aospy objects should be sectioned off into separate methods, with those methods being called e.g. by `__init__` based on a kwarg. Does that make sense? This is still extremely useful; I'm just trying to avoid getting ahead of ourselves and inadvertently creating problems for ourselves down the line -- much of the difficulties you're dealing with now are because of ill-advised design decisions I made previously. Thanks!
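As a rough illustration of that separation (all names here are invented for the sketch, not aospy API), DB interaction could be opt-in via a kwarg and confined to its own method, so that a standalone Model never touches a database:

```python
class Model:
    """Sketch of a Model decoupled from any particular database backend."""
    def __init__(self, name, runs=(), db=None):
        self.name = name
        self.runs = list(runs)
        if db is not None:
            # DB interaction is opt-in and isolated in its own method.
            self._register(db)

    def _register(self, db):
        """All DB-facing logic lives here, behind a minimal interface."""
        db.add(self)

class InMemoryDB:
    """Toy backend standing in for SQLAlchemy, for illustration only."""
    def __init__(self):
        self.tables = {}

    def add(self, obj):
        self.tables.setdefault(type(obj).__name__, []).append(obj)

standalone = Model('am2')      # no DB needed, e.g. for exploratory/testing use
db = InMemoryDB()
tracked = Model('am3', db=db)  # backend chosen via kwarg
```

Any backend exposing the same small interface (`add`, and eventually query methods) could then be swapped in without touching the aospy classes themselves.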
I hear you. I'll think about how to abstract things a little more. It was just bugging me that we need to provide the same attributes twice in two classes, but I guess it's something we'll have to live with. I was hoping there might be some sort of object inheritance functionality we could leverage, but if we want to support objects disconnected from a database (or simply not running with a DB at all, which I guess should also remain an option), I'm not sure that would be possible.
This requires that the SQLAlchemy objects returned by queries be transformed into some generic format before being returned to
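The generic-format idea might look something like this sketch, where `RunRecord`, `to_generic`, and the field names are assumptions rather than actual aospy structures:

```python
from collections import namedtuple

# Backend-agnostic record handed back to aospy in place of raw
# SQLAlchemy objects (field names are illustrative).
RunRecord = namedtuple('RunRecord', ['name', 'model', 'data_dir'])

def to_generic(row):
    """Convert one ORM row (any object with these attributes) to a RunRecord."""
    return RunRecord(name=row.name, model=row.model, data_dir=row.data_dir)

class FakeRow:
    """Stands in for a SQLAlchemy result object in this sketch."""
    def __init__(self, name, model, data_dir):
        self.name, self.model, self.data_dir = name, model, data_dir

records = [to_generic(r)
           for r in [FakeRow('am2_control', 'am2', '/data/am2')]]
```

Because `to_generic` only relies on attribute access, it would work unchanged whatever ORM (or no ORM) sits behind the query.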
Getting back to the primary motivation of this issue, pending slight changes to the import statements due to package re-structuring (which I think I'll tackle next), here is a minimal example:

```python
from aospy_synthetic.calc import Calc, CalcInterface

import cases
import models
import example_projects
import variables

test = CalcInterface(proj=example_projects.example,
                     model=models.am2,
                     run=cases.am2_control,
                     var=variables.mse,
                     date_range=('0021-01-01', '0080-12-31'),
                     intvl_in='monthly',
                     intvl_out='son',
                     dtype_in_time='ts',
                     dtype_in_vert='sigma',
                     dtype_out_time='avg',
                     dtype_out_vert=False,
                     level=False)
calc = Calc(test)
calc.compute()
```

Let me know if you have any comments.
Yes. I don't think it's necessary to add an additional layer of abstraction in terms of objects between the DB and aospy. Instead, the additional abstraction needs to be in terms of classes that handle the interaction between the DB and aospy. In the direction from aospy to the DB, this would involve properly storing the aospy object within the DB. In the opposite direction, this would involve converting the specification for desired data (however it is provided) into a query that ultimately returns that data. (I'm not sure whether each direction should be its own class or whether one class should handle both directions.)
Good point. Right now, the datatypes aren't symmetric: we save the results of Calcs as netCDF files and then read them back simply as xarray objects. Granted, we save some additional metadata within the xarray objects (part of what makes them so useful), but there is no conversion back into a Calc (which, as you note, isn't necessarily 1-to-1).
I like this and had thought a little about it in the past (although I had forgotten it until just now): spencerahill/aospy#4 and spencerahill/aospy#33. Take a look at those, the latter especially, and see what you think.
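One possible shape for such a translation layer, sketched with invented names (`DBTranslator` and `DictBackend` are illustrative stand-ins, not aospy or SQLAlchemy API), handling both directions in a single class:

```python
class DBTranslator:
    """Sketch of one class mediating both directions between aospy
    objects and a pluggable DB backend."""
    def __init__(self, backend):
        self.backend = backend

    def store(self, obj):
        # aospy -> DB: flatten the object into a row the backend accepts.
        self.backend.insert(type(obj).__name__, vars(obj))

    def load(self, obj_type, **criteria):
        # DB -> aospy: turn a data specification into a query and return
        # plain dicts, keeping ORM types out of the aospy layer.
        return self.backend.select(obj_type, criteria)

class DictBackend:
    """Toy in-memory backend, for illustration only."""
    def __init__(self):
        self.rows = []

    def insert(self, table, row):
        self.rows.append((table, dict(row)))

    def select(self, table, criteria):
        return [r for t, r in self.rows if t == table
                and all(r.get(k) == v for k, v in criteria.items())]

class Run:
    def __init__(self, name):
        self.name = name

translator = DBTranslator(DictBackend())
translator.store(Run('am2_control'))
found = translator.load('Run', name='am2_control')
```

Splitting `store` and `load` into two classes instead would be a small change; the important part is that the aospy objects never see the backend directly.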
Yes, this looks really clean -- feel free to close and we can continue the discussion within aospy; maybe spencerahill/aospy#33 is the most logical place.
Great! Yes, I had forgotten about spencerahill/aospy#33 as well. I agree, let's move further discussions over there. |
Similar to @spencerahill's comments in a pull request in aospy, I think it would be valuable to restructure the `main.py` script here to have as little of the parsing logic as possible and fairly explicitly create all the computations. We'd still want a fair amount of iteration to take place to populate the database with a fairly large number of rows, but the way `main.py` is currently set up is a bit opaque and, more importantly, it is not possible to simulate multiple iterations of combinations of parameters with one execution of the script: e.g. if we want to fill the database with `jas` average data for three variables, but then only do `djf` averages for one of those variables. It is important to be able to do this to set up interesting tests for the queries. Right now this can be accomplished by running `main.py` once with one set of parameters, editing it, and running again, but it would be helpful if we could do this all in one go.
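The asymmetric case described above (e.g. `jas` averages for three variables but `djf` for only one) could then be specified explicitly in a single script, along these lines (variable and key names here are hypothetical):

```python
# Explicit specification replacing one big nested loop: jas averages
# for three variables, but djf averages for only one of them, all in
# a single execution of the script.
jas_specs = [{'var': v, 'intvl_out': 'jas', 'dtype_out_time': 'avg'}
             for v in ('mse', 'olr', 'precip')]
djf_specs = [{'var': 'mse', 'intvl_out': 'djf', 'dtype_out_time': 'avg'}]

all_specs = jas_specs + djf_specs  # four calculations, one run
```

Each spec dict would then be fed to the calculation machinery in turn, so adding or removing an asymmetric case means editing a list, not re-running the script with edited parameters.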