Support sub-monthly time reduction intervals #204
Thanks for the report. We should definitely support writing to sub-monthly periods. (You can tell from this example that we wrote aospy when our research was focused on longer timescales! 😝 )
Strictly speaking, this only occurs if they are within the same year and month -- e.g. 5 Aug-10 Aug would not be overwritten by 10 Sept-15 Sept.
We can make it adaptive so that it includes the minimum needed precision but no more, without the user having to specify anything. We already do this for the year, as you noted: YYYY if the dates span only a single year, rather than YYYY-YYYY. The code snippet you provided is a nice starting point.
We'll need to iterate, though, on the best way to represent more general dates. It occurs to me that our current method, with letters for annual means or multiple months (e.g. 'ann', 'djf', 'jas') but numbers for single months ('01' for January, etc.), separated from the year range by a '.', doesn't extend easily to shorter timescales. @spencerkclark has been thinking a lot in the past year about date/time representations, so I'll let him chime in before proceeding.
In the meantime, we should add a Note or Warning in the docs about this (or maybe a whole section/sub-section about using aospy with higher-frequency data). Also, there are likely other places where we have implicitly assumed a monthly or longer period for everything, so please keep reporting anything else along those lines. Thanks! |
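The "minimum needed precision" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `adaptive_date_label` and `_is_last_day_of_month` are hypothetical helpers, not aospy functions, and the label layout (`YYYY`, `YYYYMM`, `YYYYMMDD`) is an assumed convention rather than anything aospy defines.

```python
import datetime


def _is_last_day_of_month(date):
    """True if the following day falls in a different month."""
    return (date + datetime.timedelta(days=1)).month != date.month


def adaptive_date_label(start, end):
    """Label a date range with the coarsest precision that still describes it.

    Hypothetical helper (not part of aospy). Whole calendar years get
    year precision, whole calendar months get month precision, and
    anything else gets day precision.
    """
    whole_months = start.day == 1 and _is_last_day_of_month(end)
    whole_years = whole_months and start.month == 1 and end.month == 12
    if whole_years:
        fmt = '%Y'
    elif whole_months:
        fmt = '%Y%m'
    else:
        fmt = '%Y%m%d'
    first, last = start.strftime(fmt), end.strftime(fmt)
    # Collapse to a single label when both ends agree, as is already
    # done for single-year ranges.
    return first if first == last else '{}-{}'.format(first, last)


print(adaptive_date_label(datetime.date(2001, 1, 1), datetime.date(2003, 12, 31)))
print(adaptive_date_label(datetime.date(2001, 8, 5), datetime.date(2001, 8, 10)))
```

Inferring the precision from full dates requires assumptions like the "whole month" test here, which is the kind of logic that can get messy for sub-daily data or non-standard calendars.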
Thanks for taking the time to write up this issue @chuaxr! @spencerahill it seems we have differing views on this :) Maybe I'm missing something important (examples illustrating your concerns might help), but I don't see any major issues with @chuaxr's suggestion.
If I understand the situation properly, I think @chuaxr was correct initially. I think the only way the start and end date for each calculation are currently encoded in file names is by the year of each. E.g. a calculation with a start date of 0003-01-01 and an end date of 0006-12-31 would have a file name of something like:
We could think of making the precision adaptive;
But it is sort of hard to reverse engineer that from full-resolution datetimes without some assumptions (and the logic could get messy), which is what we would need to do at the moment. (In pandas the resolution is determined by how many digits are provided in the string-specification; right now in
I don't see a huge issue regarding conflict between the For those reasons I feel I would not be opposed to extending things out to daily resolution in the filenames for the start and end dates (in all circumstances to keep the logic simple). So my example above would look something like: |
While I don't think I need it right now, it's not inconceivable that a user might need hourly (or even higher) resolution some day (e.g. tracking the formation of a particular storm event). I would therefore support a user-input time-format string. I was able to pass a time_format_str argument (e.g. '%Y%m') from calc_suite_specs in aospy_main.py by adding time_format_str to _AUX_SPEC_NAMES and _NAMES_SUITE_TO_CALC in automate.py and CalcInterface in calc.py. Maybe this is something you'd like to implement? |
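A user-supplied format string of this kind could be applied to the start and end dates along the following lines. This is a minimal sketch, not the actual change made in calc.py: `date_range_label` is a hypothetical function, and only the `time_format_str` name and the `'%Y%m'` example come from the comment above.

```python
import datetime


def date_range_label(start, end, time_format_str='%Y'):
    """Format the start and end dates of a calculation for use in a file name.

    Hypothetical helper (not part of aospy). The default '%Y' mimics
    the year-only naming; finer format strings give finer labels.
    """
    first = start.strftime(time_format_str)
    last = end.strftime(time_format_str)
    # Use a single label when both dates format identically.
    return first if first == last else '{}-{}'.format(first, last)


start = datetime.datetime(2001, 8, 9, 1)
end = datetime.datetime(2001, 8, 19, 0)
print(date_range_label(start, end))               # year-only (default)
print(date_range_label(start, end, '%Y%m%d'))     # daily resolution
print(date_range_label(start, end, '%Y%m%d%H'))   # hourly resolution
```

Leaving the format to the user keeps the naming logic simple, at the cost of making the user responsible for choosing a format fine enough to keep their date ranges distinct.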
Thanks both for your thoughts. I realize now I'm confused about the use case. @chuaxr, can you clarify:
If it's for a single year, then that's where our logic gets a bit odd: if the date range is exactly equal to Aug 5-15, then either Apologies if I'm misunderstanding still. I need us to sort these things out before being able to think clearly about the rest of @spencerkclark's thoughts. |
@spencerahill I'm only averaging within one year. Here's a concrete example that triggers the overwriting:
date_ranges=[(datetime.datetime(2001, 8, 19, 1), datetime.datetime(2001, 8, 29, 0)),
             (datetime.datetime(2001, 8, 9, 1), datetime.datetime(2001, 8, 19, 0))],
output_time_intervals=['ann'], |
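To see why these two date ranges collide, the sketch below labels each range by year only, as the thread describes the current file naming, and groups ranges that share a label. The functions `year_label`, `day_label`, and `find_label_collisions` are hypothetical illustrations, not aospy code.

```python
import datetime
from collections import defaultdict


def year_label(start, end):
    """Year-only label, mimicking the naming described in this thread."""
    if start.year == end.year:
        return '{:04d}'.format(start.year)
    return '{:04d}-{:04d}'.format(start.year, end.year)


def day_label(start, end):
    """Day-resolution label (an assumed alternative convention)."""
    return '{:%Y%m%d}-{:%Y%m%d}'.format(start, end)


def find_label_collisions(date_ranges, label_func):
    """Map each label shared by more than one date range to those ranges."""
    groups = defaultdict(list)
    for start, end in date_ranges:
        groups[label_func(start, end)].append((start, end))
    return {label: ranges for label, ranges in groups.items()
            if len(ranges) > 1}


date_ranges = [
    (datetime.datetime(2001, 8, 19, 1), datetime.datetime(2001, 8, 29, 0)),
    (datetime.datetime(2001, 8, 9, 1), datetime.datetime(2001, 8, 19, 0)),
]

# Both ranges fall in 2001, so the year-only labels are identical.
print(find_label_collisions(date_ranges, year_label))
# Day-resolution labels keep the two ranges apart.
print(find_label_collisions(date_ranges, day_label))
```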
@chuaxr just to give a little more background to your use-case, there's also no seasonal cycle in insolation in your model, correct? (So years or months don't hold any specific significance?) |
Yup, the specific dates are only chosen for consistency with the arbitrary dates I used in the WRF run, and have no physical significance otherwise.
|
Thanks @chuaxr, that's helpful. Now that we have our bearings, I think the issue is ultimately just that we don't yet support sub-monthly calc_suite_specs = dict(output_time_intervals=[('08-09', '08-19'), ('08-19', '08-29')], ...) where each tuple corresponds to one of your desired averaging periods. Note that this would work whether your data spanned one year or less or multiple years; the number of years averaged over would be specified via The resulting filename would be @spencerkclark our thoughts on this seem to have diverged. What is your take? I just keep coming back to feeling that we shouldn't recommend using |
@spencerahill I see your point regarding Consider the edge case that @chuaxr might be interested in taking the mean over days 360 to 380 of a simulation. The first six days would be in year one, while the last fifteen would be in year two. If we went with either of the solutions proposed (@chuaxr's or yours), how might we support that? Sticking with Does that concern make sense? This would probably be best supported by a new time reduction pathway (to add to |
I totally agree. Our pipeline inherently revolves around taking averages relative to calendar years: first within them, then across them. There's no way at present to support an average of a date range that spans across calendar years, and I think you're right that a new time reduction is likely needed. Unfortunately, my gut tells me that will be a big task. Also, that seems to me a different issue than the one @chuaxr is currently facing, that of a sub-monthly period that doesn't cross calendar years. For those reasons, I'm inclined to focus on the sub-monthly issue first. I'll open an issue to track the across-year problem, and we can turn in this thread to the sub-monthly support. |
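The day 360 to 380 edge case can be made concrete with a small sketch. It assumes a 365-day calendar and 1-based simulation days (assumptions; the thread does not state the model's calendar), and `split_by_year` and `weighted_mean` are hypothetical helpers, not an aospy time reduction.

```python
def split_by_year(first_day, last_day, days_per_year=365):
    """Split an inclusive range of 1-based simulation days by calendar year.

    Returns a list of (year_number, n_days) pairs. Assumes every year
    has `days_per_year` days (hypothetical simplification).
    """
    pieces = []
    day = first_day
    while day <= last_day:
        year = (day - 1) // days_per_year + 1
        chunk_end = min(year * days_per_year, last_day)
        pieces.append((year, chunk_end - day + 1))
        day = chunk_end + 1
    return pieces


def weighted_mean(pieces, yearly_means):
    """Combine per-year means, weighting each by its number of days."""
    total_days = sum(n_days for _, n_days in pieces)
    weighted_sum = sum(n_days * mean
                       for (_, n_days), mean in zip(pieces, yearly_means))
    return weighted_sum / total_days


pieces = split_by_year(360, 380)
print(pieces)  # six days in year one, fifteen in year two
print(weighted_mean(pieces, [1.0, 2.0]))
```

An unweighted average of the two per-year means would give each year equal weight even though they contribute different numbers of days, which is one reason a within-year-then-across-year pipeline does not directly handle ranges that cross a calendar-year boundary.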
By default, the output file from an aospy calculation is named with "...start_year-end_year.nc" or simply "start_year.nc". This means that using date ranges within the same year (e.g. 5 Aug-10 Aug, 10 Aug-15 Aug) will result in only one .nc file instead of two, since both ranges map to the same file name.
As a workaround, I currently have the following changes in _file_name in calc.py:
Presumably this level of precision would not be necessary for users averaging over years of monthly averaged data, so perhaps this should be an option for users to turn on/off.