Selecting datetimes within a certain range (days) without having to specify the start and end dates precisely (hours or higher) #243
Comments
@chuaxr right now this is not possible in But, the way date ranges are currently handled in |
Thanks for the quick reply @spencerkclark. In the meantime, is there a workaround that I could use to specify the start time for the 5 minute file? In this case the file exactly contains my desired time. There seems to be an option to do that in utils/times.py which has to do with a raw_data attribute, although it's not obvious how I could switch it on. |
My apologies, I think I may have misunderstood your issue. This is quite a severe bind! Outside of changing the source code of
Yes, fundamentally I'm pretty sure the issue stems from the decoding of the datetimes from your WRF run. The underlying raw time series (encoded as a series of floats representing some unit of time since a certain date) must not be encoded to enough precision to be precisely decoded into datetimes to nanosecond precision. It is clear that having the strict check advised by #72 is not always desirable. I think it would be beneficial to have this as an option in each DataLoader (by default the strict check would be on) where one could relax it to allow for edge cases like this. This would essentially just be a flag to say whether If you're in need of a really quick fix, you could just comment out line 451 in |
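To illustrate the precision point above, here is a minimal sketch using only the standard library. It assumes, purely for illustration, that the time axis is stored as single-precision floats in units of days since a reference date; the actual encoding of the WRF output is not shown in this thread, and `to_float32` and `decode_ns` are hypothetical helper names, not part of any library.

```python
import struct

NS_PER_DAY = 86_400 * 10**9


def to_float32(value):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def decode_ns(days_since_reference):
    """Decode a float number of days into integer nanoseconds."""
    return round(days_since_reference * NS_PER_DAY)


exact_ns = 5 * 60 * 10**9        # five minutes, in nanoseconds
encoded = to_float32(5 / 1440)   # five minutes stored as float32 days (assumed encoding)
error_ns = decode_ns(encoded) - exact_ns

# The decoded value misses the exact five-minute mark by a small,
# nonzero number of nanoseconds.
print(error_ns)
```

Under this assumed encoding, the decoded timestamp lands slightly off the intended five-minute mark, which is enough for an exact `>=` / `<=` comparison against a user-specified datetime to fail.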
Perhaps a better fix would be to add a tolerance to the strict check. That is, change
range_exists = start_date >= da_start and end_date <= da_end
to
range_exists = start_date >= (da_start - tol) and end_date <= (da_end + tol)
where |
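As a rough, self-contained sketch of the tolerance idea, the check could be wrapped in a function like the one below. The function signature, the `tol` default, and the sample dates are assumptions made for illustration; only the comparison itself comes from the suggestion above.

```python
import datetime


def range_exists(start_date, end_date, da_start, da_end,
                 tol=datetime.timedelta(0)):
    """Return True if [start_date, end_date] lies within the data's
    time bounds, allowing a slack of `tol` on either side."""
    return start_date >= (da_start - tol) and end_date <= (da_end + tol)


# Hypothetical data bounds: the first timestamp decoded 2 microseconds late.
da_start = datetime.datetime(2001, 6, 30, 0, 5, 0, 2)
da_end = datetime.datetime(2001, 7, 10, 0, 0, 0)

requested_start = datetime.datetime(2001, 6, 30, 0, 5)
requested_end = datetime.datetime(2001, 7, 10)

# With zero tolerance the tiny decoding offset makes the check fail.
strict = range_exists(requested_start, requested_end, da_start, da_end)

# A one-second tolerance absorbs the offset.
relaxed = range_exists(requested_start, requested_end, da_start, da_end,
                       tol=datetime.timedelta(seconds=1))

print(strict, relaxed)
```

Defaulting `tol` to zero keeps the strict behavior unless the caller explicitly relaxes it.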
Thanks for the tips. Your suggestion (see below) did work, and I also realized that casting the time coordinate to float in the preprocessing step also addressed the issue.
Although I am now confused about whether it is possible to specify more than one date range for a Run with two different output frequencies. Ideally, I would like the partial datetime indexing solution. I also would prefer the ability to specify different ranges that depend on the time frequency (e.g. start at 5 minutes for the 5min output and start at 1 hour for the 1hr output) than manually toggling the date range for each calculation (which is what I am currently doing). Could that be specified in the data loader? |
We'll need to think carefully about how partial string indexing would fit in with this check, but that is something that I think would be a huge benefit for a number of use cases (including this one). That said, that's a ways down the road. As a quick fix, I think you should just be able to increase the tolerance (maybe to something like a day). Then you should be able to specify a range of |
Just to be clear, this check has nothing to do with what time range actually gets selected in |
@chuaxr reading this again I'm a little unsure if I interpreted things properly. Could you be a little more specific about what you mean here? What do you mean by "specify more than one date range for a Run with two different output frequencies?" Obviously you can only have one default date range per Run, so I'm assuming you mean in the main script? Or are you saying you'd like to be able to specify a single default date range in the Run that worked for both output frequencies? Could you post a code snippet of the constructor for the Run you are referring to? Thanks! |
I generally compute statistics over the last ten days of output. Ideally, I would just need to specify up to the precision of the day when creating the datetime object for the start and end dates. However, when dealing with hourly output, I needed to specify up to the hour of the starting date:
default_start_date=datetime.datetime(2001, 7, 20, 1),
I started dealing with high frequency (5 min) output, which meant that specifying a default start date based on the hourly data would omit some of the data from the high frequency run. For some reason, there are some rounding errors which make it impossible to specify the start time precisely with datetime (which stops at microsecond precision). (See error message below, although specifying from datetime.datetime(2001,6,30,1,0,0) to datetime.datetime(2001,7,10,0,0,0) actually works.)
Is it possible to select values within a range, rather than having to specify the start and end dates precisely?
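To make the request concrete, here is a minimal sketch of the kind of day-precision selection meant here, written with plain `datetime` objects. `select_days` is a hypothetical helper for illustration and not an existing function in the project; the sample timestamps are made up.

```python
import datetime


def select_days(times, start_day, end_day):
    """Keep timestamps whose calendar date falls in [start_day, end_day]."""
    return [t for t in times if start_day <= t.date() <= end_day]


# Hypothetical 5-minute output spanning a little over two days.
first = datetime.datetime(2001, 7, 9, 23, 50)
times = [first + datetime.timedelta(minutes=5 * i) for i in range(600)]

# Bounds are given only to day precision; no hours or minutes needed.
selected = select_days(times, datetime.date(2001, 7, 10),
                       datetime.date(2001, 7, 11))

print(selected[0], selected[-1], len(selected))
```

Because the comparison is done on calendar dates, the same bounds would work for hourly and 5-minute output alike.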