Add 'one_dir_freq' file read-in option #31

spencerkclark · 2015-11-05T20:44:10Z

I recognize that we ultimately want to refactor this process out of calc.py, but I needed something along these lines for dealing with idealized model output. The option I've added is called 'one_dir_freq'. It is a very slight modification of the existing 'one_dir' option; however, in this case I leverage the intvl_in attribute of Calc, much as the 'gfdl' option does, so that the user can specify a series of files with different output frequencies (e.g. monthly, daily, 3-hourly, etc.).

Here's how one uses it within a Run object:

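```python
# Assumes Run has been imported from aospy and that start, end, and
# variables are defined earlier in the user's configuration.
control_T42 = Run(
    name='control_T42',
    description=(
        'Control case at T42 spectral resolution'
        ),
    data_in_direc='path/to/files',
    default_date_range=(start, end),
    data_in_dir_struc='one_dir_freq',
    # Each output frequency (matched against Calc's intvl_in) maps to a
    # {variable name: file name} dictionary.
    data_in_files={'20-day': {v: '00000.1x20days.nc' for v in variables},
                   'daily': {v: '00000.1xday.nc' for v in variables},
                   '3-hourly': {v: '00000.8xday.nc' for v in variables}},
    idealized=True
)
```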


spencerahill · 2015-11-06T14:01:30Z

aospy/calc.py

```diff
+        # data_in_files may hold absolute or relative paths
+        paths = []
+        for nc in data_in_files:
+            full = '/'.join([data_in_direc, nc]).replace('//', '/')
```


The best way to do this is os.path.join(data_in_direc, nc). (I only recently learned that myself; you were probably just keeping consistent with the existing code. Do as I say, not as I do ;))
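A minimal sketch of the suggested change, contrasting os.path.join with the string-based join from the diff. The function names resolve_paths and naive_join are hypothetical, and the outputs noted assume a POSIX system. The excerpt above is truncated, so whether the PR's code treated absolute entries separately further down is not visible here; the naive version is shown only to illustrate why os.path.join is the safer default.

```python
import os


def resolve_paths(data_in_direc, data_in_files):
    """Join each entry of data_in_files onto data_in_direc.

    os.path.join discards earlier segments when a later one is absolute,
    so absolute entries are kept unchanged and relative entries are
    placed under data_in_direc.
    """
    return [os.path.join(data_in_direc, nc) for nc in data_in_files]


def naive_join(data_in_direc, nc):
    # String-based joining, for comparison only: an absolute nc is glued
    # onto data_in_direc instead of replacing it.
    return '/'.join([data_in_direc, nc]).replace('//', '/')


if __name__ == '__main__':
    files = ['00000.1xday.nc', '/archive/run/00000.8xday.nc']
    # On POSIX: ['path/to/files/00000.1xday.nc', '/archive/run/00000.8xday.nc']
    print(resolve_paths('path/to/files', files))
    # 'path/to/files/archive/run/00000.8xday.nc'
    print(naive_join('path/to/files', files[1]))
```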

spencerahill · 2015-11-06T14:08:43Z

I have no problem with adding this functionality. But does it need to be its own method? Can the existing ..._one_dir method be modified to incorporate this functionality?

Better yet, can the isinstance... if/else blocks in the two functions be combined into their own function that _one_dir calls? That would get rid of ~30 lines of repeated code.

spencerkclark · 2015-11-06T15:40:35Z

Sure thing -- I considered this, but wasn't sure it would be worth it, since we are likely going to change this process down the road. Since it's quick and easy, I'll look into doing it for now.

Perhaps I should move this discussion to an issue (and maybe you've already thought about something along these lines), but here are some of my thoughts going forward:

While the two methods currently accomplish the same task within Calc, in an abstract sense the ..._one_dir and ..._gfdl read-in methods are quite different:

- ..._one_dir requires the user to map every variable to a file name explicitly. If a variable does not appear in this map, aospy will not even attempt to look for it.
- ..._gfdl is an implicit system. The mapping is coded into the method, which looks for the files within the post-processing file structure. The user does not need to specify a priori which variables are in which files, so aospy may end up looking for variables that do not exist.

I would argue that the explicit read-in method is the most general way of doing things. With enough information, one could automatically generate an explicit file map from an implicit generator. In addition, nothing prevents us from relaxing the current single-directory constraint and simply mapping each variable (within a particular time frequency) to a full file path.
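To illustrate how an explicit map could be generated from an implicit rule, here is a sketch built on a made-up naming rule (one file per variable in a subdirectory named after the output frequency). The functions gfdl_like_path and explicit_map are hypothetical, and the directory layout is not the real GFDL post-processing structure. Note that the resulting map points each variable at a full path, which also relaxes the single-directory constraint.

```python
import os


def gfdl_like_path(root, intvl_in, var_name):
    # Hypothetical implicit rule: one file per variable, stored in a
    # subdirectory named after the output frequency.
    return os.path.join(root, intvl_in, '{}.nc'.format(var_name))


def explicit_map(rule, root, intervals, variables):
    """Build an explicit {intvl_in: {var: full_path}} map from a rule."""
    return {intvl: {var: rule(root, intvl, var) for var in variables}
            for intvl in intervals}


if __name__ == '__main__':
    file_map = explicit_map(gfdl_like_path, '/archive/run',
                            ['monthly', 'daily'], ['temp', 'ps'])
    print(file_map['daily']['temp'])  # /archive/run/daily/temp.nc
```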

To keep supporting implicit read-in methods (for highly structured output data, like ..._gfdl), you could require the user to create an object that implements an interface (call it FileMapGenerator?) with a method that, given the intvl_in, variable name, data_in_dur, etc. as arguments, generates a file map for that particular variable. The source code for these objects could be stored in the user's aospy_user directory.

Within a Run object, one could then have a single argument for the file read-in method. The user could pass either the explicit dictionary mapping or an object that implements the FileMapGenerator interface. Within Calc, when reading in the files, you could have some simple logic along the lines of: "if an explicit map is provided, use the map; if not, pass the variable being looked for, the intvl_in, etc. as arguments to the generator, which returns an explicit map for just that variable." Using an interface would ensure that the generated explicit file map always has the same structure, so that it could be used seamlessly within Calc.
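A rough sketch of the proposed interface and dispatch logic, using Python's abc module. FileMapGenerator is the name floated above; the generate signature, the {intvl_in: {var: [paths]}} map structure, and the PerVariableFiles and file_map_for names are assumptions made for illustration, not an existing aospy API.

```python
import abc
import os


class FileMapGenerator(abc.ABC):
    """Interface for implicit read-in methods."""

    @abc.abstractmethod
    def generate(self, var_name, intvl_in, data_in_dur):
        """Return an explicit {intvl_in: {var_name: [paths]}} map."""


class PerVariableFiles(FileMapGenerator):
    """Hypothetical generator: one file per variable per frequency."""

    def __init__(self, root):
        self.root = root

    def generate(self, var_name, intvl_in, data_in_dur):
        # data_in_dur is accepted to match the interface but unused here.
        path = os.path.join(self.root, intvl_in, '{}.nc'.format(var_name))
        return {intvl_in: {var_name: [path]}}


def file_map_for(data_in_files, var_name, intvl_in, data_in_dur):
    """Use an explicit map if one was given; otherwise ask the generator."""
    if isinstance(data_in_files, FileMapGenerator):
        return data_in_files.generate(var_name, intvl_in, data_in_dur)
    return data_in_files
```

Either way, Calc could then look up file_map_for(...)[intvl_in][var_name], since both paths yield the same structure. Because generate is abstract, a user class that forgets to implement it fails at instantiation rather than partway through a calculation.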

spencerkclark · 2015-11-12T18:54:51Z

This should be ready to go now. You can now use my dictionary example above with the 'one_dir' read-in mode, and things should work as expected.


spencerahill · 2015-11-12T22:45:47Z

Thanks, Spencer! Looks great. And bonus points for # lines deleted > # lines added

spencerkclark added 2 commits November 5, 2015 15:30

Added a new file read in mode one_dir_freq

7ba7dc6

Merge branch 'develop' of https://github.com/spencerahill/aospy into one_dir_freq

c221a39

spencerahill reviewed Nov 6, 2015

spencerahill mentioned this pull request Nov 6, 2015

Abstractify methods for finding data files saved on disk #32

Closed

spencerkclark added 2 commits November 12, 2015 13:49

Subsumed one_dir_freq option into one_dir option.

b30bf8a

Fixed messy join call.

dfe10b9

spencerahill pushed a commit that referenced this pull request Nov 12, 2015

Merge pull request #31 from spencerkclark/one_dir_freq

a55ee82

Add 'one_dir_freq' file read-in option

spencerahill merged commit a55ee82 into spencerahill:develop Nov 12, 2015

spencerkclark deleted the one_dir_freq branch November 13, 2015 00:03
