5 bash scripts from Gary to create time series #29

Merged · 15 commits · Oct 30, 2020

Conversation

mnlevy1981 (Contributor)

Also included is a single Python script that submits each of the five scripts to
the Slurm queue on Casper.

Note that I've modified Gary's original scripts to take a case identifier (e.g.
003 or 004) and a single year as command-line arguments. The Python script sets
the default case to 004 but requires users to specify at least one year. Users
can also choose specific scripts to run (the default is to run all five), and
there is a "dry-run" option that skips the actual sbatch call.
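The submission wrapper described above might look roughly like the following sketch. The script names, the mail-flag handling, and the `build_commands()` helper are assumptions for illustration, not the repo's actual code:

```python
import argparse
import subprocess

# Hypothetical script names; the real repo has five bash scripts from Gary.
SCRIPTS = ["gen_ts_1.sh", "gen_ts_2.sh", "gen_ts_3.sh", "gen_ts_4.sh", "gen_ts_5.sh"]


def build_commands(case, year, scripts=None, mail_user=None, no_mail=False):
    """Return the sbatch command lines that would be submitted."""
    cmds = []
    for script in scripts or SCRIPTS:
        cmd = ["sbatch"]
        if no_mail:
            cmd.append("--mail-type=NONE")
        elif mail_user:
            cmd.extend(["--mail-type=END,FAIL", f"--mail-user={mail_user}"])
        cmd.extend([script, case, str(year)])
        cmds.append(cmd)
    return cmds


def main(argv=None):
    parser = argparse.ArgumentParser(description="Submit time-series scripts to Slurm")
    parser.add_argument("--case", default="004", help="case identifier (e.g. 003 or 004)")
    parser.add_argument("--year", required=True, type=int, help="model year to process")
    parser.add_argument("--scripts", nargs="*", default=None, help="subset of scripts to run")
    parser.add_argument("--no-mail", action="store_true", help="suppress Slurm email")
    parser.add_argument("--dry-run", action="store_true", help="print commands without calling sbatch")
    args = parser.parse_args(argv)
    for cmd in build_commands(args.case, args.year, args.scripts, no_mail=args.no_mail):
        if args.dry_run:
            print(" ".join(cmd))  # dry run: show what would be submitted
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Separating command construction from submission keeps the dry-run path trivially testable.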

Since the bash scripts will be submitted to Slurm by the Python script, they do
not need to be executable.

Now pass --mail-type and --mail-user through the Python script (the default
sends email, but --no-mail turns off the messages).

I've added 0007 and 0008 to glade/campaign, so compare_ts_and_hist_004 checks
those years. Also, I cleaned up some of the output (no longer printing start /
finish time).

There is now a way to query whether a specific year of a variable from a dataset
came from time series or history files. This is probably only useful for the
compare_ts_and_hist notebooks, which have been re-run. Note that for this
commit I re-ran the notebooks on cheyenne, which does not have access to the
time series data on campaign; when casper is back up, I will re-run the
notebooks to actually do the comparison.
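A query like the one described above could be sketched as follows. The directory layout, file-naming pattern, and the `source_of_variable()` helper are assumptions for illustration; the real logic lives in the repo's utils:

```python
from pathlib import Path


def source_of_variable(ts_root, hist_root, stream, varname, year):
    """Report whether (stream, varname, year) would be read from time-series
    or history files. Naming conventions here are assumed, not the repo's."""
    # Time-series files are per-variable, e.g. pop.h.TEMP.000101-000112.nc
    ts_files = sorted(Path(ts_root).glob(f"{stream}.{varname}.*{year:04d}*.nc"))
    if ts_files:
        return "time series"
    # History files are per-timestep, e.g. pop.h.0001-01.nc
    hist_files = sorted(Path(hist_root).glob(f"{stream}.{year:04d}-*.nc"))
    if hist_files:
        return "history"
    return "missing"
```

Preferring time series when both exist matches the intent of opening the reshaped output whenever it is available.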
Casper is online, so we can compare to time series again.

Pass the short-term archive root as an argument (default is
/glade/scratch/$USER/archive) to the shell scripts rather than assuming the
archive is in my scratch directory, and pass the full name of the case rather
than a suffix. These two changes combined should make the tool general enough
to apply to any CESM case (e.g. Kristen's 1-degree cocco runs).

Also cleaned up the way data_reshaping/logs is ignored; may need an additional
commit to create the directory from run_all.py as a result.

Created utils/compare_ts_and_hist.py, which will eventually be a command-line
tool but also provides compare_ts_and_hist() via import utils.
mnlevy1981 (Contributor, Author)

6341c3b reduces the code duplication between the two notebooks comparing time series and history output, but there are a few things I don't like:

  1. It relies on a string comparison to check the output of each (stream, varname, year) combination
  2. There is still a lot of duplicate code parsing the output

One option for the second issue is to add a parse_ts_hist_comparison() to utils, though that feels a little kludgy. I think there's a possibility that addressing the first issue also addresses the second, but I'm not entirely clear on what that would look like.

1. CaseClass has two new public methods: get_timeseries_files() and
   get_history_files(); both return lists of files for a given year and stream.
   For time series, users can also specify a list of varnames to further pare
   down the resulting list of files.
2. gen_dataset() now relies on the two functions mentioned in (1) to determine
   what files to open
3. Massive overhaul to compare_ts_and_hist:
   * Use open_mfdataset and case.get_history_files() to open ds_hist for a
     given stream and year; then loop through variables and check that
     get_timeseries_files() does not return an empty list
   * No longer run da.identical(); for now, we are only concerned with
     verifying that all variables from history files made it into time series
   * This puts "reinstate da.identical()" on the to-do list; even with dask I
     was running into memory issues comparing monthly 3D fields
   * Refactored so there is utils/compare_ts_and_hist.py that will eventually
     be a command-line tool for comparing a given stream and year but is
     currently imported via utils. Also wrote
     utils.utils.timeseries_and_history_comparison() which is just a wrapper
     that accounts for things like missing cice.h1 time series from year 1. I
     think compare_ts_and_hist.py should live with CaseClass when we refactor
     this package, while timeseries_and_history_comparison() is specific to the
     high-res analysis
4. Add ability to get cice.h and cice.h1 streams for both history and time
   series so (3) compares all five streams rather than just looking at a few
   specific variables in pop.h
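The core of step (3) — verifying that every history-file variable has at least one time-series file rather than running da.identical() — can be sketched like this. The `compare_stream_year()` helper and the dict standing in for `CaseClass.get_timeseries_files()` are assumptions for illustration:

```python
def compare_stream_year(hist_vars, ts_index, stream, year):
    """Sketch of the relaxed comparison: instead of comparing values with
    da.identical(), only check that each history-file variable has at least
    one time-series file for the given stream and year.

    ts_index maps (stream, varname, year) -> list of file paths; it is a
    stand-in for CaseClass.get_timeseries_files(), which is assumed here.
    """
    missing = [v for v in hist_vars if not ts_index.get((stream, v, year), [])]
    return {"stream": stream, "year": year, "missing": missing, "ok": not missing}
```

Returning the list of missing variables (rather than a bare pass/fail string) also sidesteps the string-comparison complaint raised earlier in the thread.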
And a few other bad / unnecessary imports
GitHub Actions didn't like the "import utils" call even though it was fine in
the notebooks, I think because utils.utils was trying to import
compare_ts_and_hist.py; now that import is in the
timeseries_and_history_comparison() function, hopefully everything will work
again.
notebooks/utils/utils.py (review comment, resolved)
Moved the import statement out of timeseries_and_history_comparison() and fixed
sys.path in test_utils.py to ensure the import statement still works.
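A sys.path fix of that sort might look like the following sketch; the relative location of the notebooks/ directory is an assumption about the repo layout, not taken from the actual test_utils.py:

```python
import os
import sys

# Make `import utils` resolve from the tests directory as well as from the
# notebooks. The "../notebooks" location relative to the working directory
# is an assumed layout for illustration.
NOTEBOOKS_DIR = os.path.abspath(os.path.join(os.getcwd(), "..", "notebooks"))
if NOTEBOOKS_DIR not in sys.path:
    sys.path.insert(0, NOTEBOOKS_DIR)
```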
compare_ts_and_hist.py needs CaseClass.CaseClass, not just CaseClass
mnlevy1981 merged commit bd2e846 into marbl-ecosys:master on Oct 30, 2020
mnlevy1981 deleted the reshaping_scripts branch on November 20, 2020