5 bash scripts from Gary to create time series #29

Merged · 15 commits · Oct 30, 2020

Conversation

mnlevy1981 (Contributor)

Also included is a single Python script that submits each of the five scripts to
the Slurm queue on Casper.

Note that I've modified Gary's original scripts to take a case identifier (e.g.
003 or 004) and a single year as command-line arguments. The Python script sets
the default case to 004 but requires users to specify at least one year. Users
can also choose specific scripts to run (the default is to run all five), and
there is a "dry-run" option that skips the actual sbatch call.
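The submission wrapper described above might look roughly like the following sketch. The script names, the mail-flag handling, and the `build_commands()` helper are assumptions for illustration, not the repo's actual code:

```python
import argparse
import subprocess

# Hypothetical script names; the real repo has five bash scripts from Gary.
SCRIPTS = ["gen_ts_1.sh", "gen_ts_2.sh", "gen_ts_3.sh", "gen_ts_4.sh", "gen_ts_5.sh"]


def build_commands(case, year, scripts=None, mail_user=None, no_mail=False):
    """Return the sbatch command lines that would be submitted."""
    cmds = []
    for script in scripts or SCRIPTS:
        cmd = ["sbatch"]
        if no_mail:
            cmd.append("--mail-type=NONE")
        elif mail_user:
            cmd.extend(["--mail-type=END,FAIL", f"--mail-user={mail_user}"])
        cmd.extend([script, case, str(year)])
        cmds.append(cmd)
    return cmds


def main(argv=None):
    parser = argparse.ArgumentParser(description="Submit time-series scripts to Slurm")
    parser.add_argument("--case", default="004", help="case identifier (e.g. 003 or 004)")
    parser.add_argument("--year", required=True, type=int, help="model year to process")
    parser.add_argument("--scripts", nargs="*", default=None, help="subset of scripts to run")
    parser.add_argument("--no-mail", action="store_true", help="suppress Slurm email")
    parser.add_argument("--dry-run", action="store_true", help="print commands without calling sbatch")
    args = parser.parse_args(argv)
    for cmd in build_commands(args.case, args.year, args.scripts, no_mail=args.no_mail):
        if args.dry_run:
            print(" ".join(cmd))  # dry run: show what would be submitted
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Separating command construction from submission keeps the dry-run path trivially testable.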

Since the bash scripts will be submitted to Slurm by the Python script, they do
not need to be executable.

Now pass --mail-type and --mail-user through the Python script (the default
sends email, but --no-mail turns off the messages).

I've added 0007 and 0008 to glade/campaign, so compare_ts_and_hist_004 checks
those years. Also, I cleaned up some of the output (no longer printing start /
finish time).

There is now a way to query whether a specific year of a variable from a dataset
came from time series or history files. This is probably only useful for the
compare_ts_and_hist notebooks, which have been re-run. Note that for this
commit I re-ran the notebooks on cheyenne, which does not have access to the
time series data on campaign; when casper is back up, I will re-run the
notebooks to actually do the comparison.
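A query like the one described above could be sketched as follows. The directory layout, file-naming pattern, and the `source_of_variable()` helper are assumptions for illustration; the real logic lives in the repo's utils:

```python
from pathlib import Path


def source_of_variable(ts_root, hist_root, stream, varname, year):
    """Report whether (stream, varname, year) would be read from time-series
    or history files. Naming conventions here are assumed, not the repo's."""
    # Time-series files are per-variable, e.g. pop.h.TEMP.000101-000112.nc
    ts_files = sorted(Path(ts_root).glob(f"{stream}.{varname}.*{year:04d}*.nc"))
    if ts_files:
        return "time series"
    # History files are per-timestep, e.g. pop.h.0001-01.nc
    hist_files = sorted(Path(hist_root).glob(f"{stream}.{year:04d}-*.nc"))
    if hist_files:
        return "history"
    return "missing"
```

Preferring time series when both exist matches the intent of opening the reshaped output whenever it is available.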
Casper is online, so we can compare to time series again.

Pass the short-term archive root as an argument (default is
/glade/scratch/$USER/archive) to the shell scripts rather than assuming the
archive is in my scratch directory, and pass the full name of the case rather
than a suffix. These two changes combined should make the tool general enough
to apply to any CESM case (e.g. Kristen's 1-degree cocco runs).

Also cleaned up the way data_reshaping/logs is ignored; may need an additional
commit to create the directory from run_all.py as a result.

Created utils/compare_ts_and_hist.py, which will eventually be a command-line
tool but also provides compare_ts_and_hist() via import utils.
mnlevy1981 (Contributor, Author)

6341c3b reduces the code duplication between the two notebooks comparing time series and history output, but there are a few things I don't like:

  1. It relies on a string comparison to check the output of each (stream, varname, year) combination
  2. There is still a lot of duplicate code parsing the output

One option for the second issue is to add a parse_ts_hist_comparison() to utils, though that feels a little kludgy. I think there's a possibility that addressing the first issue also addresses the second, but I'm not entirely clear on what that would look like.

1. CaseClass has two new public methods: get_timeseries_files() and
   get_history_files(); both return lists of files for a given year and stream.
   For time series, users can also specify a list of varnames to further pare
   down the resulting list of files.
2. gen_dataset() now relies on the two functions mentioned in (1) to determine
   what files to open
3. Massive overhaul to compare_ts_and_hist:
   * Use open_mfdataset and case.get_history_files() to open ds_hist for a
     given stream and year; then loop through variables and check that
     get_timeseries_files() does not return an empty list
   * No longer run da.identical(); for now, we are only concerned with
     verifying that all variables from history files made it into time series
   * This puts "reinstate da.identical()" on the to-do list; even with dask I
     was running into memory issues comparing monthly 3D fields
   * Refactored so there is utils/compare_ts_and_hist.py that will eventually
     be a command-line tool for comparing a given stream and year but is
     currently imported via utils. Also wrote
     utils.utils.timeseries_and_history_comparison() which is just a wrapper
     that accounts for things like missing cice.h1 time series from year 1. I
     think compare_ts_and_hist.py should live with CaseClass when we refactor
     this package, while timeseries_and_history_comparison() is specific to the
     high-res analysis
4. Add ability to get cice.h and cice.h1 streams for both history and time
   series so (3) compares all five streams rather than just looking at a few
   specific variables in pop.h
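The core of step (3) — verifying that every history-file variable has at least one time-series file rather than running da.identical() — can be sketched like this. The `compare_stream_year()` helper and the dict standing in for `CaseClass.get_timeseries_files()` are assumptions for illustration:

```python
def compare_stream_year(hist_vars, ts_index, stream, year):
    """Sketch of the relaxed comparison: instead of comparing values with
    da.identical(), only check that each history-file variable has at least
    one time-series file for the given stream and year.

    ts_index maps (stream, varname, year) -> list of file paths; it is a
    stand-in for CaseClass.get_timeseries_files(), which is assumed here.
    """
    missing = [v for v in hist_vars if not ts_index.get((stream, v, year), [])]
    return {"stream": stream, "year": year, "missing": missing, "ok": not missing}
```

Returning the list of missing variables (rather than a bare pass/fail string) also sidesteps the string-comparison complaint raised earlier in the thread.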
And a few other bad / unnecessary imports
GitHub Actions didn't like the "import utils" call even though it was fine in
the notebooks, I think because utils.utils was trying to import
compare_ts_and_hist.py; now that import is in the
timeseries_and_history_comparison() function, hopefully everything will work
again.
notebooks/utils/utils.py (review comment, resolved)
Moved the import statement out of timeseries_and_history_comparison() and fixed
sys.path in test_utils.py to ensure the import statement still works.
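A sys.path fix of that sort might look like the following sketch; the relative location of the notebooks/ directory is an assumption about the repo layout, not taken from the actual test_utils.py:

```python
import os
import sys

# Make `import utils` resolve from the tests directory as well as from the
# notebooks. The "../notebooks" location relative to the working directory
# is an assumed layout for illustration.
NOTEBOOKS_DIR = os.path.abspath(os.path.join(os.getcwd(), "..", "notebooks"))
if NOTEBOOKS_DIR not in sys.path:
    sys.path.insert(0, NOTEBOOKS_DIR)
```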
compare_ts_and_hist.py needs CaseClass.CaseClass, not just CaseClass
mnlevy1981 merged commit bd2e846 into marbl-ecosys:master on Oct 30, 2020
mnlevy1981 deleted the reshaping_scripts branch on November 20, 2020