Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Split calling of METplus's Pb2nc tool into separate task (from PointStat and EnsembleStat tasks) #668

Merged
merged 72 commits into from
Mar 20, 2023

Conversation

gsketefian
Copy link
Collaborator

@gsketefian gsketefian commented Mar 13, 2023

DESCRIPTION OF CHANGES:

Split calls to METplus's Pb2nc tool out of the tasks that run

  1. deterministic verification (vx) using the PointStat tool,
  2. ensemble vx using the EnsembleStat, and
  3. verification on the ensemble mean and ensemble probabilities using the PointStat tool

into separate tasks. This new task is run for each cycle to convert NDAS prep-bufr files to NetCDF format. Also, include additional cleanup of vx tasks.

Details of introduction of Pb2nc task:

  • Introduce a new j-job (JREGIONAL_RUN_MET_PB2NC_OBS) and an ex-script (exregional_run_met_pb2nc_obs.sh) to run the new Pb2nc task.
  • Introduce a METplus configuration file (Pb2nc_obs.conf) for running the Pb2nc task.
  • Remove the capability to run Pb2nc as well as all METplus variables used by Pb2nc from those METplus configuration files for SFC and UPA that are used by the deterministic PointStat (PointStat_[SFC|UPA].conf), EnsembleStat (EnsembleStat_[SFC|UPA].conf), and ensemble mean and probabilistic PointStat (PointStat_ens[mean|prob]_[SFC|UPA].conf) tasks.
  • Include thePb2nc task in the ROCOTO template XML and add it to the dependencies of the deterministic PointStat and (ensemble) EnsembleStat tasks for SFC and UPA.
  • Add a Pb2nc task section to config_defaults.yaml. Also, in the verification section of this file, add template parameters for NDAS obs files needed in running the Pb2nc task.
  • For both deterministic and ensemble PointStat tasks, remove the variable FHR (from the ROCOTO XML, ex-scripts, and METplus configuration files). Instead, in the ex-scripts, define the alternate variable FHR_LIST that is generated by a call to set_vx_fhr_list.sh and use it in the METplus configuration files. This is necessary to take into account missing obs (i.e. FHR_LIST will not include forecast hours for which the obs file is missing). In a future PR, this will also be done for other vx tasks.
  • For clarity, in the ROCOTO XML vx tasks and vx ex-scripts, change variable names as follows:
    • Change ACCUM to ACCUM_HH to emphasize that it represents a 2-digit hour.
    • Change SLASH_ENSMEM_SUBDIR to SLASH_ENSMEM_SUBDIR_OR_NULL and use it for both deterministic and ensemble forecasts (where for deterministic forecasts, this gets set to an empty string).
  • In the ROCOTO XML, add the variable ACCUM_HH (and set it to a null value) because it is needed in the call to set_vx_fhr_list.sh (although it is not used). This makes it easier to later collapse the various PointStat tasks in the ROCOTO XML into a loop.

Other changes/cleanup not directly related to the new Pb2nc task:

  • Move the LOG_DIR variable from some of the METplus tool-specific configuration files to the shared (among all METplus tools) common.conf config file since LOG_DIR is defined in the same way in all the tool-specific files. In a future PR, LOG_DIR will be removed from any remaining tool-specific configuration files that still contain it.
  • Change vx task output directory names from
    pb2nc_obs, pcp_combine_[obs|fcst], grid_stat, point_stat, ensemble_stat
    etc, to
    Pb2nc_obs, PcpCombine_[obs|fcst], GridStat, PointStat, EnsembleStat.
  • For consistency with other vx tasks, in the EnsembleStat tasks don't include ensgrid_ in name of the "final" METplus configuration file (metplus_final...).
  • Change names of output directories generated by the ensemble mean and probabilistic GridStat and PointStat tasks so that instead of containing the (mis-named) string ensemble_stat_, they contain the string GridStat_ or PointStat_.
  • For clarity, change the strings _mean and _prob in METplus output file and directory names to _ensmean and _ensprob, respectively.
  • Remove unneeded PCP_COMBINE variable settings from (both deterministic and ensemble) GridStat METplus configuration files for APCP01h, REFC, and RETOP.
  • For vx tasks that run on the SFC set of fields, change the strings conus_surface and conus_sfc in vx output directory and file names to SFC.
  • For vx tasks that run on the UPA set of fields, change the string upper_air in vx output directory and file names to UPA.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

The following set of fundamental tests were conducted on Hera with the Intel compiler. All were successful.

MET_verification
MET_verification_only_vx
community_ensemble_2mems
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
MET_ensemble_verification
community_ensemble_2mems_stoch
pregen_grid_orog_sfc_climo

DOCUMENTATION:

Documentation is needed but will be added once the whole set of vx PRs has been merged (see Issue #630).

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

… for Hera and modify run_WE2E_tests.py so that this location is obtained in a platform-independent way.
…ation of staged forecast files; modify code to correctly set VX_FCST_INPUT_BASEDIR in WE2E experiments.
…o that it is exactly consistent with the one used in run_WE2E_tests.py.
…t each task has its own values for NNODES, PPN, MEM, WTIME, MAXTRIES.
…uired by NCO); also rename vx task top-level sections in config_defaults.yaml so that they're all lower case (to match names of ex-scripts).
…ks; Add ensemble member name (if running ensemble vx) to the names of the vx tasks in the ROCOTO xml.
…t convention; fix bugs in the two ex-scripts exregional_run_met_pointstat_ens[mean|prob].sh that call METplus twice.
…output. Details below.

In the ex-scripts for EnsembleStat: (1) generate a new variable FCST_INPUT_FN_TEMPLATE that contains a list of forecast input template files for each of the ensemble members and use this in the METplus configuration files, and (2) change the input base directory (INPUT_BASE) to use VX_FCST_INPUT_BASEDIR, which depending on whether forecasts are being run is set either to the location of the UPP output from the forecasts or to the staged forecast UPP output (this allows staged ensemble staged files to be used to perform ensemble vx).

In config_defaults.yaml, introduce new template variables (FCST_SUBDIR_TEMPLATE, FCST_FN_TEMPLATE, and FCST_FN_METPROC_TEMPLATE) that are/will be used in forming the variable FCST_INPUT_FN_TEMPLATE in the EnsembleStat ex-scripts.
…leStat for point-obs, ensure that METplus is called only once and for the correct variable (instead of for both SFC and UPA).
… This should later be replaced with a python version.
… This should later be replaced with a python version.
…able the ensemble member name (if doing an ensemble forecast). This makes it easier in METviewer to identify curves corresponding to different members.
…_LOCAL_MODULE_FN and VX_LOCAL_MODULE_FN; fix spacing.
Copy link
Collaborator

@danielabdi-noaa danielabdi-noaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure MET_ensemble_verification works with this PR in NCO mode if possible?
I was getting the following error on the run_MET_PcpCombine_fcst_APCP03h_mem002 task

+ exregional_run_met_pcpcombine.sh[230]: print_err_msg_exit 'The list of forecast hours for which to run METplus is empty:
  FHR_LIST = []'

<nodesize>&NCORES_PER_NODE;</nodesize>
<native>&SCHED_NATIVE_CMD;</native>
<jobname>{{tn}}</jobname>
<join><cyclestr>&LOGDIR;/{{tn}}_@Y@m@d@H.log</cyclestr></join>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make nco mode work

Suggested change
<join><cyclestr>&LOGDIR;/{{tn}}_@Y@m@d@H.log</cyclestr></join>
<join>&LOGDIR;/{{tn}}_<cyclestr>@Y@m@d@H</cyclestr>&LOGEXT;</join>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielabdi-noaa, thanks for pointing this out!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielabdi-noaa Thanks for pointing this out. I changed it, but there are other issues in NCO mode that prevent the tasks working in that mode (mostly paths need to be set correctly). I will address them in my next PR.

Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsketefian These changes look good to me! I have left a few comments regarding commented out lines in some of the METplus configuration files and ex- scripts.

parm/metplus/PointStat_UPA.conf Outdated Show resolved Hide resolved
parm/metplus/PointStat_UPA.conf Outdated Show resolved Hide resolved
parm/metplus/PointStat_SFC.conf Outdated Show resolved Hide resolved
parm/metplus/PointStat_SFC.conf Outdated Show resolved Hide resolved
scripts/exregional_run_met_gridstat_vx.sh Show resolved Hide resolved
scripts/exregional_run_met_pointstat_vx.sh Show resolved Hide resolved
@gsketefian
Copy link
Collaborator Author

@MichaelLueken I removed commented out lines except in the two ex-script that I'd like to keep for reference until my next PR. Will remove them in that one.

I retested with vx WE2E tests and everything still works (but not NCO mode, for which I'll put i a fix in my next PR). So if you and @danielabdi-noaa are ok with this, it's ready for automated tests. Thanks.

Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsketefian Thanks for addressing my concerns! Approving now.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Mar 17, 2023
@MichaelLueken
Copy link
Collaborator

@gsketefian All Jenkins tests successfully passed. On Orion, the tests sat in queue for several hours (over fourteen hours), leading to a time-out for the Build and Test check. Moving forward with your PR now.

@MichaelLueken MichaelLueken merged commit f8b4eb4 into ufs-community:develop Mar 20, 2023
@gsketefian
Copy link
Collaborator Author

@MichaelLueken Thanks Mike. I will try to get my next PR opened today.

danrosen25 pushed a commit to esmf-org/ufs-srweather-app that referenced this pull request Mar 28, 2023
…fs-community#697)

## DOCUMENTATION:
This PR removes the `FV3_CPT_v0`, `FV3_GSD_v0`, and `FV3_GSD_SAR` suites from the workflow.  This consists of:
1. Removing these suites from ex-scripts, templates, and the set of valid values for the variable `CCPP_PHYS_SUITE`,
2. Removing the `diag_table_...` and `field_table_...` files for these suites.
3. Removing WE2E tests in the `grids_extrn_mdls_suites_community` category (which are tests to make sure that specific combinations of grids, external models, and suites work well together) that use these suites.
4. Modifying the three WE2E tests in the `wflow_features` category (`get_from_HPSS_ics_HRRR_lbcs_RAP`, `get_from_HPSS_ics_RAP_lbcs_RAP`, and `specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE`) that happen to use the `FV3_GSD_SAR` suite such that they now use the `FV3_HRRR` suite. (There are no such tests that use the `FV3_CPT_v0` and `FV3_GSD_v0` suites.)  Note that we don't remove these tests because their purpose is not to test the suite but to test fetching of files from HPSS (`get_from_HPSS_ics_HRRR_lbcs_RAP` and `get_from_HPSS_ics_RAP_lbcs_RAP`) and to test that the experiment variables `DT_ATMOS`, `LAYOUT_X`, `LAYOUT_Y`, and `BLOCKSIZE` can be correctly specified in the user's experiment configuration file (`specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE`
5. Updating comments in scripts that may refer to one of these three suites.

This PR also makes improvements to the `tests/get_expts_status.sh` script that is used to check the status of a set of experiments in a specified directory.

## DEPENDENCIES:
PR #[224](ufs-community#224) in the `ufs-srweather-app` repo.

## TESTS CONDUCTED:
Ran the following tests on Hera:
```
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
get_from_HPSS_ics_HRRR_lbcs_RAP
get_from_HPSS_ics_RAP_lbcs_RAP
specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
```
All succeeded.  Also, since the modifications to the `FV3.input.yml` file affect the `FV3_RRFS_v1alpha`, `FV3_RRFS_v1beta`, and `FV3_HRRR` suites, the `input.nml` files for these suites generated using the (original) `develop` branch were compared to the ones generated using this branch/PR, and all were found to be identical.

## ISSUE (optional): 
Resolves Issue ufs-community#668.
danrosen25 pushed a commit to esmf-org/ufs-srweather-app that referenced this pull request Mar 28, 2023
…nity#701)

## DESCRIPTION OF CHANGES: 
Several paths in the machine-specific files point to locations in user paths or old locations of static data. This PR updates paths of static data in regional_workflow/ush/machine/ to point to the official, centralized locations on Cheyenne, Hera, and Jet.

## TESTS CONDUCTED: 
Ran the following suite of end-to-end tests on Cheyenne and Jet prior to the latest ufs-weather-model hash update. All passed. This list of tests was chosen because all of these tests are known to succeed on all tested platforms, and this tests a variety of input and boundary condition types.

- grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
- grid_RRFS_CONUS_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
- grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
- grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
- grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta


On Hera, I ran tests with the latest SRW hash, which included the updated weather model. Because of this, many tests could not be generated due to using old, removed CCPP suites (see issue ufs-community#668). To get around this issue, I tested with the fixes from ufs-community#697 incorporated into my branch. With those extra commits, all "get_extrn_ics" and "get_extrn_lbcs" tasks completed successfully, which indicates that all data is in its correct place.

## ISSUE (optional): 
Will resolve a few issues in ufs-community#673, many remain however.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants