
rrfs_ci: Merge in develop #295

Merged
merged 21 commits, Mar 25, 2022

Conversation


@christinaholtNOAA commented Mar 22, 2022

DESCRIPTION OF CHANGES:

Merge in ufs-community develop. Please do not squash this merge.

TESTS CONDUCTED:

None yet. Testing should be performed via the automated on-prem framework.

DEPENDENCIES:

PR #123 in NOAA-GSL/ufs-srweather-app.

hertneky and others added 18 commits February 19, 2022 16:43
… using the CRTM (ufs-community#682)

Co-authored-by: Tracy <tracy.hertneky@noaa.gov>
…`WCOSS_CRAY`; fix cron capability for `tcsh` users on Cheyenne (ufs-community#675)

* Remove unneeded sourcing of source_util_funcs.sh; add "%s" to printf calls since that's the proper calling method; edit comments.

* Generalize machine files.  Details:

* Add a wrapper (source_machine_file.sh) for sourcing the machine file that allows other commands common to all machines to be called.
* Change the scalar variable MODULE_INIT_PATH in the machine files to the array variable ENV_INIT_SCRIPTS_FPS that specifies the list of system scripts that need to be sourced (e.g. to make the "module" command available in a given script).  This is needed because on Cheyenne, at least two system scripts need to be sourced (to enable "module" and "qsub").
* Move the "ulimit" commands at the ends of the machine files into the new variable PRE_TASK_CMDS so that they are not called every time the machine file is sourced.  They will be called only if a given script issues an "eval ${PRE_TASK_CMDS}" (which all the ex-scripts will do).
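The wrapper-plus-deferred-commands pattern described in these bullets can be sketched roughly as follows. This is a minimal illustration using the PR's variable names; the actual `source_machine_file.sh` and machine files contain more than shown here.

```shell
# Minimal sketch of the machine-file pattern (illustrative content only).
machine_file=$(mktemp)
cat > "${machine_file}" <<'EOF'
ENV_INIT_SCRIPTS_FPS=( "/etc/profile" )   # system scripts enabling "module", "qsub", ...
PRE_TASK_CMDS='ulimit -a > /dev/null'     # deferred commands; the real files set ulimits here
EOF

# Wrapper: source the machine file, then run anything common to all machines.
source_machine_file() {
  . "$1"
  # ...commands common to all machines would go here...
}

source_machine_file "${machine_file}"

# Only scripts that need the per-task commands evaluate them, so they no
# longer run on every sourcing of the machine file:
eval "${PRE_TASK_CMDS}"
echo "first init script: ${ENV_INIT_SCRIPTS_FPS[0]}"
rm -f "${machine_file}"
```

The key design point is that sourcing the machine file is now side-effect free: `ulimit` settings run only when a task explicitly issues the `eval`.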

* In the relevant ex-scripts: (1) Change sourcing of machine files to use the wrapper source_machine_file.sh; (2) Use "eval" to evaluate the contents of PRE_TASK_CMDS.

* In the WE2E script, change sourcing of the machine file to use the wrapper source_machine_file.sh.

* Add new variable valid_vals_BOOLEAN to constants.sh so that this file can be sourced and the valid values for a boolean can be made available to any other script.

* Bug fix.

* Remove file that was accidentally added in previous commit.

* Change the way crontab is called so that it also works on Cheyenne (for tcsh users).  Details:

* Introduce new function get_crontab_contents() that takes as input whether or not the calling script is itself being called from a cron job and returns (1) the path to the appropriate crontab command and (2) the contents of the user's cron table.
  * Such a function is needed because on Cheyenne, the location of the crontab command is different depending on whether or not the script that's calling crontab is itself called from a cron job (because on Cheyenne, "crontab" is containerized, and that complicates things).
* Use get_crontab_contents() in generate_FV3LAM_wflow.sh and launch_FV3LAM_wflow.sh (instead of simply calling "crontab" because the latter approach doesn't work on Cheyenne, at least not with users whose login shell is tcsh).
* Add "called_from_cron" as an optional argument to launch_FV3LAM_wflow.sh [so that it can then be passed on to get_crontab_contents()].  This argument is only used in the cron job that relaunches the workflow (which is created only if USE_CRON_TO_RELAUNCH is set to "TRUE").
  * Having an optional argument like this seems to be the best way to tell launch_FV3LAM_wflow.sh whether or not it is running from a cron job.
  * launch_FV3LAM_wflow.sh can still be called from the command line without any arguments (since the default value of "called_from_cron" is "FALSE").
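A minimal sketch of the `get_crontab_contents()` interface described above, assuming an illustrative Cheyenne-specific `crontab` path; the real function's logic and paths may differ.

```shell
# Hypothetical sketch; the real function handles Cheyenne's containerized crontab.
get_crontab_contents() {
  local called_from_cron="${1:-FALSE}"
  local crontab_cmd crontab_contents

  if [ "${MACHINE:-}" = "CHEYENNE" ] && [ "${called_from_cron}" = "TRUE" ]; then
    crontab_cmd="/illustrative/path/to/crontab"   # location differs inside a cron job
  else
    crontab_cmd="crontab"
  fi

  # An empty cron table can make "crontab -l" exit nonzero; treat that as empty.
  crontab_contents=$( "${crontab_cmd}" -l 2>/dev/null || true )

  # Return both values on stdout: first line is the command, the rest the table.
  printf "%s\n%s\n" "${crontab_cmd}" "${crontab_contents}"
}

# Usage: capture the command (first line); remaining lines are the cron table.
crontab_cmd=$( get_crontab_contents "FALSE" | head -n 1 )
echo "crontab command: ${crontab_cmd}"
```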

* Generalize the way commands are initialized so that any number of system scripts can be sourced in a given script (currently, only "module" is initialized).  Details:

* Introduce the new function init_env() that initializes the environment of a script by sourcing necessary system scripts.  The full paths to these system scripts are specified in the array ENV_INIT_SCRIPTS_FPS in the machine files.
  * This function is needed because (1) this sourcing needs to be done in a couple of different scripts in the SRW App and (2) on some machines (e.g. Cheyenne), more than one system script may need to be sourced.
* Use the new init_env() function in launch_FV3LAM_wflow.sh and load_modules_run_task.sh.
  * In load_modules_run_task.sh, init_env() replaces sourcing of only the system script that defines the "module" command.  That is because on Cheyenne, in addition to the "module" command, the "qsub" command needs to be defined/initialized (by sourcing a second system script named pbs.sh).

* Replace calls to "crontab -l" by echoing of already obtained contents.  Fix comments and informational messages.

* For Cheyenne, don't need to source two separate system scripts.  Just sourcing "/etc/profile" is enough to make both the "module" and "qsub" commands (and probably all other system-supported commands) available in non-login scripts.

* Make script exit with an error message if rocoto commands fail.

* Fix the system script that needs to be sourced on Hera to get "module" (and other commands) to work.

* In init_env.sh, declare "local" variables and change the index of the for-loop so it's different from the variable i used (and unset) by the system script on Hera.

* Fix the system script that needs to be sourced on Orion to enable the "module" and other commands in a non-login bash shell.

* Fix the system script on Jet that needs to be sourced to enable the "module" and other commands in a non-login bash shell.

* Update comments.

* Bug fix:  Make sure the variable __crontab_cmd__ is defined for WCOSS_DELL_P3.

* Try changing the system script to source on WCOSS_DELL_P3 to "/etc/profile" (since it works on the other machines to enable the "module" and other commands).  This needs to be tested by someone who has access to WCOSS_DELL_P3.

* Changes to try to make the machine file work for WCOSS_DELL_P3.  Not yet tested.

* Fix modulepath issue on wcoss

* Fix issues on wcoss cray

* Fix crontab issue on wcoss cray

* Remove support for WCOSS_CRAY.

* Place double quotes around ${RUN_CMD_...} in if-statements that check whether the RUN_CMD_... variable is empty, i.e. -z "${RUN_CMD_...}".  This is needed because on Cheyenne, not having the double quotes generates an error when RUN_CMD_... consists of a command that contains spaces (e.g. "mpirun -np ...").
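A small illustration of why the double quotes matter in the `-z` test; `RUN_CMD_FCST` is an illustrative stand-in for the `RUN_CMD_...` variables.

```shell
RUN_CMD_FCST="mpirun -np 4"

# Correct: quoted, so the test sees exactly one argument even with spaces.
if [ -z "${RUN_CMD_FCST}" ]; then
  echo "RUN_CMD_FCST is empty"
else
  echo "RUN_CMD_FCST is set"
fi

# Unquoted, the value word-splits into [ -z mpirun -np 4 ], which is an
# error for the test command rather than a meaningful emptiness check.
[ -z ${RUN_CMD_FCST} ] 2>/dev/null || echo "unquoted test failed as expected"
```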

Co-authored-by: chan-hoo <chan-hoo.jeon@noaa.gov>
Co-authored-by: Tracy <tracy.hertneky@noaa.gov>
…unity#690)

## DESCRIPTION OF CHANGES: 
When a user's `cron` table is empty, the `get_crontab_contents` function does not set the return variable properly due to a bug in the way `printf -v ...` is called towards the end of that function.  This PR fixes that bug as well as some others found after testing with an initially empty user cron table.
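The `printf -v` pitfall can be illustrated with a hypothetical helper (the names below are not from the PR): quoting the value keeps the assignment well-defined even when the cron table contents are empty.

```shell
# Hypothetical helper, not the PR's code: "__outvar" names the caller's
# variable, and printf -v assigns into it.
return_contents() {
  local __outvar="$1"
  local contents="$2"
  # Quoting "${contents}" is the crucial part: an empty or multi-word cron
  # table still produces exactly one (possibly empty) assigned value.
  printf -v "${__outvar}" "%s" "${contents}"
}

return_contents result ""            # simulate an empty cron table
echo "result is set to: <${result}>"
```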

## TESTS CONDUCTED: 
On Hera, Jet, and Cheyenne, ran the WE2E test `deactivate_tasks`, starting with both an empty `cron` table and a non-empty one.  All tests passed with the `cron` table being modified properly at the start and end of the test.

## CONTRIBUTORS (optional): 
@chan-hoo pointed out this error on Hera.
* Create singularity.sh

* added machine/sing script for review

we need to merge the change and delete this file later.

* added singularity machine value

* fixed mpirun command name and options: singularity

* cleaned up singularity script

* Update valid_param_vals.sh

* Update singularity.sh

* Update singularity.sh

* Update singularity.sh

Removed empty string MET and test variables.

Co-authored-by: JONG KIM <jong.kim@noaa.gov>
## DESCRIPTION OF CHANGES: 
Cleaning up bugs in the machine files.  The first bug prompted this PR, and the rest were found subsequently.  The bugs (and their fixes) are as follows:

1) A space is missing after the `print_info_msg` and `print_err_msg_exit` function calls in the `file_location` functions.  Inserting a space gets past this bug, but subsequent issues were found, as described below.

**For machine files that call the `print_info_msg` function in `file_location` (`cheyenne.sh`, `hera.sh`, `jet.sh`, and `orion.sh`):**
Fixing this bug leads to other failures because when the "*" stanza is encountered in the `file_location` function, the `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` variable gets set to the message that `file_location` returns.  Since that message contains spaces, it leads to other failures in downstream scripts (the ex-scripts).  Simply removing the printing of the message (thus causing `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` to be set to a null string) fixes the failures, so this was the fix implemented.  If desired, a message for an empty value of `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` can be placed in another script (where those variables are used).

**For machine files that use `print_err_msg_exit` in `file_location` (`stampede.sh` and `wcoss_dell_p3.sh`):**
These should not exit if the file location is not available since the experiment can still complete successfully.  So just removing the `print_err_msg_exit` call should work (and make the behavior of these machine files consistent with the set above).
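An assumed shape of the fixed `file_location` pattern for both sets of machine files, with the `*` stanza returning an empty string instead of printing a message or exiting (paths and model names here are illustrative):

```shell
file_location() {
  local extrn_mdl_name="$1"
  local location=""
  case "${extrn_mdl_name}" in
    "FV3GFS") location="/illustrative/fv3gfs/data" ;;
    "HRRR")   location="/illustrative/hrrr/data"   ;;
    *)        location="" ;;   # previously printed a message (or exited) here
  esac
  echo "${location}"
}

# An unknown model now yields a null string, not a message containing spaces:
EXTRN_MDL_SYSBASEDIR_ICS=$( file_location "RAP" )
echo "ICS base dir: <${EXTRN_MDL_SYSBASEDIR_ICS}>"
```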

2) In all the machine files, the variable `FV3GFS_FILE_FMT_ICS` should be changed to `FV3GFS_FILE_FMT_LBCS` in the definition of `EXTRN_MDL_SYSBASEDIR_LBCS`.  This was fixed in all the files.

3) In `stampede.sh`, a variable named `SYSBASEDIR_ICS` is defined.  This is a typo.  Modify to `EXTRN_MDL_SYSBASEDIR_ICS`.

## TESTS CONDUCTED: 
Ran the WE2E test `grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GSD_SAR` on:
* Hera -- successful
* Jet -- successful except for UPP tasks
* Cheyenne -- successful except for UPP tasks

The UPP task failures are new and being experienced by other PRs as well (e.g. ufs-community#689).  The original issue with machine files seems resolved.

## CONTRIBUTORS (optional): 
@JeffBeck-NOAA encountered and reported the original error.
* Tweaks for running with containers on azure

* added config.sh for GST on azure

* added AWS to load_modules_run_task.sh

* working on bare metal now

* Changing to azure, aws, and singularity

* updates for singularity

* tweaks for running using singularity exec

* tweaks for running using singularity exec

* Converting to a single noaacloud type

* slight changes to config.sh for aws

* update machine file

* added missing slash to namelist

* changes for intel

* more cleanup

* cleaned up commented lines
* Add default namelist with SPP entries.

* Changes necessary to run with SPP

* Typo fix in generate script.

* Changes to implement SPP.

* Add comment regarding use of SPP.

* Pass N_VAR_SPP to the var_defns.sh file.

* Add spp_wts_pbl to the FV3_GSD_SAR SDF diag_table file

* Remove the contents of the &nam_spperts stanza and n_var_spp from the namelist when not using SPP.

* Remove SPP namelist entries from the template input.nml file.  These values are now handled in the generate script if using SPP.

* Add MYNN SFC pattern variable to the diag_table file for the FV3_GSD_SAR SDF

* Add 'sfc' perturbation option to SPP

* Add iseed array to namelist generation

* Changes to add "rad" and "gwd" SPP perturbations.

* Add SPP and ad-hoc stochastic physics to SDFs.

* Add LSM SPP functionality to the SRW App.

* Add random number generation for LSM SPP iseeds in ensemble mode.

* Fix undeclared variable for LSM SPP.

* Add if statement for LSM SPP namelist entries and set fhcyc to 999 if LSM SPP is turned on.

* Modify how namelist settings are applied for LSM SPP.

* Fix implementation of fhcyc changes.

* Typo fix.

* Add do_gsl* namelist entries to YAML file for FV3_HRRR SDF.

* Remove diss_est from the diag table files since it's unavailable for now.

* Change LSM SPP perturbation seed to be the same as all other SPP.

* Changes to FV3_HRRR field_table

* Change to the FV3_HRRR field_table file

* Update name for SPP block in input.nml

* Change SPP_LSM_* to LSM_SPP_*

* Remove space.

* Update descriptions of LSM perturbations.

* Shut off PET file generation.

* Modifications for land and SPP perturbations (templates, namelists, default values, etc.)

* Remove diss_est from diag_table files and remove the FV3_GSD_SAR and FV3_GSD_v0 diag_table files.

* Fix in setup.sh for LSM SPP

* Update Thompson MP SPP settings.

* Add back FV3_GSD_SAR and FV3_GSD_v0 SDFs.  Will be removed in future, separate PR.

* Requested modifications to the input.nml template and in-line documentation changes.

* Fix check for LSM SPP namelist settings

* Fix to LSM SPP namelist check.

* Changes request based on PR review.

* Changes requested in PR review.

* Variable descriptions.

* Fix comment formatting.

* Fix MET/METplus/obs paths in machine files to allow for user-defined settings in config.sh

* Only add/modify stochastic physics namelist entries when running with SPP, LSM SPP, SPPT, SHUM, or SKEB.
…w generation layer (ufs-community#674)

* Add python utility functions mirroring bash utils.

* Add str_to_list utility, export variables to environment
Import env variables as numeric values to ease calculations.
Address some of Christina's points.

* Use dedent for multiline string formatting.

* Bug fix with export_vars.

* Bug fix with set_env_var

* Add !join  tag to yaml.

* Bug fix with date conversion.

* Beautify print_input_args.

* Import stuff to __init__.py

* Write dates in short form when HHMM = 0

* Clarify type conversions.

* Add option to parse old style shell user config

* Bug fix with handling same line comments

* Minor bug fixes

* Source complex config shell scripts instead of trying to parse.

* Modify config shell script loading routine.

* Update comments.

* Addressing some of Christina's comments.

* Removing set_bash/file_param.

* Minor changes in unittests.

* Remove unused config script parser.

* Avoid passing os.environ as default.

* Fix name of macos utils.

* Add typical regional workflow resolutions to the test.

* Add a copy of Externals.cfg to test.

* More changes to address @gsketefian's suggestions.

Co-authored-by: Daniel Shawul <dshawul@yahoo.com>
* clean up wcoss_cray and unnecessary modules on wcoss

* Fix a new line issue on wcoss_dell_p3

* Remove docs from regional_workflow
…fs-community#697)

## DESCRIPTION OF CHANGES:
This PR removes the `FV3_CPT_v0`, `FV3_GSD_v0`, and `FV3_GSD_SAR` suites from the workflow.  This consists of:
1. Removing these suites from ex-scripts, templates, and the set of valid values for the variable `CCPP_PHYS_SUITE`,
2. Removing the `diag_table_...` and `field_table_...` files for these suites.
3. Removing WE2E tests in the `grids_extrn_mdls_suites_community` category (which are tests to make sure that specific combinations of grids, external models, and suites work well together) that use these suites.
4. Modifying the three WE2E tests in the `wflow_features` category (`get_from_HPSS_ics_HRRR_lbcs_RAP`, `get_from_HPSS_ics_RAP_lbcs_RAP`, and `specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE`) that happen to use the `FV3_GSD_SAR` suite such that they now use the `FV3_HRRR` suite. (There are no such tests that use the `FV3_CPT_v0` and `FV3_GSD_v0` suites.)  Note that we don't remove these tests because their purpose is not to test the suite but to test fetching of files from HPSS (`get_from_HPSS_ics_HRRR_lbcs_RAP` and `get_from_HPSS_ics_RAP_lbcs_RAP`) and to test that the experiment variables `DT_ATMOS`, `LAYOUT_X`, `LAYOUT_Y`, and `BLOCKSIZE` can be correctly specified in the user's experiment configuration file (`specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE`).
5. Updating comments in scripts that may refer to one of these three suites.

This PR also makes improvements to the `tests/get_expts_status.sh` script that is used to check the status of a set of experiments in a specified directory.

## DEPENDENCIES:
PR #[224](ufs-community/ufs-srweather-app#224) in the `ufs-srweather-app` repo.

## TESTS CONDUCTED:
Ran the following tests on Hera:
```
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
get_from_HPSS_ics_HRRR_lbcs_RAP
get_from_HPSS_ics_RAP_lbcs_RAP
specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
```
All succeeded.  Also, since the modifications to the `FV3.input.yml` file affect the `FV3_RRFS_v1alpha`, `FV3_RRFS_v1beta`, and `FV3_HRRR` suites, the `input.nml` files for these suites generated using the (original) `develop` branch were compared to the ones generated using this branch/PR, and all were found to be identical.

## ISSUE (optional): 
Resolves Issue ufs-community#668.
…nity#701)

## DESCRIPTION OF CHANGES: 
Several paths in the machine-specific files point to locations in user paths or old locations of static data. This PR updates paths of static data in regional_workflow/ush/machine/ to point to the official, centralized locations on Cheyenne, Hera, and Jet.

## TESTS CONDUCTED: 
Ran the following suite of end-to-end tests on Cheyenne and Jet prior to the latest ufs-weather-model hash update. All passed. This list of tests was chosen because all of these tests are known to succeed on all tested platforms, and this tests a variety of input and boundary condition types.

- grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
- grid_RRFS_CONUS_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
- grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
- grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
- grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta


On Hera, I ran tests with the latest SRW hash, which included the updated weather model. Because of this, many tests could not be generated due to using old, removed CCPP suites (see issue ufs-community#668). To get around this issue, I tested with the fixes from ufs-community#697 incorporated into my branch. With those extra commits, all "get_extrn_ics" and "get_extrn_lbcs" tasks completed successfully, which indicates that all data is in its correct place.

## ISSUE (optional): 
Will resolve a few issues in ufs-community#673, many remain however.
…re/ens_design_RRFS) (ufs-community#695)

* Create feature/ens_design_RRFS branch and move original MET and METplus files to hold dirs.

* Updated METplus conf files to remove dependency on project-based MET configs.

* Updates to METplus conf files and run scripts for transitioning away from user-provided MET config files.

* Removed MET configuration files; changed envar ACCUM back to acc in several GridStat conf files.

* Removed old METplus conf files and removed OBS PROB info from Grid-Stat METplus conf files.
@venitahagerty added the ci-hera-intel-WE and ci-jet-intel-WE labels and removed the ci-hera-intel-WE and ci-jet-intel-WE labels Mar 22, 2022
@venitahagerty

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch2/BMC/zrtrr/rrfs_ci/autoci/pr/886242821/20220322183516/ufs-srweather-app
Build was Successful
If test failed, please make changes and add the following label back:
ci-hera-intel-WE

@venitahagerty

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/886242821/20220322183519/ufs-srweather-app
Build was Successful
If test failed, please make changes and add the following label back:
ci-jet-intel-WE

@venitahagerty added the ci-hera-intel-WE and ci-jet-intel-WE labels and removed the ci-jet-intel-WE and ci-hera-intel-WE labels Mar 22, 2022
@venitahagerty

venitahagerty commented Mar 22, 2022

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch2/BMC/zrtrr/rrfs_ci/autoci/pr/886242821/20220322205021/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 12 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
2022-03-22 21:18:06 +0000 :: hfe10 :: Task run_fcst, jobid=29802278, in state DEAD (FAILED), ran for 24.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2022-03-22 21:20:16 +0000 :: hfe10 :: Task run_fcst, jobid=29802302, in state DEAD (FAILED), ran for 31.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2022-03-22 21:18:10 +0000 :: hfe03 :: Task run_fcst, jobid=29802285, in state DEAD (FAILED), ran for 30.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2022-03-22 21:20:08 +0000 :: hfe08 :: Task run_fcst, jobid=29802304, in state DEAD (FAILED), ran for 31.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2022-03-22 21:18:06 +0000 :: hfe12 :: Task run_fcst, jobid=29802277, in state DEAD (FAILED), ran for 29.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
2022-03-22 21:22:06 +0000 :: hfe11 :: Task run_fcst, jobid=29802327, in state DEAD (FAILED), ran for 28.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2022-03-22 21:20:06 +0000 :: hfe08 :: Task run_fcst, jobid=29802301, in state DEAD (FAILED), ran for 34.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
2022-03-22 21:22:08 +0000 :: hfe12 :: Task run_fcst, jobid=29802326, in state DEAD (FAILED), ran for 26.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
2022-03-22 21:24:11 +0000 :: hfe01 :: Task run_fcst, jobid=29802345, in state DEAD (FAILED), ran for 35.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GFS_v15p2
2022-03-22 21:18:08 +0000 :: hfe12 :: Task run_fcst, jobid=29802284, in state DEAD (FAILED), ran for 23.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
2022-03-22 21:24:08 +0000 :: hfe02 :: Task run_fcst, jobid=29802346, in state DEAD (FAILED), ran for 27.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2022-03-22 21:18:06 +0000 :: hfe04 :: Task run_fcst, jobid=29802292, in state DEAD (FAILED), ran for 53.0 seconds, exit status=256, try=1 (of 1)
All experiments completed

@venitahagerty

venitahagerty commented Mar 22, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/886242821/20220322205014/ufs-srweather-app
Build was Successful
Rocoto jobs started
Experiment failed: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2022-03-22 21:28:10 +0000 :: fe6 :: Task get_extrn_ics, jobid=1196593, in state DEAD (FAILED), ran for 8.0 seconds, exit status=256, try=1 (of 1)
Long term tracking will be done on 12 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2022-03-22 21:36:12 +0000 :: fe5 :: Task run_fcst, jobid=1196745, in state DEAD (FAILED), ran for 29.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GFS_v15p2
2022-03-22 21:38:07 +0000 :: fe7 :: Task run_fcst, jobid=1196827, in state DEAD (FAILED), ran for 34.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2022-03-22 21:38:10 +0000 :: fe4 :: Task run_fcst, jobid=1196828, in state DEAD (FAILED), ran for 35.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2022-03-22 21:40:10 +0000 :: fe2 :: Task run_fcst, jobid=1196866, in state DEAD (FAILED), ran for 32.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
2022-03-22 21:36:09 +0000 :: fe4 :: Task run_fcst, jobid=1196756, in state DEAD (FAILED), ran for 26.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2022-03-22 21:36:06 +0000 :: fe5 :: Task run_fcst, jobid=1196746, in state DEAD (FAILED), ran for 29.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2022-03-22 21:36:16 +0000 :: fe1 :: Task run_fcst, jobid=1196753, in state DEAD (FAILED), ran for 30.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
2022-03-22 21:44:14 +0000 :: fe1 :: Task run_fcst, jobid=1196941, in state DEAD (FAILED), ran for 142.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
2022-03-22 21:44:11 +0000 :: fe5 :: Task run_fcst, jobid=1196906, in state DEAD (FAILED), ran for 143.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
2022-03-22 21:48:13 +0000 :: fe1 :: Task run_fcst, jobid=1197139, in state DEAD (FAILED), ran for 32.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
2022-03-22 21:44:08 +0000 :: fe1 :: Task run_fcst, jobid=1196934, in state DEAD (FAILED), ran for 110.0 seconds, exit status=256, try=1 (of 1)
All experiments completed

@venitahagerty added the ci-jet-intel-WE label and removed the ci-jet-intel-WE label Mar 24, 2022
@venitahagerty

venitahagerty commented Mar 24, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/886242821/20220324170510/ufs-srweather-app
Build was Successful
Rocoto jobs started
Experiment failed: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2022-03-24 17:44:12 +0000 :: fe3 :: Task get_extrn_ics, jobid=1277278, in state DEAD (FAILED), ran for 10.0 seconds, exit status=256, try=1 (of 1)
Long term tracking will be done on 12 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
2022-03-24 17:54:10 +0000 :: fe3 :: Task run_fcst, jobid=1277492, in state DEAD (FAILED), ran for 29.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2022-03-24 17:54:10 +0000 :: fe1 :: Task run_fcst, jobid=1277512, in state DEAD (FAILED), ran for 33.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2022-03-24 17:54:09 +0000 :: fe5 :: Task run_fcst, jobid=1277491, in state DEAD (FAILED), ran for 30.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GFS_v15p2
2022-03-24 17:52:17 +0000 :: fe7 :: Task run_fcst, jobid=1277483, in state DEAD (FAILED), ran for 26.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2022-03-24 17:52:10 +0000 :: fe5 :: Task run_fcst, jobid=1277459, in state DEAD (FAILED), ran for 28.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2022-03-24 17:52:13 +0000 :: fe1 :: Task run_fcst, jobid=1277473, in state DEAD (FAILED), ran for 27.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2022-03-24 17:56:12 +0000 :: fe2 :: Task run_fcst, jobid=1277521, in state DEAD (FAILED), ran for 42.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
2022-03-24 18:26:06 +0000 :: fe3 :: Task run_fcst, jobid=1277537, in state DEAD (FAILED), ran for 33.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
2022-03-24 18:26:06 +0000 :: fe8 :: Task run_fcst, jobid=1277563, in state DEAD (FAILED), ran for 30.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
2022-03-24 18:26:06 +0000 :: fe2 :: Task run_fcst, jobid=1277526, in state DEAD (FAILED), ran for 33.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
2022-03-24 18:26:14 +0000 :: fe2 :: Task run_fcst, jobid=1277536, in state DEAD (FAILED), ran for 33.0 seconds, exit status=256, try=1 (of 1)
All experiments completed

@venitahagerty added the ci-jet-intel-WE label and removed the ci-jet-intel-WE label Mar 24, 2022
@venitahagerty

venitahagerty commented Mar 24, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/886242821/20220324213511/ufs-srweather-app
Build was Successful
Rocoto jobs started
Experiment failed: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2022-03-24 22:14:09 +0000 :: fe8 :: Task get_extrn_ics, jobid=1286188, in state DEAD (FAILED), ran for 7.0 seconds, exit status=256, try=1 (of 1)
Long term tracking will be done on 12 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
All experiments completed

@christinaholtNOAA added the ci-hera-intel-WE label Mar 25, 2022
@venitahagerty removed the ci-hera-intel-WE label Mar 25, 2022
@venitahagerty

venitahagerty commented Mar 25, 2022

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch2/BMC/zrtrr/rrfs_ci/autoci/pr/886242821/20220325152010/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 12 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on hera: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on hera: nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
All experiments completed

@venitahagerty

Passes all the tests but one, and that seems to be a data path problem

@christinaholtNOAA merged commit 6fd3304 into NOAA-GSL:rrfs_ci Mar 25, 2022