
Add MACOS and Generic Linux options for regional_workflow #402


mkavulich
Collaborator

DESCRIPTION OF CHANGES:

This PR adds generic platforms to the regional_workflow, not specific to any one machine, that should allow users to run the ufs-srweather-app on any UNIX-based machine without a workflow manager, so long as the NCEPLIBS and other prerequisites have been properly installed. This can be done using the scripts described in regional_workflow/ush/wrappers/README.md; additional documentation is currently being written.
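Running "without a workflow manager" essentially means running each task script in dependency order and stopping if one fails. The sketch below illustrates that idea under stated assumptions: `run_tasks` is a hypothetical helper, each task is assumed to be an executable script in the current directory, and the task names in the usage comment are illustrative only; ush/wrappers/README.md is the authority on the actual scripts and their order.

```shell
#!/usr/bin/env bash
# Hypothetical driver: run workflow tasks one after another, with no
# workflow manager or job scheduler, stopping at the first failure.
run_tasks() {
  local task
  for task in "$@"; do
    echo "Running ${task} ..."
    if ! "./${task}"; then
      echo "Task ${task} failed; stopping." >&2
      return 1
    fi
  done
  echo "All tasks completed."
}

# Example usage from the wrappers directory (script names are illustrative):
#   run_tasks run_make_grid.sh run_make_orog.sh run_make_sfc_climo.sh \
#             run_get_ics.sh run_get_lbcs.sh run_make_ics.sh \
#             run_make_lbcs.sh run_fcst.sh run_post.sh
```

Stopping at the first failure mirrors what a workflow manager would do with task dependencies: a downstream task such as the forecast should not start if grid or initial-condition generation failed.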

Users can utilize these options by setting the MACHINE variable in config.sh to either "LINUX" or "MACOS". The LINUX option should allow most users to run the ufs-srweather-app on a generic Linux machine. The MACOS option is for MacOS/Darwin operating systems; it needs to be kept separate because the version of bash that ships with MacOS is very old and lacks some functionality, and several GNU/Linux utilities have different behavior and/or names on MacOS. This capability is not yet well tested, as there seem to be issues getting some of the UFS_UTILS programs to run on MacOS (segmentation faults).
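The snippet below shows how the two generic MACHINE values correspond to the operating system, and it checks the bash version, because the bash that ships with MacOS (3.2) lacks bash 4 features such as associative arrays. Deriving MACHINE from `uname -s` is a hypothetical convenience for illustration; in this PR, users set MACHINE in config.sh by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a MACHINE value for config.sh from the OS.
# Only the values "LINUX" and "MACOS" come from the PR.
case "$(uname -s)" in
  Darwin) MACHINE="MACOS" ;;
  Linux)  MACHINE="LINUX" ;;
  *)      MACHINE=""; echo "No generic MACHINE option for $(uname -s)" >&2 ;;
esac
echo "Suggested config.sh setting: MACHINE=\"${MACHINE}\""

# MacOS ships bash 3.2; features such as associative arrays ("declare -A")
# require bash 4.0 or newer.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "Warning: bash ${BASH_VERSION} is older than 4.0" >&2
fi
```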

TESTS CONDUCTED:

The "Generic Linux" test was run on Cheyenne as a fresh install, including a stand-alone install of NCEPLIBS, with no reference to staged or pre-built input files. It was run without Rocoto and without submitting jobs directly via PBS; instead, the entire workflow was run interactively on a compute node (using the `qinteractive` command), which emulates running the workflow on a machine with no job scheduler.

On MacOS (Catalina, 10.15.7), I was able to generate the workflow and run most of the tasks successfully. The UFS_UTILS issues mentioned above prevented a full end-to-end test from succeeding, but tests using input files copied from other platforms were successful, so once the UFS_UTILS issues are solved this should, in principle, be fully functional.

All of these tests were successful on an older version of regional_workflow (6833b25). However, when I rebased these changes onto the current head, I encountered failures, which are due to namelist changes made in the interim. Investigation of these failures is ongoing, but I do not believe it should hold up this PR.

ISSUE (optional):

Resolves #369

…iables $SED and $READLINK that can be defined appropriately for MacOS and non-MacOS platforms, cutting down on the number of if-blocks needed. This commit takes care of files that have problematic `sed` commands; the next commit will replace the previously-added if-blocks for `readlink` as well.
…end of the generation script for cases where no log file is created.
…orm where there is no default value, we allow the user to specify the locations of ICs and LBCs to avoid an error, rather than forcing us to set a blank or dummy default value to get to the logic of user-defined directories.
… pre-configured platforms), add MACOS as a valid machine
…ut probably others too), add a MACOS stanza to make_grid script
…emove that line from make_grid script, and add appropriate MACOS stanza for make_orog script
…oes not fail on other platforms so long as USE_USER_STAGED_EXTRN_FILES = "TRUE". This error only seems to occur on certain flavors of bash.
…rily restrictive MACHINE check from exregional_make_orog.sh
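The commit message about $SED and $READLINK describes choosing the right utility once per platform so that other scripts need no if-blocks of their own. The following is a minimal sketch of that idea, assuming the GNU tools are installed on MacOS under Homebrew's g-prefixed names (`gsed` from gnu-sed, `greadlink` from coreutils). It is an illustration, not the contents of the repository's define_macos_utilities script.

```shell
#!/usr/bin/env bash
# Sketch: pick GNU-compatible sed/readlink once, so later scripts can call
# "$SED" and "$READLINK" without their own per-platform if-blocks.
# Assumes Homebrew-style names (gsed, greadlink) for the GNU tools on MacOS.
if [[ $(uname -s) == Darwin ]]; then
  export SED=gsed
  export READLINK=greadlink
else
  export SED=sed
  export READLINK=readlink
fi

# Fail early if either utility is missing from PATH.
for util in "$SED" "$READLINK"; do
  if ! command -v "$util" >/dev/null 2>&1; then
    echo "ERROR: required utility '$util' not found" >&2
    exit 1
  fi
done

# Usage: a GNU-style in-place edit ("-i" with no backup suffix). BSD sed on
# MacOS interprets this form differently, since it expects a suffix after -i.
tmpfile=$(mktemp)
echo 'MACHINE="HERA"' > "$tmpfile"
"$SED" -i 's/HERA/LINUX/' "$tmpfile"
result=$(cat "$tmpfile")
echo "$result"
rm -f "$tmpfile"
```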
@@ -8,7 +8,11 @@ function source_util_funcs() {
#
#-----------------------------------------------------------------------
#
local scrfunc_fp=$( readlink -f "${BASH_SOURCE[0]}" )
if [[ $(uname -s) == Darwin ]]; then
@gsketefian (Collaborator), Jan 28, 2021


Try using a function maybe? Not sure if it will work, since BASH_SOURCE[0] may then change within the function...
Oh wait, you can first get BASH_SOURCE[0] here, then pass it to a generic readlink function. I think that'll work.
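A minimal sketch of the reviewer's suggestion, assuming $READLINK holds a GNU-compatible readlink (readlink on Linux, greadlink on MacOS); `resolve_path` is a hypothetical name. The caller captures `${BASH_SOURCE[0]}` itself because, inside a function, `BASH_SOURCE[0]` names the file in which that function was defined. For a helper kept in a shared utility file, that would be the utility file rather than the calling script.

```shell
#!/usr/bin/env bash
# Sketch: the caller resolves its own location by passing BASH_SOURCE[0]
# to a generic helper (hypothetical name: resolve_path).
if [[ $(uname -s) == Darwin ]]; then
  READLINK=greadlink   # GNU readlink from coreutils (assumed installed)
else
  READLINK=readlink
fi

resolve_path() {
  # $1: path to canonicalize; prints the absolute path with symlinks resolved
  "$READLINK" -f "$1"
}

# Capture BASH_SOURCE[0] here, in the calling script, not inside the helper.
scrfunc_fp=$(resolve_path "${BASH_SOURCE[0]}")
scrfunc_dir=$(dirname "$scrfunc_fp")
echo "Script path: ${scrfunc_fp}"
echo "Script directory: ${scrfunc_dir}"
```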

@mkavulich
Collaborator Author

@gsketefian, @JeffBeck-NOAA, and @christinaholtNOAA, I believe I have addressed all comments. To clarify one point that was raised in several places but that I only answered once: because of the order in which the various scripts and their functions are sourced, it was not possible to remove every if-statement dealing with the utility-name differences between MacOS and Linux. However, I was able to consolidate the vast majority of them using the define_macos_utilities function (ush/bash_utils/define_macos_utilities.sh).

I'm running a final round of tests after this last set of changes, but after that I believe this PR will be ready for approval and merge, barring any final comments or concerns.

@gsketefian
Collaborator

@mkavulich Once this is working on MACOS, do you plan on retesting on Hera and/or Cheyenne to make sure nothing broke on those platforms?

@mkavulich
Collaborator Author

@gsketefian Sorry about the problem in ush/launch_FV3LAM_wflow.sh; I just pushed the fix. The script has not been tested on MacOS because our current method of running the workflow on MacOS is only via the stand-alone scripts in ush/wrappers, but I still thought it best to update this script for future-proofing.

I did test it on both Cheyenne and Hera to make sure no existing capabilities had been broken, and it did indeed work, which I initially found confusing given this obvious mistake in my changes. It seems to still work because $exptdir has already been set to ".", and the entire launch script is run locally, so it never needs the full exptdir path.
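Why a relative "." is enough when the launch script runs entirely from the experiment directory can be illustrated with a small, hypothetical snippet. The variable name mirrors $exptdir from the comment, but the code is not taken from ush/launch_FV3LAM_wflow.sh and is not the fix that was pushed.

```shell
#!/usr/bin/env bash
# Illustration only: a relative exptdir of "." refers to whatever the current
# working directory is, so it points at the experiment directory only while
# the script runs from there. Resolving it to an absolute path once keeps it
# valid after any later "cd".
exptdir="."
exptdir_abs=$(cd "$exptdir" && pwd)

start_dir=$(pwd)
cd /tmp || exit 1
echo "relative exptdir now resolves to: $(cd "$exptdir" && pwd)"   # /tmp
echo "absolute exptdir is still:        ${exptdir_abs}"
cd "$start_dir" || exit 1
```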

Regardless, it has now been fixed. I also had to resolve some merge conflicts related to the Gaea changes, so I will run another round of tests on Hera, Cheyenne, and MacOS today before merging.

mkavulich added the labels "Tested on Hera" (Tested successfully on Hera machine) and "Tested on Jet" (Successfully tested on Jet machine), and removed the "tested on hera" label, on Feb 1, 2021
@mkavulich
Collaborator Author

Some final notes for posterity:

mkavulich merged commit bc08607 into ufs-community:release/public-v1 on Feb 1, 2021
christinaholtNOAA pushed a commit to christinaholtNOAA/regional_workflow that referenced this pull request Feb 10, 2021
…ity#402)

This PR adds generic platforms to the regional_workflow, not specific to any one machine, that should allow users to run the ufs-srweather-app on any UNIX-based machine, without a workflow manager, so long as the NCEPLIBS and other prerequisites have been properly installed. This can be done using the scripts described in regional_workflow/ush/wrappers/README.md; additional documentation is currently being written.

Users can utilize these options by setting the MACHINE variable in config.sh to either "LINUX" or "MACOS". The LINUX option should allow most users to run the ufs-srweather-app on a generic Linux machine. The MACOS option is for MacOS/Darwin operating systems; it needs to be kept separate because the version of bash that ships with MacOS is very old and lacks some functionality, and several GNU/Linux utilities have different behavior and/or names on MacOS.

The "Generic Linux" test was run on Cheyenne (GNU 9.1.0 compilers) as a fresh install, including a stand-alone install of NCEPLIBS, with no reference to staged or pre-built input files. It was run without Rocoto and without submitting jobs directly via PBS; instead, the entire workflow was run interactively on a compute node (using the `qinteractive` command), which emulates running the workflow on a machine with no job scheduler.

On MacOS (Catalina, 10.15.7) with GNU 10.1.0 compilers, I was able to generate the workflow and run it end-to-end successfully. Currently there is a bug in UFS_UTILS that makes the make_orog test fail; UFS_UTILS PR245 must be merged to fix this.

Resolves ufs-community#369
christinaholtNOAA referenced this pull request in NOAA-GSL/regional_workflow Feb 17, 2021
* Add MACOS and Generic Linux options for regional_workflow (#402)

* Source bash utils in the workflow launch script and set the workflow manager as Rocoto. (#426)

## DESCRIPTION OF CHANGES: 
Source the bash utilities in the workflow launch script to avoid an undefined-variable error for $SED. Set Rocoto as the workflow manager on Gaea.
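The error above occurs when a script uses $SED before the utilities that define it have been sourced. Below is a hypothetical guard that turns this kind of failure into a clear message. `require_var` is a made-up helper, not a function from the repository, and the gsed/sed split assumes the GNU sed naming used on MacOS elsewhere in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report clearly when a required variable such as SED
# was never defined (e.g. the bash utilities were not sourced first).
require_var() {
  # $1: variable name; indirect expansion reads the variable's value
  if [[ -z "${!1:-}" ]]; then
    echo "ERROR: '$1' is not set; source the bash utilities before use" >&2
    return 1
  fi
}

unset SED
if require_var SED 2>/dev/null; then status_before=ok; else status_before=missing; fi

# Roughly what sourcing the utilities provides (GNU sed name assumed).
if [[ $(uname -s) == Darwin ]]; then SED=gsed; else SED=sed; fi
if require_var SED; then status_after=ok; else status_after=missing; fi

echo "before sourcing: ${status_before}; after: ${status_after}"
```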

## TESTS CONDUCTED: 
Tested on Gaea. Release branch end-to-end tests (aside from 3km runs) were run on Hera and all passed.

## CONTRIBUTORS (optional): 
@climbfuji, @mkavulich, @gsketefian

* Run with LINUX + rocoto.

* Adding reference configs for Hera

* Updating configs to work on Hera.

* Add configurable options needed for linux.

* Add modulefiles needed for linux.

* Remove the first instance of "RUN_CMD_FCST" in var_defns.sh to avoid potential undefined variable issues (#433)

## DESCRIPTION OF CHANGES: 
It was found that if `set -u` is in the user's default bash environment, the launch script or individual run scripts will fail because a variable is used before it is defined; this is likely to occur if you submit any of these scripts from a crontab. The cause was the way the default run command was set up for MacOS and generic LINUX platforms, which was a bit of a hack that resulted in RUN_CMD_FCST being defined twice in var_defns.sh. The fix deletes the first instance of RUN_CMD_FCST in var_defns.sh so that it no longer references an undefined variable early on.

This potential bug does not affect Tier 1 supported platforms, only MacOS and generic Linux.
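Here is a hypothetical reproduction of the failure and the fix described above. The file contents and the variable FCST_NPROCS are invented. The only details taken from the text are that RUN_CMD_FCST appeared twice in var_defns.sh and that removing the first occurrence avoids the early reference under `set -u`. The deletion uses awk, which behaves the same with GNU and BSD tools; the PR's actual deletion mechanism is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical var_defns.sh-like file: RUN_CMD_FCST is defined twice, the
# first time before the (made-up) variable it references exists.
defns=$(mktemp)
cat > "$defns" <<'EOF'
RUN_CMD_FCST="mpirun -np ${FCST_NPROCS}"
FCST_NPROCS=4
RUN_CMD_FCST="mpirun -np ${FCST_NPROCS}"
EOF

# Under "set -u", the first line aborts the subshell doing the sourcing.
if ( set -u; source "$defns" ) 2>/dev/null; then
  orig_status=ok
else
  orig_status=failed
fi

# Keep every line except the first RUN_CMD_FCST definition.
fixed=$(mktemp)
awk '!done && /^RUN_CMD_FCST=/ { done = 1; next } { print }' "$defns" > "$fixed"

# The fixed file sources cleanly even with "set -u".
fcst_cmd=$( set -u; source "$fixed"; echo "$RUN_CMD_FCST" )

echo "original: ${orig_status}; after fix: RUN_CMD_FCST=${fcst_cmd}"
rm -f "$defns" "$fixed"
```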

## TESTS CONDUCTED: 
Tested on affected MacOS platform and the fix worked. Also ran end-to-end tests on Hera and Cheyenne (still running) as a sanity check.

* Other mods for running on Hera.

* Fix needed for create_diag_table_file change.

Co-authored-by: Michael Kavulich <kavulich@ucar.edu>
Co-authored-by: JeffBeck-NOAA <55201531+JeffBeck-NOAA@users.noreply.github.com>
mkavulich added a commit that referenced this pull request Sep 22, 2021
## DESCRIPTION OF CHANGES: 
This change will add the capability to run regional_workflow (as part of the SRW app) on MacOS and generic LINUX platforms. Most of these changes are identical to those in #402 (hash bc08607) but some additional modifications needed to be made due to intervening changes in the develop branch.

## TESTS CONDUCTED: 
Ran Graduate Student Test on new platforms:
 - my personal Mac (MacOS Catalina 10.15.7) with GNU 9.4.0 compilers
 - Cheyenne compute node as a faux "stand-alone" machine, with Intel 19.1.1 compilers

Ran suite of end-to-end tests on Cheyenne (intel/19.1.1) and Hera (intel/18.0.5.274). All passed as expected.

Tests also passed on WCOSS, MacOS Mojave, and RedHat Linux.

## ISSUE: 
Will resolve #369
TrevorAlcott-NOAA pushed a commit to TrevorAlcott-NOAA/regional_workflow that referenced this pull request Jun 29, 2022
* Add RRFSE local config changes

* Add archiving of ensprod
Labels: enhancement (New feature or request), Tested on Hera (Tested successfully on Hera machine), Tested on Jet (Successfully tested on Jet machine)