Include documentation for building/running SRW App on Mac (#240)
* updated docs

* added git submodule

* fix formatting

* added new submodule commits

* fixed ref links

* finished Intro

* finish Components & Intro edits

* edited Rocoto workflow section of Quickstart

* added minor hpc submodule commits

* Updates to Rocoto Workflow in Quick Start

* add to HPC-stack intro

* submodule updates

* added submodule docs edits

* hpc-stack updates & formatting fixes

* hpc-stack intro edits

* bibtex attempted fix

* add hpc-stack module edits

* update sphinxcontrib version

* add .readthedocs.yaml file

* update .readthedocs.yaml file

* update .readthedocs.yaml file

* update conf.py

* updates .readthedocs.yaml with submodules

* updates .readthedocs.yaml with submodules

* submodule updates

* submodule updates

* minor Intro edits

* minor Intro edits

* minor Intro edits

* submodule updates

* fixed typos in QS

* QS updates

* QS updates

* QS updates

* updates to InputOutput and QS

* fix I/O doc typos

* pull updates to hpc-stack docs

* pull updates to hpc-stack docs

* fix table wrapping

* updates to QS for cloud

* fix QS export statements

* fix QS export statements

* QS edits on bind, config

* add bullet points to notes

* running without rocoto

* add HPC-Stack submodule w/docs

* split QS into container/non-container approaches

* added filepath changes for running in container on Orion, et al.

* edits to overview and container QS

* moved CodeReposAndDirs.rst info to the Introduction & deleted file

* continued edits to SRWAppOverview

* combine overview w/non-container docs

* finish merging non-container guide & SRWOverview, rename/remove files, update FAQ

* minor edits for Intro & QS

* updates to BuildRun doc through 3.8.1

* edits to Build/Run and Components

* remove .gitignore

* fix Ch 3 title, 4 supported platform levels note

* fix typos, add term links

* other minor fixes/suggestions implemented

* updated Intro based on feedback; changed SRW to SRW App throughout

* update comment to Intro citation

* add user-defined vertical levels to future work

* Add instructions for srw_common module load

* fix typo

* update Intro & BuildRunSRW based on Mark's feedback

* minor intro updates

* 1st round of jwolff's edits

* 2nd round of jwolff updates

* update QS intro

* fix minor physics details

* update citation and physics suite name

* add compute node allocation info to QS

* add authoritative hpc-stack docs to Intro

* create MacOS install/build instructions

* add MacOS Build/Run instructions

* fix MacOS Build/Run details

* add MacOS info directly to Build/Run SRW chapter

* minor details

* minor edits

* update Include-HPCInstall with mac installation docs

* add note re: Terminal.app & bash shell in MacOS section

* remove MacInstall file-contents added to BuildRunSRW

* update hpc-stack submodule to include mac installation info

* add MacOS config details

* add MacOS config & run details

* minor MacOS note

* mention need to verify software library version #'s

* update hpc-stack-mod

* align MacDetails section with PR #238 info

* remove gsed & alter related commands

* update hpc-stack submodule

* typos

* switch from env to module load

Co-authored-by: Will Mayfield <59745143+willmayfield@users.noreply.github.com>

* Update BuildRunSRW.rst

* update hpc-stack module docs & MacOS config.sh

* update machine file instructions

* updates to BuildRun chapter

* fix typo

Co-authored-by: gspetro <gillian.s.petro@gmail.com>
Co-authored-by: Will Mayfield <59745143+willmayfield@users.noreply.github.com>
3 people authored May 26, 2022
1 parent d0c0c69 commit 3a2fd2b
Showing 3 changed files with 229 additions and 11 deletions.
233 changes: 222 additions & 11 deletions docs/UsersGuide/source/BuildRunSRW.rst
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,11 @@ The SRW Application source code is publicly available on GitHub. To download the
COMMENT: This will need to be changed to the updated release branch of the SRW repo once it exists.
The cloned repository contains the configuration files and sub-directories shown in
:numref:`Table %s <FilesAndSubDirs>`.
:numref:`Table %s <FilesAndSubDirs>`. The user may set an ``$SRW`` environment variable to point to the location of the new ``ufs-srweather-app`` repository. For example, if ``ufs-srweather-app`` was cloned into the ``$HOME`` directory:

.. code-block:: console

   export SRW=$HOME/ufs-srweather-app

.. _FilesAndSubDirs:

Expand Down Expand Up @@ -129,18 +133,20 @@ Run the executable that pulls in SRW App components from external repositories:

.. code-block:: console
cd ufs-srweather-app
cd $SRW
./manage_externals/checkout_externals
The script should output dialogue indicating that it is retrieving different code repositories. It may take several minutes to download these repositories.


.. _BuildExecutables:

Set Up the Environment and Build the Executables
===================================================

.. _DevBuild:


``devbuild.sh`` Approach
-----------------------------

Expand All @@ -152,7 +158,7 @@ On Level 1 systems for which a modulefile is provided under the ``modulefiles``
where ``<machine_name>`` is replaced with the name of the platform the user is working on. Valid values are: ``cheyenne`` | ``gaea`` | ``hera`` | ``jet`` | ``macos`` | ``odin`` | ``orion`` | ``singularity`` | ``wcoss_dell_p3``

If compiler auto-detection fails for some reason, specify it using the ``--compiler`` argument. FOr example:
If compiler auto-detection fails for some reason, specify it using the ``--compiler`` argument. For example:

.. code-block:: console
Expand Down Expand Up @@ -288,6 +294,70 @@ The build will take a few minutes to complete. When it starts, a random number i

If you see the build.out file, but there is no ``ufs-srweather-app/bin`` directory, wait a few more minutes for the build to complete.

.. _MacDetails:

Additional Details for Building on MacOS
------------------------------------------

.. note::
   Users not building the SRW App to run on MacOS may skip to the :ref:`next section <BuildExecutables>`.

The SRW App can be built on MacOS systems, presuming HPC-Stack has already been successfully installed. The following two options have been tested:

* **Option 1:** MacBookAir 2020, M1 chip (arm64, running natively), 4+4 cores, Big Sur 11.6.4, GNU compiler suite v.11.2.0_3 (gcc, gfortran, g++); no MPI pre-installed

* **Option 2:** MacBook Pro 2015, 2.8 GHz Quad-Core Intel Core i7 (x86_64), Catalina OS X 10.15.7, GNU compiler suite v.11.2.0_3 (gcc, gfortran, g++); no MPI pre-installed

The ``build_macos_gnu`` modulefile initializes the module environment, lists the location of HPC-Stack modules, loads the meta-modules and modules, and sets compilers, additional flags, and environment variables needed for building the SRW App. The modulefile must be modified to include the absolute path to the user's HPC-Stack installation and ``ufs-srweather-app`` directories. In particular, the following section must be modified:

.. code-block:: console

   # This path should point to your HPCstack installation directory
   setenv HPCstack "/Users/username/hpc-stack/install"

   # This path should point to your SRW Application directory
   setenv SRW "/Users/username/ufs-srweather-app"

An excerpt of the ``build_macos_gnu`` contents appears below for Option 1. To use Option 2, the user will need to comment out the lines specific to Option 1 and uncomment the lines specific to Option 2 in the ``build_macos_gnu`` modulefile. Additionally, users need to verify that all file paths reflect their system's configuration and that the correct version numbers for software libraries appear in the modulefile.

.. code-block:: console

   # Option 1 compiler paths:
   setenv CC "/opt/homebrew/bin/gcc"
   setenv FC "/opt/homebrew/bin/gfortran"
   setenv CXX "/opt/homebrew/bin/g++"

   # Option 2 compiler paths:
   #setenv CC "/usr/local/bin/gcc"
   #setenv FC "/usr/local/bin/gfortran"
   #setenv CXX "/usr/local/bin/g++"

Then, users must source the Lmod setup file, just as they would on other systems, and load the modulefiles needed for building and running SRW App:

.. code-block:: console

   source etc/lmod-setup.sh macos
   module use <path/to/ufs-srweather-app/modulefiles>
   module load build_macos_gnu

In a csh/tcsh shell, users would run ``source etc/lmod-setup.csh macos`` in place of the first line in the code above.

.. note::
   If you execute ``source etc/lmod-setup.sh`` on systems that don't need it, it will simply do a ``module purge``.

Additionally, for Option 1 systems, set the variable ``ENABLE_QUAD_PRECISION`` to ``OFF`` in line 35 of the ``$SRW/src/ufs-weather-model/FV3/atmos_cubed_sphere/CMakeLists.txt`` file. This change is optional if using Option 2 to build the SRW App. Using a text editor (e.g., vi, vim, emacs), change the line to read:

.. code-block:: console

   option(ENABLE_QUAD_PRECISION "Enable compiler definition -DENABLE_QUAD_PRECISION" OFF)

An alternative way to make this change is to use ``sed`` (the stream editor). From the command line, users can run one of two commands (user's preference):

.. code-block:: console

   sed -i -e 's/QUAD_PRECISION\" ON)/QUAD_PRECISION\" OFF)/' CMakeLists.txt
   sed -i -e 's/bin\/sh/bin\/bash/g' *sh

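
On MacOS, the default ``sed`` is the BSD version, whose ``-i`` option expects a backup suffix. Written as ``sed -i -e ...``, it is likely to treat ``-e`` as that suffix and leave backup files such as ``CMakeLists.txt-e`` behind. The following is a minimal sketch (not part of the SRW App) of a form that should behave the same with BSD and GNU ``sed``. It operates on a scratch copy of the option line, so the temporary directory and file here are illustrative assumptions:

.. code-block:: bash

   # Illustrative only: operate on a scratch file, not the real CMakeLists.txt.
   workdir=$(mktemp -d)
   demo="${workdir}/CMakeLists.txt"
   echo 'option(ENABLE_QUAD_PRECISION "Enable compiler definition -DENABLE_QUAD_PRECISION" ON)' > "${demo}"

   # Attaching the suffix directly to -i is accepted by both BSD and GNU sed.
   sed -i.bak -e 's/QUAD_PRECISION" ON)/QUAD_PRECISION" OFF)/' "${demo}"

   # Show the edited line, then discard the backup once satisfied.
   grep ENABLE_QUAD_PRECISION "${demo}"
   rm -f "${demo}.bak"

Keeping the ``.bak`` copy until the edited file has been inspected makes it easy to revert the change.
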
.. _Data:

Expand Down Expand Up @@ -387,7 +457,7 @@ settings. There is usually no need for a user to modify the default configuratio
+----------------------+------------------------------------------------------------+
| NOMADS | NOMADS, NOMADS_file_type |
+----------------------+------------------------------------------------------------+
| External model | USE_USER_STAGED_EXTRN_FILES, EXTRN_MDL_SOURCE_BASEDRI_ICS, |
| External model | USE_USER_STAGED_EXTRN_FILES, EXTRN_MDL_SOURCE_BASEDIR_ICS, |
| | EXTRN_MDL_FILES_ICS, EXTRN_MDL_SOURCE_BASEDIR_LBCS, |
| | EXTRN_MDL_FILES_LBCS |
+----------------------+------------------------------------------------------------+
Expand Down Expand Up @@ -561,14 +631,18 @@ To get started, make a copy of ``config.community.sh``. From the ``ufs-srweather

.. code-block:: console
cd regional_workflow/ush
cd $SRW/regional_workflow/ush
cp config.community.sh config.sh
The default settings in this file include a predefined 25-km :term:`CONUS` grid (RRFS_CONUS_25km), the :term:`GFS` v16 physics suite (FV3_GFS_v16 :term:`CCPP`), and :term:`FV3`-based GFS raw external model data for initialization.

Next, edit the new ``config.sh`` file to customize it for your machine. At a minimum, change the ``MACHINE`` and ``ACCOUNT`` variables; then choose a name for the experiment directory by setting ``EXPT_SUBDIR``. If you have pre-staged the initialization data for the experiment, set ``USE_USER_STAGED_EXTRN_FILES="TRUE"``, and set the paths to the data for ``EXTRN_MDL_SOURCE_BASEDIR_ICS`` and ``EXTRN_MDL_SOURCE_BASEDIR_LBCS``.

Sample settings are indicated below for Level 1 platforms. Detailed guidance applicable to all systems can be found in :numref:`Chapter %s: Configuring the Workflow <ConfigWorkflow>`, which discusses each variable and the options available. Additionally, information about the four predefined Limited Area Model (LAM) Grid options can be found in :numref:`Chapter %s: Limited Area Model (LAM) Grids <LAMGrids>`.
.. note::

MacOS users should refer to :numref:`Section %s <MacConfig>` for details on configuring an experiment on MacOS.

Sample settings are indicated below for Level 1 platforms. Detailed guidance applicable to all systems can be found in :numref:`Chapter %s: Configuring the Workflow <ConfigWorkflow>`, which discusses each variable and the options available. Additionally, information about the three predefined Limited Area Model (LAM) Grid options can be found in :numref:`Chapter %s: Limited Area Model (LAM) Grids <LAMGrids>`.

.. important::

Expand Down Expand Up @@ -659,8 +733,7 @@ For WCOSS_DELL_P3:
.. note::

The values of the configuration variables should be consistent with those in the
``valid_param_vals script``. In addition, various example configuration files can be
found in the ``regional_workflow/tests/baseline_configs`` directory.
``valid_param_vals`` script. In addition, various example configuration files can be found in the ``regional_workflow/tests/baseline_configs`` directory.

.. _VXConfig:

Expand Down Expand Up @@ -715,8 +788,8 @@ These tasks are independent, so users may set some values to "TRUE" and others t

.. _SetUpPythonEnv:

Set up the Python and other Environment Parameters
--------------------------------------------------
Set Up the Python and Other Environment Parameters
----------------------------------------------------
The workflow requires Python 3 with the packages 'PyYAML', 'Jinja2', and 'f90nml' available. This Python environment has already been set up on Level 1 platforms, and it can be activated in the following way (from ``/ufs-srweather-app/regional_workflow/ush``):

.. code-block:: console
Expand All @@ -733,6 +806,143 @@ This command will activate the ``regional_workflow`` conda environment. The user
source ~/.bashrc
conda activate regional_workflow
.. _MacConfig:

Configuring an Experiment on MacOS
------------------------------------------------------------

In principle, the configuration process for MacOS systems is the same as for other systems. However, the details of the configuration process on MacOS require a few extra steps.

.. note::
   Examples in this subsection presume that the user is running Terminal.app with a bash shell environment. If this is not the case, users will need to adjust the commands to fit their command line application and shell environment.

.. _MacMorePackages:

Install Additional Packages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Check the version of bash, and upgrade it if it is lower than 4. Additionally, install the ``coreutils`` package:

.. code-block:: console

   bash --version
   brew upgrade bash
   brew install coreutils

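
After upgrading, the newly installed bash may not be the one that actually runs a given script. As a small sketch, the major version of the running shell can be checked with bash's built-in ``BASH_VERSINFO`` array (the function name ``bash_major_ok`` is a hypothetical helper, not part of the SRW App):

.. code-block:: bash

   # Hypothetical helper: succeeds if the running bash is at least the given major version.
   bash_major_ok() {
     local required="${1:-4}"
     [ "${BASH_VERSINFO[0]}" -ge "${required}" ]
   }

   if bash_major_ok 4; then
     echo "bash ${BASH_VERSION} is new enough"
   else
     echo "bash ${BASH_VERSION} is older than 4; upgrade before continuing"
   fi
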
.. _MacVEnv:

Create a Python Virtual Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Users must create a Python virtual environment for running the SRW App on MacOS. This involves setting Python 3 as the default, adding the required Python modules, and activating the ``regional_workflow`` environment.

.. code-block:: console

   python3 -m pip --version
   python3 -m pip install --upgrade pip
   python3 -m ensurepip --default-pip
   python3 -m venv $HOME/venv/regional_workflow
   source $HOME/venv/regional_workflow/bin/activate
   python3 -m pip install jinja2
   python3 -m pip install pyyaml
   python3 -m pip install f90nml
   python3 -m pip install ruby
   # OR: brew install ruby

The virtual environment can be deactivated by running the ``deactivate`` command. The virtual environment built here will be reactivated in :numref:`Step %s <MacActivateWFenv>` and needs to be used to generate the workflow and run the experiment.
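
Before moving on, it may help to confirm that the required modules can be imported from the active environment. The following is a minimal sketch; the function name ``check_py_modules`` is a hypothetical helper, and it assumes ``python3`` resolves to the interpreter inside the virtual environment:

.. code-block:: bash

   # Hypothetical helper: report any Python modules that cannot be imported.
   check_py_modules() {
     local missing=0
     local mod
     for mod in "$@"; do
       if ! python3 -c "import ${mod}" 2>/dev/null; then
         echo "missing Python module: ${mod}"
         missing=1
       fi
     done
     return "${missing}"
   }

   # PyYAML is imported as "yaml" and Jinja2 as "jinja2".
   check_py_modules jinja2 yaml f90nml || echo "install the missing modules with pip"
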

Install Rocoto
^^^^^^^^^^^^^^^^^^

.. note::
   Users may `install Rocoto <https://github.com/christopherwharrop/rocoto/blob/develop/INSTALL>`__ if they want to make use of a workflow manager to run their experiments. However, this option has not been tested yet on MacOS and is not supported for this release.


Configure the SRW App
^^^^^^^^^^^^^^^^^^^^^^^^

Users will need to configure their experiment just like on any other system. From the ``$SRW/regional_workflow/ush`` directory, users can copy the settings from ``config.community.sh`` into a ``config.sh`` file (see :numref:`Section %s <UserSpecificConfig>` above). In the ``config.sh`` file, users should set ``MACHINE="macos"`` and modify additional variables as needed. For example:

.. code-block:: console

   MACHINE="macos"
   ACCOUNT="user"
   EXPT_SUBDIR="<test_community>"
   COMPILER="gnu"
   VERBOSE="TRUE"
   RUN_ENVIR="community"
   PREEXISTING_DIR_METHOD="rename"

   PREDEF_GRID_NAME="RRFS_CONUS_25km"
   QUILTING="TRUE"

Due to the limited number of processors on MacOS systems, users must configure the domain decomposition defaults (usually, there are only 8 CPUs in M1-family chips and 4 CPUs for x86_64).

For :ref:`Option 1 <MacDetails>`, add the following information to ``config.sh``:

.. code-block:: console

   LAYOUT_X="${LAYOUT_X:-3}"
   LAYOUT_Y="${LAYOUT_Y:-2}"
   WRTCMP_write_groups="1"
   WRTCMP_write_tasks_per_group="2"

For :ref:`Option 2 <MacDetails>`, add the following information to ``config.sh``:

.. code-block:: console

   LAYOUT_X="${LAYOUT_X:-3}"
   LAYOUT_Y="${LAYOUT_Y:-1}"
   WRTCMP_write_groups="1"
   WRTCMP_write_tasks_per_group="1"
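
These values appear to be chosen so that the forecast fits on the available cores. As a rough sketch, assume the number of MPI tasks for the forecast is the number of compute tasks (``LAYOUT_X`` times ``LAYOUT_Y``) plus, when ``QUILTING="TRUE"``, the number of write-component tasks (``WRTCMP_write_groups`` times ``WRTCMP_write_tasks_per_group``). The totals can then be checked with shell arithmetic (the function name ``fcst_tasks`` is hypothetical):

.. code-block:: bash

   # Hypothetical helper: total forecast tasks = compute tasks + write-component tasks.
   fcst_tasks() {
     local layout_x="$1" layout_y="$2" write_groups="$3" write_tasks_per_group="$4"
     echo $(( layout_x * layout_y + write_groups * write_tasks_per_group ))
   }

   fcst_tasks 3 2 1 2   # Option 1 values
   fcst_tasks 3 1 1 1   # Option 2 values

Under that assumption, Option 1 uses 8 tasks and Option 2 uses 4, which matches the CPU counts given for the two tested systems.
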

Configure the Machine File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Configure the machine file based on the number of CPUs in the system (8 or 4). Specify the following variables in ``$SRW/regional_workflow/ush/machine/macos.sh``:

For Option 1 (8 CPUs):

.. code-block:: console

   # Commands to run at the start of each workflow task.
   PRE_TASK_CMDS='{ ulimit -a; }'

   # Architecture information
   WORKFLOW_MANAGER="none"
   NCORES_PER_NODE=${NCORES_PER_NODE:-8}   # Option 2: when 4 CPUs, set to 4
   SCHED=${SCHED:-"none"}

   # UFS SRW App specific paths
   FIXgsm="path/to/FIXgsm/files"
   FIXaer="path/to/FIXaer/files"
   FIXlut="path/to/FIXlut/files"
   # Path to location of static input files used by the make_orog task
   TOPO_DIR="path/to/FIXgsm/files"
   # Path to location of static surface climatology input fields used by sfc_climo_gen
   SFC_CLIMO_INPUT_DIR="path/to/FIXgsm/files"

   # Run commands for executables
   RUN_CMD_SERIAL="time"
   RUN_CMD_UTILS="mpirun -np 4"
   RUN_CMD_FCST='mpirun -np ${PE_MEMBER01}'
   RUN_CMD_POST="mpirun -np 4"
   PRE_TASK_CMDS='{ ulimit -a; }'

The same settings can be used for Option 2, except that ``NCORES_PER_NODE=${NCORES_PER_NODE:-8}`` should be set to 4 instead of 8.
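
To find out which value applies, the number of available CPUs can be queried from the command line. A minimal sketch, assuming ``getconf`` is available (it is on MacOS and most Linux systems); ``sysctl -n hw.ncpu`` is a MacOS-specific alternative:

.. code-block:: bash

   # Number of processors currently online.
   ncpu=$(getconf _NPROCESSORS_ONLN)
   echo "Detected ${ncpu} CPUs; consider NCORES_PER_NODE=${ncpu}"
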

.. _MacActivateWFenv:

Activate the Workflow Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``regional_workflow`` environment can be activated on MacOS as it is for any other system:

.. code-block:: console

   cd $SRW/regional_workflow/ush
   module load wflow_macos

This should activate the ``regional_workflow`` environment created in :numref:`Step %s <MacVEnv>`. From here, the user may continue to the :ref:`next step <GenerateWorkflow>` and generate the regional workflow.


.. _GenerateWorkflow:

Expand Down Expand Up @@ -941,7 +1151,7 @@ If the login shell is csh/tcsh, it can be set using:

.. code-block:: console
setenv EXPTDIR /path-to-experiment/directory
setenv EXPTDIR /<path-to-experiment>/<directory_name>
Launch the Rocoto Workflow Using a Script
Expand Down Expand Up @@ -1104,6 +1314,7 @@ After finishing the experiment, open the crontab using ``crontab -e`` and delete

On Orion, *cron* is only available on the orion-login-1 node, so users will need to work on that node when running *cron* jobs on Orion.


The workflow run is complete when all tasks have "SUCCEEDED", and the rocotostat command outputs a table similar to the one :ref:`above <Success>`.

.. _PlotOutput:
Expand Down
6 changes: 6 additions & 0 deletions docs/UsersGuide/source/Quickstart.rst
Original file line number Diff line number Diff line change
Expand Up @@ -334,6 +334,12 @@ Check the batch script output file in your experiment directory for a “SUCCESS
| | | | forecast hour) |
+------------+------------------------+----------------+----------------------------+

Users can access log files for specific tasks in the ``$EXPTDIR/log`` directory. To see how the experiment is progressing, users can also check the end of the ``log.launch_FV3LAM_wflow`` file from the command line:

.. code-block:: console

   tail -n 40 log.launch_FV3LAM_wflow

.. hint::
   If any of the scripts return an error that "Primary job terminated normally, but one process returned a non-zero exit code," there may not be enough space on one node to run the process. On an HPC system, the user will need to allocate a(nother) compute node. The process for doing so is system-dependent, and users should check the documentation available for their HPC system. As an example, instructions for allocating a compute node on NOAA Cloud systems can be viewed in :numref:`Step %s <WorkOnHPC>`.

Expand Down
1 change: 1 addition & 0 deletions docs/UsersGuide/source/WE2Etests.rst
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,7 @@ above, such as ``wflow_features``:

Adding a New WE2E Test Category
-----------------------------------

To create a new test category called, e.g., ``new_category``:

#. In the directory ``ufs-srweather-app/regional_workflow/tests/WE2E/test_configs``, create a new directory named ``new_category``.
Expand Down
