
[develop] Updated modulefiles for Cheyenne, Hera, Jet, Gaea, Orion #419

Closed
wants to merge 41 commits into from


natalie-perlin
Collaborator

@natalie-perlin natalie-perlin commented Oct 14, 2022

DESCRIPTION OF CHANGES:

Updated modulefiles for the following HPC systems: Cheyenne, Gaea, Hera, Orion, and Jet, in particular for the SRW 2.1 release. The changes include new paths for the HPC-stack and miniconda3 installations in the EPIC-managed common space. The MET and METplus verification packages were installed as part of the hpc-stack. Rocoto has also been installed on Cheyenne in the common EPIC-managed space. MET and METplus are not loaded explicitly as modules; instead, their installation paths are specified in the ./ush/machine/<platform>.yaml files. The packages needed for the plotting routines are included in the regional_workflow conda virtual environment.

The following types of modulefiles were updated:

./modulefiles/build_<platform>_<compiler>
./modulefiles/wflow_<platform>
./modulefiles/tasks/<platform>/miniconda_regional_workflow
./modulefiles/tasks/<platform>/make_grid.local, make_ics.local, make_lbcs.local, make_orog.local, get_extrn_lbcs.local, get_extrn_ics.local, run_fcst.local, run_vx.local
./ush/machine/<platform>.yaml

where <platform> is Cheyenne, Gaea, Hera, Jet, or Orion, and <compiler> is either intel or gnu (Cheyenne has both compilers).
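The naming pattern above can be expanded mechanically into the concrete set of files touched per platform. The Python sketch below is illustrative only and is not part of the SRW App. It assumes the lowercase file names seen in this PR's commit messages (e.g. `build_cheyenne_gnu`, `wflow_hera`). It also assumes that every platform except Cheyenne has only an Intel build, as the tested configurations suggest. The helper name `modulefiles_for` is hypothetical.

```python
# Hypothetical sketch: expand the modulefile naming pattern described in this PR.
# Assumption: only Cheyenne has both compilers; the other platforms are Intel-only.
COMPILERS = {
    "cheyenne": ["intel", "gnu"],
    "gaea": ["intel"],
    "hera": ["intel"],
    "jet": ["intel"],
    "orion": ["intel"],
}

# Task-level modulefiles listed in the PR description.
TASK_FILES = [
    "miniconda_regional_workflow",
    "make_grid.local", "make_ics.local", "make_lbcs.local", "make_orog.local",
    "get_extrn_lbcs.local", "get_extrn_ics.local", "run_fcst.local", "run_vx.local",
]


def modulefiles_for(platform: str) -> list[str]:
    """Return the repository-relative paths updated for one platform."""
    paths = [f"./modulefiles/build_{platform}_{c}" for c in COMPILERS[platform]]
    paths.append(f"./modulefiles/wflow_{platform}")
    paths += [f"./modulefiles/tasks/{platform}/{t}" for t in TASK_FILES]
    paths.append(f"./ush/machine/{platform}.yaml")
    return paths


if __name__ == "__main__":
    for name in COMPILERS:
        for path in modulefiles_for(name):
            print(path)
```

Under these assumptions, Cheyenne gets one more file than the other platforms because it has both a GNU and an Intel build modulefile.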

A list of the hpc-stack builds updated for this PR (plus additional compiler/MPI combinations for Hera) can be found in the following open issue: UFS-WM/Issues-1465.

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel - building SRW, running tests via Jenkins, basic plots via Jenkins
  • orion.intel - building SRW, running tests via Jenkins, basic plots via Jenkins
  • cheyenne.intel - building SRW, running tests via Jenkins
  • cheyenne.gnu - building SRW on
  • gaea.intel - building SRW, running tests via Jenkins, basic plots via Jenkins
  • jet.intel - building SRW, running tests via Jenkins, basic plots via Jenkins
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

Documentation needs to be updated, including the chapter on making plots.

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@BruceKropp-Raytheon (Jenkins jobs building the SRW App and running the tests)
@EdwardSnyder-NOAA (tests on Gaea and Hera, met/metplus verification)

Update the modulefile `build_cheyenne_gnu` to use module paths for the EPIC-managed hpc-stack, the updated miniconda3 with the regional_workflow environment (running+plotting), and Rocoto
Update the modulefile `build_cheyenne_intel` to use module paths for the EPIC-managed hpc-stack, the updated miniconda3 with the regional_workflow environment (running+plotting), and Rocoto
Update the modulefile `wflow_cheyenne` to use module paths for the EPIC-managed miniconda3 with the regional_workflow environment (running+plotting) and Rocoto
Update the modulefile `build_hera_intel` to use module paths for the EPIC-managed hpc-stack and the updated miniconda3 with the regional_workflow environment (running+plotting)
Update the modulefile `wflow_hera` to use module paths for the EPIC-managed hpc-stack and the updated miniconda3 with the regional_workflow environment (running+plotting)
Update the modulefile `build_jet_intel` to use module paths for the EPIC-managed hpc-stack and the updated miniconda3 with the regional_workflow environment (running+plotting)
Update the modulefile `wflow_jet` to use a module path for the EPIC-managed, updated miniconda3 with the regional_workflow environment (running+plotting)
Update the modulefile `build_orion_intel` to use module paths for the EPIC-managed hpc-stack and the updated miniconda3 with the regional_workflow environment (running+plotting)
Update the modulefile `wflow_orion` to use a module path for the EPIC-managed miniconda3 with the regional_workflow environment (running+plotting)
MET and METplus installed as modules in the recent hpc-stack; update installation paths
@MichaelLueken
Collaborator

@MichaelLueken - All the modules you mentioned (bufr/11.7.0, ncio/1.1.2, and nccmp/1.8.9.0) are built in the locations used for this PR (#419). I just added nccmp/1.8.9.0 to Gaea and Orion today and set it as the default version (the hpc-stack had a different version built earlier).

@natalie-perlin - Sorry about that, I meant the paths to the hpc-stacks on Gaea, Orion, and Jet, not the updated hpc-stacks that you have created for this PR:

Orion: /work/noaa/epic-ps/hpc-stack/libs/intel/2022.1.2/modulefiles/stack
Gaea: /lustre/f2/pdata/ncep_shared/hpc-stack.epic/libs/intel/2021.3.0/modulefiles/stack
Jet: /lfs4/HFIP/hfv3gfs/nwprod/hpc-stack.epic/libs/intel/2022.1.2/modulefiles/stack

These three stack locations would need to be updated before we can try and move forward with using them for the SRW.

@natalie-perlin
Collaborator Author

natalie-perlin commented Oct 17, 2022

@MichaelLueken - These stack locations were not built using the role-epic account but under the jongkim/Jong.Kim username. There are no logs or other indication of how they were configured, and the locations of the hpc-stack source code directories can only be guessed. Thus only Jong Kim (@jkbk2004) knows how they were built and can update them.

@MichaelLueken
Collaborator

@natalie-perlin Yes, you are correct that only Jong can update the modules in the paths that I noted above and in issue #409. This was just pointing out that we need to sync the hpc-stack used for both SRW and the weather model. It sounds like you might need to add some updated versions for the weather model to use the epic.role stack locations, while Jong would need to add several modules to his personal stack locations before the SRW would be able to use his personal stack locations. Again, the primary goal is to use a single stack for both the SRW and weather model. You have completed the necessary work to use the epic.role version for the SRW, but these locations will also need to work for the weather model, with a PR opened in the weather model repository updating the location of the stack.

@panll
Collaborator

panll commented Oct 18, 2022

It was tested on Hera and works fine.

@BruceKropp-Raytheon
Contributor

e2e tests not yet passing on Jet.

Collaborator

@MichaelLueken MichaelLueken left a comment


@natalie-perlin Would it be possible to roll the compiler and mpi versions back to what is currently being used on Cheyenne, Hera, and Jet? While it is nice to see that the SRW app builds and runs the workflow using these updated compilers and mpi versions, none of our WE2E tests check for reproducibility, among other aspects.

As is, the ufs-weather-model regression tests will need to be run to see whether these updated compiler and MPI versions affect results. This lies outside the scope of the ufs-srweather-app and is one of the reasons why a similar PR needs to be opened for the ufs-weather-model repository (the other being to ensure that both the SRW App and the weather model use the same EPIC-maintained stacks).

Comment on lines -13 to +12
-module load gnu/11.2.0
+module load gnu/12.1.0
Collaborator


Why was the compiler version updated to the latest on Cheyenne? Would it be possible to bring this back down to gnu/11.2.0?

Collaborator Author


The intent was to use the latest available compiler. If gnu/11.2.0 is preferred, the same libraries will be built with gnu/11.2.0 when Cheyenne comes back from maintenance.

Collaborator


Thanks @natalie-perlin. Yes, please build the same libraries for gnu/11.2.0 on Cheyenne once it is back up.

Comment on lines 20 to 21
-module load hpc-intel/2022.1.2
-module load hpc-impi/2022.1.2
+module load hpc-intel/2022.2.0
+module load hpc-impi/2022.2.0
Collaborator


Why were the compiler and mpi versions updated to the latest on Hera? Would it be possible to bring these back down to intel/2022.1.2 and impi/2022.1.2?

Collaborator Author


The intent was to use the latest available compiler. The same libraries are being built with intel/2022.1.2 (tested with the WM), and the PR will be updated ASAP if that version is preferred.

Collaborator


Thanks @natalie-perlin! Yes, please build the same libraries using intel/2022.1.2 on Hera.

Comment on lines 18 to 23
-module load hpc-intel/2022.1.2
-module load hpc-impi/2022.1.2
+module load hpc-intel/2022.2.0
+module load hpc-impi/2022.2.0
Collaborator


Why were the compiler and mpi versions updated to the latest on Jet? Would it be possible to bring these back down to intel/2022.1.2 and impi/2022.1.2?

Collaborator Author


Sure, this is not a problem. I will prepare it for the earlier version, intel/2022.1.2. The idea was to update to the most recent compiler/MPI combination.

Collaborator


Thanks @natalie-perlin!

@natalie-perlin
Collaborator Author

The hpc libraries and modules built with the intel/2022.1.2 compiler and impi/2022.1.2 have been prepared for Hera and Jet.

reversing back to use previous compiler and impi 2022.1.2 from 2022.2.0
reversing to a verified compiler/impi version 2022.1.2 from 2022.2.0
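When rolling compiler and MPI versions back like this, it can help to confirm that no modulefile still pins the newer version. Below is a minimal, hypothetical Python sketch (not an SRW App tool) that parses `module load <name>/<version>` lines of the form shown in the diffs above. It reports every expected module whose pinned version differs. The function names and the expected-version mapping are assumptions for illustration, and Lua-style `load("...")` calls are not handled.

```python
import re

# Matches "module load <name>/<version>" lines, as in this PR's diffs
# (e.g. "module load hpc-intel/2022.1.2"). Lua-style load("...") calls and
# modules loaded without an explicit version are not matched.
LOAD_RE = re.compile(r"^\s*module\s+load\s+([\w.+-]+)/(\S+)", re.MULTILINE)


def loaded_versions(modulefile_text: str) -> dict[str, str]:
    """Map each explicitly versioned module name to its version."""
    return dict(LOAD_RE.findall(modulefile_text))


def version_mismatches(modulefile_text: str, expected: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (found, expected)} for every expected module whose
    loaded version differs; found is None if the module is not loaded."""
    found = loaded_versions(modulefile_text)
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) != want
    }


if __name__ == "__main__":
    before = "module load hpc-intel/2022.2.0\nmodule load hpc-impi/2022.2.0\n"
    after = "module load hpc-intel/2022.1.2\nmodule load hpc-impi/2022.1.2\n"
    expected = {"hpc-intel": "2022.1.2", "hpc-impi": "2022.1.2"}
    print(version_mismatches(before, expected))
    print(version_mismatches(after, expected))
```

With the Hera/Jet diffs as input, the first call reports both `hpc-intel` and `hpc-impi` as mismatched, and the second, after the rollback, returns an empty dict.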
@BruceKropp-Raytheon
Contributor

E2E tests verified on Jet. Thanks.

@natalie-perlin
Collaborator Author

Modulefiles in the SRW App were updated to use the EPIC-managed miniconda3/4.12.0 for the workflow and for running tasks. PR #444 (@mark-a-potts) included these changes and has already been approved and merged into develop.

@natalie-perlin natalie-perlin added enhancement New feature or request and removed release This PR/issue is related to a release branch Priority: HIGH labels Nov 3, 2022
@JeffBeck-NOAA JeffBeck-NOAA changed the title Updated modulefiles for Cheyenne, Hera, Jet, Gaea, Orion [develop] Updated modulefiles for Cheyenne, Hera, Jet, Gaea, Orion Nov 11, 2022
@MichaelLueken
Collaborator

@natalie-perlin With the merge of PR #549 (at d036849), the ufs-srweather-app is now using your EPIC-maintained HPC-Stack locations on all machines. I am now closing this PR.

If you feel that the PR should be reopened, please let me know.

@natalie-perlin natalie-perlin deleted the develop_2 branch October 13, 2023 03:58
Labels
enhancement New feature or request

5 participants