Update GFSv16.2.0 tag to use hpc-stack on Hera and Orion #629

Closed
GeorgeGayno-NOAA opened this issue Feb 8, 2022 · 29 comments

@GeorgeGayno-NOAA
Collaborator

To support the global-workflow group.

NOAA-EMC/global-workflow#639

@GeorgeGayno-NOAA GeorgeGayno-NOAA added the maintenance Basic upkeep label Feb 8, 2022
@GeorgeGayno-NOAA GeorgeGayno-NOAA self-assigned this Feb 8, 2022
GeorgeGayno-NOAA added a commit that referenced this issue Feb 8, 2022
Also, unload the first build module at the end of the
loop.

Fixes #629.
GeorgeGayno-NOAA added a commit that referenced this issue Feb 8, 2022
GeorgeGayno-NOAA added a commit that referenced this issue Feb 8, 2022
GeorgeGayno-NOAA added a commit that referenced this issue Feb 9, 2022
@GeorgeGayno-NOAA
Collaborator Author

@KateFriedman-NOAA - Try my hotfix branch at ba7efbd. I don't know what default library versions you plan to use. I can adjust those later.

@KateFriedman-NOAA
Collaborator

@GeorgeGayno-NOAA So global_cycle builds without an error but the emcsfc build fails. If I run the emcsfc build standalone it builds. I looked at the diffs between building it standalone and via global-workflow. I see you are using the hpc-intel/2020.2 module as the default, but via global-workflow it builds with hpc-intel/2018.4 (I override your setting when I build). I think the g2 library path is missing when I build on Orion because of that mismatch. Can you use the 2018 stack on Orion/Hera?

From the new global-workflow $target.ver files:

orion.ver:

export hpc_ver=1.1.0
export hpc_intel_ver=2018.4
export hpc_impi_ver=2018.4

hera.ver:

export hpc_ver=1.1.0
export hpc_intel_ver=18.0.5.274
export hpc_impi_ver=2018.0.4
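
As a rough illustration of the override being described: the sketch below assumes the UFS_UTILS modulefiles read each `*_ver` variable from the environment and fall back to a built-in default, which is the pattern visible in the Hera modulefiles quoted later in this thread. `resolve_ver` is a hypothetical helper written for this example, not something in global-workflow or UFS_UTILS.

```shell
#!/bin/sh
# Sketch only: mimic "use the exported *_ver value if set, else a default".
# resolve_ver is a hypothetical helper, not part of either repository.
resolve_ver() {
  # $1 = name of the version variable, $2 = fallback default
  if eval "[ -n \"\${$1+x}\" ]"; then
    eval "printf '%s\n' \"\${$1}\""   # variable is set: use it
  else
    printf '%s\n' "$2"                # variable is unset: use the default
  fi
}

# Values from the orion.ver excerpt in this comment.
export hpc_ver=1.1.0
export hpc_intel_ver=2018.4
export hpc_impi_ver=2018.4

# With the version file sourced, the override wins over the 2020.2 default.
echo "hpc-intel/$(resolve_ver hpc_intel_ver 2020.2)"   # prints hpc-intel/2018.4

# Standalone build, nothing exported: the default is used instead.
unset hpc_intel_ver
echo "hpc-intel/$(resolve_ver hpc_intel_ver 2020.2)"   # prints hpc-intel/2020.2
```

This is why the standalone and global-workflow builds can end up on different compiler stacks even though they use the same modulefile.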

@kgerheiser
Contributor

Yes, I would use 2018.4 on Orion.

@GeorgeGayno-NOAA
Collaborator Author

Will do.

@GeorgeGayno-NOAA
Collaborator Author

@KateFriedman-NOAA Try 3c5a350

@KateFriedman-NOAA
Collaborator

@GeorgeGayno-NOAA Thanks, it builds for me both 1) standalone (as-is) and 2) via global-workflow if I set export libpng_ver=1.6.35 in global-workflow's versions/orion.ver file. Please see these logs and confirm they indeed built ok:

/work/noaa/global/kfriedma/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/build_cycle.log
/work/noaa/global/kfriedma/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/build_emcsfc.log

Note: @kgerheiser opened NOAA-EMC/hpc-stack#387 to rename the png modules to libpng to match the WCOSS2 install. Once that happens you'll need to update your modulefiles to use libpng instead of png. I'm also waiting for NOAA-EMC/hpc-stack#388 to be completed and for some library versions to be installed; in the meantime I'm using alternative versions just to test build/version file functionality. Given the stack updates still needed on Hera/Orion, we won't make the self-imposed 2/16 deadline to get the components updated. Hopefully it won't be too long after that, though.

@GeorgeGayno-NOAA
Collaborator Author

Kate, I am using 1.6.35 as the default for Orion. Should I use another version as default?

@KateFriedman-NOAA
Collaborator

We're going to be using libpng/1.6.37 once it's available in the stacks on Hera/Orion so the default should be 1.6.37. I'm using the available png/1.6.35 at the moment to test functionality while awaiting the module updates.

GeorgeGayno-NOAA added a commit that referenced this issue Mar 3, 2022
build. Use variables from the netcdf module.

Fixes #629.
GeorgeGayno-NOAA added a commit that referenced this issue Mar 3, 2022
GeorgeGayno-NOAA added a commit that referenced this issue Mar 3, 2022
GeorgeGayno-NOAA added a commit that referenced this issue Mar 4, 2022
previously, but would not run without a system error.
This fixes the problem.

Fixes #629.
@GeorgeGayno-NOAA
Collaborator Author

The branch at 423c704 was tested on Hera and Orion using these scripts:

Hera - /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS.gfsv16
Orion - /work/noaa/da/ggayno/save/ufs_utils.git/gfsv16_tag

All tests passed.

GeorgeGayno-NOAA added a commit that referenced this issue Mar 4, 2022
@GeorgeGayno-NOAA
Collaborator Author

The same tests were repeated using Intel 2022 (6195f80). All tests passed.

@kgerheiser
Contributor

@GeorgeGayno-NOAA no problems with Intel MPI 2022?

@GeorgeGayno-NOAA
Collaborator Author

No. The global_cycle program uses MPI. And it ran fine for me.

@GeorgeGayno-NOAA
Collaborator Author

@KateFriedman-NOAA Would you like to try 6195f80? It compiles and runs. But I am not sure I am using the correct library and compiler versions (or environment variable names).

@KateFriedman-NOAA
Collaborator

Yes, thanks @GeorgeGayno-NOAA ! I'll check it out on Hera and Orion and report back.

@KateFriedman-NOAA
Collaborator

@GeorgeGayno-NOAA The updated hash built without error on both Hera and Orion with the version file overrides in play. Please see the following logs to confirm the build success within g-w:

Hera: /scratch1/NCEPDEV/global/Kate.Friedman/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/*
Orion: /work/noaa/global/kfriedma/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/*

The g-w version settings used (build.ver & hera.ver/orion.ver):

Hera: /scratch1/NCEPDEV/global/Kate.Friedman/git/dev_v16/versions/
Orion: /work/noaa/global/kfriedma/git/dev_v16/versions/

@GeorgeGayno-NOAA
Collaborator Author

Looks like it worked. And I am using the same library/compiler versions as you are.

@KateFriedman-NOAA
Collaborator

Great, thanks for confirming @GeorgeGayno-NOAA ! Once I get a few more of the updated components available I will test the execs from this updated UFS_UTILS branch in some cycled tests of the system on Hera/Orion. I'll report any issues regarding those execs. Hoping to do that in the next week or two; I need to wrap this up this month before the old modules get removed. Thanks!

@GeorgeGayno-NOAA
Collaborator Author

@KateFriedman-NOAA I reverted back to Intel 2018. I also removed some unused utilities. See 75c74cd.

@KateFriedman-NOAA
Collaborator

Thanks @GeorgeGayno-NOAA ! Need to work on a WCOSS2 issue today so I'll try to test that hash tomorrow.

@KateFriedman-NOAA
Collaborator

I updated my ufs_utils copies on Hera/Orion, reverted the g-w hera.ver back to the 2018 intel, and ran the ufs_utils build. It appears to have built without issue; please confirm, thanks:

Hera: /scratch1/NCEPDEV/global/Kate.Friedman/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/*
Orion: /work/noaa/global/kfriedma/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/*

@GeorgeGayno-NOAA
Collaborator Author

The Orion build looks good. On Hera, the global_cycle build used 2018, but the emcsfc build used 2022. That is odd.

@KateFriedman-NOAA
Collaborator

@GeorgeGayno-NOAA I found a couple issues in the modulefiles for global_cycle and emcsfc (for Hera, didn't check others yet) but after fixing them I ran into another issue (the real issue).

A couple of the version variables in the following modulefiles were incorrect. I fixed them before hitting the next error I'll describe below:

-bash-4.2$ git diff modulefiles/*
diff --git a/modulefiles/fv3gfs/global_cycle.hera.lua b/modulefiles/fv3gfs/global_cycle.hera.lua
index 49fcf0d5..9c515f49 100644
--- a/modulefiles/fv3gfs/global_cycle.hera.lua
+++ b/modulefiles/fv3gfs/global_cycle.hera.lua
@@ -10,7 +10,7 @@ load(pathJoin("hpc", hpc_ver))
 hpc_intel_ver=os.getenv("hpc_intel_ver") or "18.0.5.274"
 load(pathJoin("hpc-intel", hpc_intel_ver))
 
-hpc_intel_ver=os.getenv("hpc_impi_ver") or "2018.0.4"
+hpc_impi_ver=os.getenv("hpc_impi_ver") or "2018.0.4"
 load(pathJoin("hpc-impi", hpc_impi_ver))
 
 ip_ver=os.getenv("ip_ver") or "3.3.3"
diff --git a/modulefiles/modulefile.global_emcsfc_ice_blend.hera.lua b/modulefiles/modulefile.global_emcsfc_ice_blend.hera.lua
index 140c759e..b0574ee0 100644
--- a/modulefiles/modulefile.global_emcsfc_ice_blend.hera.lua
+++ b/modulefiles/modulefile.global_emcsfc_ice_blend.hera.lua
@@ -7,7 +7,7 @@ prepend_path("MODULEPATH", "/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gf
 hpc_ver=os.getenv("hpc_ver") or "1.2.0"
 load(pathJoin("hpc", hpc_ver))
 
-hpc_ver=os.getenv("hpc_intel_ver") or "18.0.5.274"
+hpc_intel_ver=os.getenv("hpc_intel_ver") or "18.0.5.274"
 load(pathJoin("hpc-intel", hpc_intel_ver))

Not sure if you also need to load hpc-impi for the emcsfc builds; I didn't add that yet given the other error I hit.
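
Both fixes in the diff above are the same kind of slip: the variable on the left of the assignment does not match the name passed to `os.getenv`. That leaves the variable used in the following `load(pathJoin(...))` call unassigned (nil in Lua), so the load probably never received the intended version. A small check along the following lines could catch this before a build. It is only a sketch: `check_ver_names` is a hypothetical helper, and it assumes every version line follows the `name=os.getenv("name") or "default"` layout seen in these modulefiles.

```shell
#!/bin/sh
# Sketch only: flag modulefile lines where the assigned variable differs from
# the environment variable it reads. check_ver_names is a hypothetical helper.
check_ver_names() {
  awk '
    BEGIN { bad = 0 }
    /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=[ \t]*os\.getenv\("/ {
      lhs = $0
      sub(/^[ \t]*/, "", lhs)      # strip leading whitespace
      sub(/[ \t]*=.*/, "", lhs)    # keep only the assigned name
      env = $0
      sub(/^[^"]*"/, "", env)      # drop everything up to the first quote
      sub(/".*/, "", env)          # keep only the name given to os.getenv
      if (lhs != env) {
        printf "%s:%d: %s is set from %s\n", FILENAME, FNR, lhs, env
        bad = 1
      }
    }
    END { exit bad }
  ' "$@"
}

# Demo on the two lines from global_cycle.hera.lua before the fix.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
hpc_intel_ver=os.getenv("hpc_intel_ver") or "18.0.5.274"
hpc_intel_ver=os.getenv("hpc_impi_ver") or "2018.0.4"
EOF
check_ver_names "$tmp" || echo "mismatch found"
rm -f "$tmp"
```

Run against the real tree, it would be called as `check_ver_names modulefiles/*.lua modulefiles/fv3gfs/*.lua`; a non-zero exit status means at least one mismatch was reported.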

When attempting to load hpc-intel/18.0.5.274 I get the following error:

+ module load global_cycle.hera
++ /apps/lmod/8.5.2/libexec/lmod bash load global_cycle.hera
Lmod has detected the following error: Cannot load module "hpc-intel/18.0.5.274" without these module(s) loaded:
   intel/18.0.5.274

While processing the following module(s):
    Module fullname       Module Filename
    ---------------       ---------------
    hpc-intel/18.0.5.274  /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/core/hpc-intel/18.0.5.274.lua
    global_cycle.hera     /scratch1/NCEPDEV/global/Kate.Friedman/git/dev_v16/sorc/ufs_utils.fd/modulefiles/fv3gfs/global_cycle.hera.lua

+ eval false

My guess is the build hits that error and then loads the hpc-intel default (which is 2022). The hpc-intel/18.0.5.274 module shows a prereq/load for intel/18.0.5.274:

-bash-4.2$ module show hpc-intel/18.0.5.274
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/core/hpc-intel/18.0.5.274.lua:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
help([[]])
family("MetaCompiler")
conflict("hpc-intel")
conflict("hpc-gnu","hpc-gcc")
load("intel/18.0.5.274")
prereq("intel/18.0.5.274")
prepend_path("MODULEPATH","/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/compiler/intel/18.0.5.274")
setenv("FC","ifort")
setenv("CC","icc")
setenv("CXX","icpc")
setenv("SERIAL_FC","ifort")
setenv("SERIAL_CC","icc")
setenv("SERIAL_CXX","icpc")
whatis("Name: hpc-intel")
whatis("Version: 18.0.5.274")
whatis("Category: Compiler")
whatis("Description: Intel Compiler Family and module access")

@Hang-Lei-NOAA Why would we have to load intel/18.0.5.274 on Hera before being able to load hpc-intel/18.0.5.274?

@GeorgeGayno-NOAA
Collaborator Author

@KateFriedman-NOAA I fixed the environment variable names for the Hera build. See 2f19a61. But I am not getting the build errors you are.

@KateFriedman-NOAA
Collaborator

Thanks @GeorgeGayno-NOAA , I updated my copy to that new hash and opened a fresh Hera terminal to try the build again. It built without issue and emcsfc used 2018 intel. Please see these logs to confirm:

/scratch1/NCEPDEV/global/Kate.Friedman/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/*

So when I open a new Hera terminal I get intel/18.0.5.274 and impi/2018.0.4 loaded by default (I'm not loading them in my .bashrc). With a fresh terminal the prereqs for the intel 2018 module are met and it loads the stack ones without issue at build-time. I had done a module purge to test the build cleanly but apparently my environment was too clean(?).

Will test with this hash and report any issues. Thanks George!
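
For anyone who wants to build from a purged environment, one possible approach is to load the site compiler module explicitly before the hpc-stack one, and to stop as soon as any load fails rather than letting the build continue with a default. This is a sketch under assumptions: it presumes `intel/18.0.5.274` can still be found on `MODULEPATH` after a purge (not confirmed in this thread), and `load_or_die`, `load_hera_intel2018`, and `MODULE_CMD` are hypothetical names introduced for this example. `MODULE_CMD` exists only so the sequence can be dry-run without Lmod.

```shell
#!/bin/sh
# Sketch only. load_or_die is a hypothetical wrapper that reports a failed
# module load instead of letting the build carry on silently.
# MODULE_CMD is an assumption for dry runs; it defaults to the real "module".
load_or_die() {
  if ! ${MODULE_CMD:-module} load "$1"; then
    echo "ERROR: could not load $1" >&2
    return 1
  fi
}

load_hera_intel2018() {
  # Order matters: hpc-intel/18.0.5.274 declares prereq("intel/18.0.5.274").
  load_or_die intel/18.0.5.274 &&
  load_or_die hpc-intel/18.0.5.274
}

# Dry run: print the module commands instead of executing them.
MODULE_CMD=echo
load_hera_intel2018
# prints:
#   load intel/18.0.5.274
#   load hpc-intel/18.0.5.274
```

In a real build script `MODULE_CMD` would be left unset, and the call would read `load_hera_intel2018 || exit 1` so a missing prerequisite ends the build at the point of failure.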

@GeorgeGayno-NOAA
Collaborator Author

I also fixed an Orion modulefile. Use fb34823.

@GeorgeGayno-NOAA
Collaborator Author

According to the logs, everything built OK.

@KateFriedman-NOAA
Collaborator

Thanks @GeorgeGayno-NOAA ! I updated my Orion copy and built it, see logs here:

/work/noaa/global/kfriedma/git/dev_v16/sorc/ufs_utils.fd/sorc/logs/*

@GeorgeGayno-NOAA
Collaborator Author

Looks good.
