Add GEFS C48 support on AWS #2818

Merged 80 commits into NOAA-EMC:develop on Aug 26, 2024


weihuang-jedi
Contributor

Changes to make GEFS C48 case run on AWS.

Description

Now that C48 ATM forecast-only runs work on AWS, the next step is to make GEFS C48 run on AWS.
This PR changes the AWS env and YAML files to support that.

Resolves #2817
Refs #2711

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? YES
  • Does this change require an update to any of the following submodules? NO (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

  • Clone and build on AWS
  • GEFS C48 run on AWS

Checklist

  • [x] Any dependent changes have been merged and published
  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

weihuang-jedi and others added 30 commits June 18, 2024 23:05
@weihuang-jedi weihuang-jedi marked this pull request as ready for review August 13, 2024 22:25
Review comments (outdated, resolved) on:
  • parm/config/gefs/config.resources.AWSPW
  • scripts/exgfs_wave_post_pnt.sh
  • sorc/verif-global.fd
  • workflow/rocoto/rocoto.py
@WalterKolczynski-NOAA WalterKolczynski-NOAA changed the title Run GEFS C48 on AWS Add GEFS C48 support on AWS Aug 14, 2024
Review comment (outdated, resolved) on env/AWSPW.env
@weihuang-jedi
Contributor Author

The GEFS runs fail in the wave point post-processing tasks:
202103231200 wave_post_pnt_mem000 570 DEAD 32512 9 83.0
202103231200 wave_post_pnt_mem001 571 DEAD 32512 9 83.0
202103231200 wave_post_pnt_mem002 572 DEAD 32512 9 83.0
202103231200 arch - - - - -
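A note on reading the status lines above: if the fifth column is a raw POSIX wait status (an assumption; the column layout of cycle, task, job id, state, exit status, tries, duration is also assumed here), then 32512 is 127 shifted left by 8 bits, which would correspond to a process exit code of 127. A minimal Python sketch of that decoding, with hypothetical helper names:

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw POSIX wait status into exit code and signal number."""
    return {
        "exit_code": (status >> 8) & 0xFF,  # high byte: value passed to exit()
        "signal": status & 0x7F,            # low 7 bits: terminating signal, 0 if none
    }


def dead_tasks(status_text: str) -> list:
    """Return (task, exit_code) pairs for rows whose state column is DEAD.

    Assumed columns: cycle, task, job id, state, exit status, tries, duration.
    """
    found = []
    for line in status_text.splitlines():
        cols = line.split()
        if len(cols) >= 5 and cols[3] == "DEAD":
            found.append((cols[1], decode_wait_status(int(cols[4]))["exit_code"]))
    return found


sample = """\
202103231200 wave_post_pnt_mem000 570 DEAD 32512 9 83.0
202103231200 arch - - - - -
"""
print(dead_tasks(sample))  # [('wave_post_pnt_mem000', 127)]
```

Under that assumption, the three DEAD tasks carry the same failure code that appears in the job log.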

The log shows the following error:

+ exgfs_wave_post_pnt.sh[467]: wavenproc=1210
++ exgfs_wave_post_pnt.sh[468]: echo 200
+ exgfs_wave_post_pnt.sh[468]: wavenproc=200
+ exgfs_wave_post_pnt.sh[470]: set +x
Executing the wave point scripts at : Thu Aug 15 20:50:15 UTC 2024
+ exgfs_wave_post_pnt.sh[477]: '[' 200 -gt 1 ']'
+ exgfs_wave_post_pnt.sh[479]: '[' YES = YES ']'
+ exgfs_wave_post_pnt.sh[480]: srun -l --export=ALL -n 200 --multi-prog --output=mpmd.%j.%t.out cmdmprog
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 200 with the number of requested nodes 6. Ignoring --ntasks-per-node.
srun: error: weihuang-epicweiaws-00120-1-0013: tasks 0-33: Exited with exit code 127
srun: error: weihuang-epicweiaws-00120-1-0014: tasks 34-67: Exited with exit code 127
srun: error: weihuang-epicweiaws-00120-1-0016: tasks 101-133: Exited with exit code 127
srun: error: weihuang-epicweiaws-00120-1-0017: tasks 134-166: Exited with exit code 127
srun: error: weihuang-epicweiaws-00120-1-0015: tasks 68-100: Exited with exit code 127
srun: error: weihuang-epicweiaws-00120-1-0018: tasks 167-199: Exited with exit code 127
+ exgfs_wave_post_pnt.sh[1]: postamble exgfs_wave_post_pnt.sh 1723754945 127
+ preamble.sh[70]: set +x
End exgfs_wave_post_pnt.sh at 20:50:20 with error code 127 (time elapsed: 00:01:15)
+ JGLOBAL_WAVE_POST_PNT[1]: postamble JGLOBAL_WAVE_POST_PNT 1723754943 127
+ preamble.sh[70]: set +x
End JGLOBAL_WAVE_POST_PNT at 20:50:20 with error code 127 (time elapsed: 00:01:17)
+ wavepostpnt.sh[1]: postamble wavepostpnt.sh 1723754937 127
+ preamble.sh[70]: set +x
End wavepostpnt.sh at 20:50:20 with error code 127 (time elapsed: 00:01:23)
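
For reference, the srun warning and the task ranges in the error lines fit together arithmetically: 6 nodes at 36 tasks per node would be 216 tasks, not the 200 requested, and the reported ranges match an even block split of 200 tasks over 6 nodes. The sketch below reproduces only that arithmetic; `block_split` is a hypothetical helper and is not a description of Slurm's actual placement logic. The warning looks separate from the exit code 127 failures.

```python
def block_split(ntasks: int, nnodes: int) -> list:
    """Split task ranks 0..ntasks-1 into contiguous blocks, one per node.

    The first (ntasks % nnodes) nodes receive one extra task.
    """
    base, extra = divmod(ntasks, nnodes)
    ranges, start = [], 0
    for node in range(nnodes):
        count = base + (1 if node < extra else 0)
        ranges.append((start, start + count - 1))
        start += count
    return ranges


print(block_split(200, 6))
# [(0, 33), (34, 67), (68, 100), (101, 133), (134, 166), (167, 199)]
```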

Any suggestions?
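
One possible starting point, offered as a sketch rather than a diagnosis of this failure: exit code 127 is the conventional shell status for "command not found", so it may be worth checking that every command listed in cmdmprog resolves on the compute nodes. The Python sketch below assumes the srun --multi-prog file uses lines of the form "task-rank(s) executable [args]"; `unresolved_commands` is a hypothetical helper, and the sample paths are made up for illustration.

```python
import shutil
import sys


def unresolved_commands(config_text: str) -> list:
    """Return (line number, executable) for entries that cannot be resolved.

    Assumes each non-comment line is: <task rank(s)> <executable> [args...]
    """
    missing = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) < 2:
            missing.append((lineno, "<no executable field>"))
            continue
        exe = fields[1]
        # shutil.which checks PATH for bare names and tests paths directly
        if shutil.which(exe) is None:
            missing.append((lineno, exe))
    return missing


# Illustrative input only: the second path is deliberately nonexistent.
sample_config = f"0 {sys.executable} -V\n1 /no/such/dir/hypothetical_cmd.sh\n"
print(unresolved_commands(sample_config))
```

Running such a check from inside a batch job on the compute nodes, rather than on the head node, would show whether the environment there (PATH, mounted filesystems) differs from the one where cmdmprog was generated.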

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 1231c9a into NOAA-EMC:develop Aug 26, 2024
5 checks passed
DavidHuber-NOAA added a commit to DavidHuber-NOAA/global-workflow that referenced this pull request Aug 28, 2024
* origin/develop:
  Fix gdas build on Gaea and add Gaea to available CI list (NOAA-EMC#2857)
  Support ATM forecast only on Google (NOAA-EMC#2832)
  Add GEFS C48 support on AWS (NOAA-EMC#2818)
  Update omega calculation (NOAA-EMC#2751)
  Add snow DA update and recentering for the EnKF forecasts (NOAA-EMC#2690)
DavidHuber-NOAA added a commit to DavidHuber-NOAA/global-workflow that referenced this pull request Sep 9, 2024
* origin/develop:
  Create JEDI class (NOAA-EMC#2805)
  Restructure the bufr sounding job    (NOAA-EMC#2853)
  Add an archive task to GEFS system to archive files locally (NOAA-EMC#2816)
  Reenable Orion Cycling Support (NOAA-EMC#2877)
  Eliminate race conditions and remove DATAROOT last in cleanup (NOAA-EMC#2893)
  Update aerosol climatology to 2013-2024 mean (NOAA-EMC#2888)
  Add ability to run CI test C96_atm3DVar.yaml to Gaea-C5 (NOAA-EMC#2885)
  Support global-workflow GEFS C48 on Google Cloud (NOAA-EMC#2861)
  Add 3 and 9 hr increment files to IC staging (NOAA-EMC#2876)
  Add diffusion/diag B for aerosol DA and some other needed changes (NOAA-EMC#2738)
  Correct ocean `MOM.res_#` stage copy (NOAA-EMC#2868)
  Support coupling on AWS (NOAA-EMC#2859)
  Add JEDI ATM lgetkf observer and solver jobs (NOAA-EMC#2833)
  Fix gdas build on Gaea and add Gaea to available CI list (NOAA-EMC#2857)
  Support ATM forecast only on Google (NOAA-EMC#2832)
  Add GEFS C48 support on AWS (NOAA-EMC#2818)
  Update omega calculation (NOAA-EMC#2751)
  Add snow DA update and recentering for the EnKF forecasts (NOAA-EMC#2690)
  support ATM forecast only on Azure (NOAA-EMC#2827)
  Convert staging job to python and yaml (NOAA-EMC#2651)
  Fixed test on UNAVAILBLE in python Rocoto check (NOAA-EMC#2842)
Development

Successfully merging this pull request may close these issues.

Run global-workflow GEFS C48 on AWS
3 participants