Allow APP to differ between RUNs #2943

Open
wants to merge 40 commits into base: develop

DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Sep 19, 2024

Description

This enables APP to be specified for each RUN. This also removes the need for a _no_run configuration dictionary and somewhat simplifies the _init_finalize method.

Resolves #2908
Resolves #2956

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any submodules? NO

How has this been tested?

  • Generated .xml files on Hera for all available CI tests and verified they were identical to develop.
  • Created an arbitrary APP configuration ({"gdas": "S2S", "gfs": "ATM"})
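The arbitrary configuration in the second test can be pictured as a mapping from each RUN to its own APP, which is then expanded into component flags. The following is a minimal sketch, not the workflow's actual code: `APP_COMPONENTS`, `ALL_COMPONENTS`, `flags_for_runs`, and the exact APP-to-component table are assumptions made for illustration.

```python
# Hypothetical table: which model components each APP switches on.
# The real workflow derives these flags from its configuration files.
APP_COMPONENTS = {
    "ATM": {"atm"},
    "ATMA": {"atm", "aero"},
    "S2S": {"atm", "ocean", "ice"},
    "S2SW": {"atm", "ocean", "ice", "wave"},
}

ALL_COMPONENTS = ("atm", "wave", "ocean", "ice", "aero")


def flags_for_runs(app_by_run):
    """Expand a {RUN: APP} mapping into per-RUN do_<component> flags."""
    flags = {}
    for run, app in app_by_run.items():
        if app not in APP_COMPONENTS:
            raise NotImplementedError(f"{app} is not a known APP for RUN {run}")
        enabled = APP_COMPONENTS[app]
        flags[run] = {f"do_{comp}": comp in enabled for comp in ALL_COMPONENTS}
    return flags


# The configuration from the test above: coupled gdas, atmosphere-only gfs.
run_flags = flags_for_runs({"gdas": "S2S", "gfs": "ATM"})
```

With this layout, `run_flags["gdas"]["do_ocean"]` and `run_flags["gfs"]["do_ocean"]` can differ, which is the behavior the PR description is after.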

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes

@DavidHuber-NOAA
Contributor Author

This is going to require some more legwork, as some APP-specific flags are currently read in without specifying a RUN:

base = conf.parse_config('config.base')
self.mode = base['MODE']
if self.mode not in self.VALID_MODES:
    raise NotImplementedError(f'{self.mode} is not a valid application mode.\n'
                              f'Valid application modes are:\n'
                              f'{", ".join(self.VALID_MODES)}\n')
self.net = base['NET']
self.model_app = base.get('APP', 'ATM')
self.do_atm = base.get('DO_ATM', True)
self.do_wave = base.get('DO_WAVE', False)
self.do_wave_bnd = base.get('DOBNDPNT_WAVE', False)
self.do_ocean = base.get('DO_OCN', False)
self.do_ice = base.get('DO_ICE', False)
self.do_aero = base.get('DO_AERO', False)

This will require reworking the get_task_names methods to key off of RUN-specific flags.
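One way to picture the rework: instead of consulting a single set of `self.do_*` attributes, task selection looks up a per-RUN dictionary of flags. This is only a sketch under assumed names. The function body, the flag layout, and every task name other than `prep` are hypothetical and do not reproduce the real `get_task_names` methods.

```python
def get_task_names(run_flags):
    """Return {RUN: [task names]} using each RUN's own component flags."""
    tasks = {}
    for run, flags in run_flags.items():
        run_tasks = ["prep"]
        if flags.get("do_atm", True):
            run_tasks.append("fcst")
        if flags.get("do_aero", False):
            # Hypothetical aerosol task, added only for RUNs with aerosols on.
            run_tasks.append("aeroanl")
        if flags.get("do_wave", False):
            # Hypothetical wave task, added only for RUNs with waves on.
            run_tasks.append("waveinit")
        tasks[run] = run_tasks
    return tasks


task_names = get_task_names({
    "gdas": {"do_atm": True, "do_aero": True},
    "gfs": {"do_atm": True, "do_aero": False},
})
```

Here the gdas list picks up the aerosol task while the gfs list does not, because each RUN is evaluated against its own flags.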

@DavidHuber-NOAA
Contributor Author

This PR now generates CI XMLs identical to those from develop, except for the stage_ic and cleanup jobs, which now run on service partitions. An additional case was tested based on the C96C48_hybatmaerosnowDA case, with APP set to ATMA for RUN==gdas and ATM for RUN==gfs. In this case, the gfs tasks in the resulting XML did not include any aero-component jobs.
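A check like the one described for the mixed ATMA/ATM case could be scripted roughly as follows. This is a sketch only: `SAMPLE_XML` is a tiny stand-in with made-up task names, `task_names_for_run` is a hypothetical helper, and a real generated workflow XML is far larger and may need extra handling before it parses this way.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a generated workflow XML (task names are illustrative).
SAMPLE_XML = """
<workflow>
  <task name="gdas_prep"/>
  <task name="gdas_aeroanlinit"/>
  <metatask name="gfs_post">
    <task name="gfs_prep"/>
  </metatask>
  <task name="gfs_fcst"/>
</workflow>
"""


def task_names_for_run(xml_text, run):
    """Collect the names of all <task> elements that belong to one RUN."""
    root = ET.fromstring(xml_text)
    return sorted(
        task.get("name")
        for task in root.iter("task")  # iter() also finds tasks inside metatasks
        if task.get("name", "").startswith(f"{run}_")
    )


gfs_tasks = task_names_for_run(SAMPLE_XML, "gfs")
# Any gfs task with "aero" in its name would indicate a leaked aerosol job.
gfs_aero_tasks = [name for name in gfs_tasks if "aero" in name]
```

An empty `gfs_aero_tasks` list corresponds to the result reported above for RUN==gfs.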

Marking ready for review.

@emcbot emcbot added the CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed label Oct 17, 2024
@emcbot

emcbot commented Oct 17, 2024

Experiment C96C48_hybatmDA FAILED on Orion in Build# 1 in
/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/EXPDIR/C96C48_hybatmDA_1975b2f3

@emcbot

emcbot commented Oct 17, 2024

Experiment C96_atm3DVar FAILED on Orion in Build# 1 in
/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/EXPDIR/C96_atm3DVar_1975b2f3

@TerrenceMcGuinness-NOAA
Collaborator

TerrenceMcGuinness-NOAA commented Oct 17, 2024

mterry (orion-login-1) C96C48_hybatmDA_1975b2f3 $ pwd
/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/EXPDIR/C96C48_hybatmDA_1975b2f3
mterry (orion-login-1) C96C48_hybatmDA_1975b2f3 $ rocotostat -d C96C48_hybatmDA_1975b2f3.db -w C96C48_hybatmDA_1975b2f3.xml -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201800        Done    Oct 17 2024 13:28:51    Oct 17 2024 14:30:36
202112210000      Active    Oct 17 2024 13:28:51             -          
202112210600      Active    Oct 17 2024 13:28:51             -          
mterry (orion-login-1) C96C48_hybatmDA_1975b2f3 $ /work2/noaa/stmp/CI/ORION/2943/gfs/ci/scripts/utils/rocotostat.py -d C96C48_hybatmDA_1975b2f3.db -w C96C48_hybatmDA_1975b2f3.xml
STALLED
mterry (orion-login-1) C96C48_hybatmDA_1975b2f3 $ rocotorun -v 10 -d C96C48_hybatmDA_1975b2f3.db -w C96C48_hybatmDA_1975b2f3.xml
10/17/24 11:04:43 CDT :: C96C48_hybatmDA_1975b2f3.xml :: Submitting gdas_prep using sbatch < /tmp/sbatch.in20241017-1418003-p5s697 with input
{{
#! /bin/sh
#SBATCH --job-name=C96C48_hybatmDA_1975b2f3_gdas_prep_00
#SBATCH --account=nems
#SBATCH --qos=batch
#SBATCH --partition=orion
#SBATCH -t 00:30:00
#SBATCH --nodes=2-2
#SBATCH --tasks-per-node=7
#SBATCH --cpus-per-task=1
#SBATCH --mem=192512
#SBATCH -o /work2/noaa/stmp/CI/ORION/2943/RUNTESTS/COMROOT/C96C48_hybatmDA_1975b2f3/logs/2021122100/gdas_prep.log
#SBATCH --export=NONE
#SBATCH --comment=7ab7068ceccc0ec3f97ebb4cd2c6e87f
export RUN_ENVIR='emc'
export HOMEgfs='/work2/noaa/stmp/CI/ORION/2943/gfs'
export EXPDIR='/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/EXPDIR/C96C48_hybatmDA_1975b2f3'
export NET='gfs'
export RUN='gdas'
export CDATE='2021122100'
export PDY='20211221'
export cyc='00'
export COMROOT='/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/COMROOT'
export DATAROOT='/work/noaa/stmp/mterry/ORION/RUNDIRS/C96C48_hybatmDA_1975b2f3/gdas.2021122100'
/work2/noaa/stmp/CI/ORION/2943/gfs/jobs/rocoto/prep.sh
}}
10/17/24 11:04:43 CDT :: C96C48_hybatmDA_1975b2f3.xml :: WARNING: job submission failed: sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
10/17/24 11:04:43 CDT :: C96C48_hybatmDA_1975b2f3.xml :: Submitting gfs_prep using sbatch < /tmp/sbatch.in20241017-1418003-pcnt7j with input
{{
#! /bin/sh
#SBATCH --job-name=C96C48_hybatmDA_1975b2f3_gfs_prep_00
#SBATCH --account=nems
#SBATCH --qos=batch
#SBATCH --partition=orion
#SBATCH -t 00:30:00
#SBATCH --nodes=2-2
#SBATCH --tasks-per-node=7
#SBATCH --cpus-per-task=1
#SBATCH --mem=192512
#SBATCH -o /work2/noaa/stmp/CI/ORION/2943/RUNTESTS/COMROOT/C96C48_hybatmDA_1975b2f3/logs/2021122100/gfs_prep.log
#SBATCH --export=NONE
#SBATCH --comment=161689d84fda74888fcee09c8a75ab8c
export RUN_ENVIR='emc'
export HOMEgfs='/work2/noaa/stmp/CI/ORION/2943/gfs'
export EXPDIR='/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/EXPDIR/C96C48_hybatmDA_1975b2f3'
export NET='gfs'
export RUN='gfs'
export CDATE='2021122100'
export PDY='20211221'
export cyc='00'
export COMROOT='/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/COMROOT'
export DATAROOT='/work/noaa/stmp/mterry/ORION/RUNDIRS/C96C48_hybatmDA_1975b2f3/gfs.2021122100'
/work2/noaa/stmp/CI/ORION/2943/gfs/jobs/rocoto/prep.sh
}}
10/17/24 11:04:43 CDT :: C96C48_hybatmDA_1975b2f3.xml :: WARNING: job submission failed: sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
10/17/24 11:04:43 CDT :: C96C48_hybatmDA_1975b2f3.xml :: Submission of gdas_prep failed!  sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
10/17/24 11:04:43 CDT :: C96C48_hybatmDA_1975b2f3.xml :: Submission of gfs_prep failed!  sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

@DavidHuber-NOAA
Contributor Author

Thanks @TerrenceMcGuinness-NOAA. I guess I need to readjust mem_node_max for Orion.
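The sbatch error in the log above suggests that the requested `--mem=192512` exceeds what the scheduler will grant on a node. A minimal sketch of capping a request at a per-node limit follows. `cap_memory_mb` and the 180000 MB limit are placeholders for illustration; they are not Orion's actual reservable memory, nor how the workflow applies `mem_node_max`.

```python
def cap_memory_mb(requested_mb, mem_node_max_mb):
    """Return the memory request in MB, limited to the per-node maximum."""
    if requested_mb <= 0 or mem_node_max_mb <= 0:
        raise ValueError("memory values must be positive")
    return min(requested_mb, mem_node_max_mb)


# 192512 MB is the value from the failed submission; 180000 MB is a
# placeholder limit, not a measured value for any machine.
capped = cap_memory_mb(192512, 180000)
sbatch_line = f"#SBATCH --mem={capped}"
```

Requests already below the limit pass through unchanged, so only oversized jobs such as the prep tasks above would be affected.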

@emcbot

emcbot commented Oct 17, 2024

Experiment C48_ATM FAILED on Orion in Build# 1 with error logs:

/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/COMROOT/C48_ATM_1975b2f3/logs/2021032312/gfs_metpg2o1.log

Follow link here to view the contents of the above file(s): (link)

@emcbot

emcbot commented Oct 17, 2024

Experiment C48_ATM FAILED on Orion in Build# 1 in
/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/EXPDIR/C48_ATM_1975b2f3

@emcbot

emcbot commented Oct 17, 2024

Experiment C96_S2SWA_gefs_replay_ics FAILED on Orion in Build# 1 in
/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/EXPDIR/C96_S2SWA_gefs_replay_ics_1975b2f3

@emcbot emcbot added CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed labels Oct 17, 2024
@emcbot

emcbot commented Oct 17, 2024

CI Failed on Orion in Build# 1
Built and ran in directory /work2/noaa/stmp/CI/ORION/2943


Experiment C96C48_hybatmDA_1975b2f3 Terminated with 0 tasks failed and 0 dead at Thu Oct 17 09:31:44 AM CDT 2024
Experiment C96C48_hybatmDA_1975b2f3 Terminated: *STALLED*
Experiment C96_atm3DVar_1975b2f3 Terminated with 0 tasks failed and 0 dead at Thu Oct 17 09:37:53 AM CDT 2024
Experiment C96_atm3DVar_1975b2f3 Terminated: *STALLED*
Experiment C48_ATM_1975b2f3 Terminated with 0 tasks failed and 1 dead at Thu Oct 17 11:28:20 AM CDT 2024
Experiment C48_ATM_1975b2f3 Terminated: *FAIL*
Error logs:
/work2/noaa/stmp/CI/ORION/2943/RUNTESTS/COMROOT/C48_ATM_1975b2f3/logs/2021032312/gfs_metpg2o1.log
Experiment C96_S2SWA_gefs_replay_ics_1975b2f3 Terminated with 0 tasks failed and 0 dead at Thu Oct 17 11:34:29 AM CDT 2024
Experiment C96_S2SWA_gefs_replay_ics_1975b2f3 Terminated: *STALLED*
Experiment C48_S2SW_1975b2f3 Completed 1 Cycles: *SUCCESS* at Thu Oct 17 01:45:14 PM CDT 2024
Experiment C48_S2SWA_gefs_1975b2f3 Completed 1 Cycles: *SUCCESS* at Thu Oct 17 02:46:52 PM CDT 2024

@DavidHuber-NOAA DavidHuber-NOAA removed the CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed label Oct 18, 2024
@DavidHuber-NOAA
Contributor Author

I'm having some issues with this branch after merging the interval PR. Converting to draft until I get it squared away.

@DavidHuber-NOAA DavidHuber-NOAA marked this pull request as draft October 23, 2024 11:53
@DavidHuber-NOAA DavidHuber-NOAA added the CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress label Oct 23, 2024
@DavidHuber-NOAA DavidHuber-NOAA removed the CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress label Oct 24, 2024
@DavidHuber-NOAA
Contributor Author

All features are working again. Reopening this PR for review.

@DavidHuber-NOAA DavidHuber-NOAA marked this pull request as ready for review October 24, 2024 19:17
@DavidHuber-NOAA DavidHuber-NOAA added CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules and removed CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules labels Oct 25, 2024
Successfully merging this pull request may close these issues.

  • Check reservable memory on Hercules and Orion
  • Enable RUN-specific APP configurations