Add restart capability in HAFS prep job #238

Open
BinLiu-NOAA opened this issue Nov 30, 2023 · 1 comment
Labels
Bugzilla (Operational HAFS bugzilla items), enhancement (New feature or request)

Comments

@BinLiu-NOAA
Collaborator

Description

Please add restart capability for the HAFS atm_prep_mvnest job at the next upgrade.

The WCOSS standard document requires that any model job running longer than 15 minutes have restart capability. Currently, the HAFS atm_prep_mvnest job averages more than 55 minutes. Please consider adding restart capability to the HAFS atm_prep_mvnest job.

BinLiu-NOAA added the enhancement (New feature or request) and Bugzilla (Operational HAFS bugzilla items) labels on Nov 30, 2023
@BinLiu-NOAA
Collaborator Author

In the HAFS application/workflow, the atm_prep_mvnest job produces high-resolution (moving-nest resolution) geographical and surface climatology data for the entire parent domain. The only job that depends on its output is the forecast job. The atm_prep_mvnest job has a time window of ~70 minutes (from T+3:10 to T+4:20) before it could delay the forecast job's kickoff time.

Currently in HAFSv1, this atm_prep_mvnest job uses 1 node (18 PEs and 6 OpenMP threads) and takes ~50 minutes of wallclock time. With the latest HAFSv2 package, we optimized the job and reduced the wallclock time to ~40 minutes (still on 1 node, but with 6 PEs and 20 threads).

Based on the HAFSv2 EE2 kickoff meeting discussion with the NCO SPAs, it was agreed that we can leave this job as is for the HAFSv2 upgrade. The job uses only 1 node and has a long time window to run: if the first try fails in the middle of the job, a retry can most likely still complete in time for the forecast job to kick off on time, without affecting the HAFS application/workflow forecast job or product delivery time.
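To make the "most likely" above concrete, here is a rough back-of-the-envelope sketch using the approximate numbers quoted in this comment (~70 minute window, ~50 minute HAFSv1 runtime, ~40 minute HAFSv2 runtime). It assumes the retry starts immediately after the failure, reruns from scratch, and takes the same wallclock time as a clean run; queue wait and failure-detection delay are not accounted for. The function name is purely illustrative.

```python
def latest_recoverable_failure(window_min: float, runtime_min: float) -> float:
    """Latest elapsed time (minutes) at which the first try can fail while a
    full from-scratch rerun still finishes inside the window.

    Assumes the rerun starts immediately and takes runtime_min again.
    """
    return window_min - runtime_min


WINDOW_MIN = 70  # ~T+3:10 to T+4:20, as quoted above

# Approximate wallclock times quoted above for each version.
for label, runtime_min in (("HAFSv1", 50), ("HAFSv2", 40)):
    slack = latest_recoverable_failure(WINDOW_MIN, runtime_min)
    print(f"{label}: a full retry still fits if the first try fails within ~{slack:.0f} min")
```

Under these assumptions, a failure later than that point would push a from-scratch retry past the window, which is the case a restart capability would help with.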

Moving forward, we are considering several approaches to speed up this atm_prep_mvnest job. One is to speed up the serial executables by enabling OpenMP threading. Another is to split the job into a few smaller jobs. We can also consider adding the mentioned/suggested restart capability for this atm_prep_mvnest job. We will work toward these in the next HAFS upgrade.
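As one possible shape for the restart capability, here is a minimal sketch of step-level checkpointing with completion marker files. This is not the HAFS implementation: the function `run_with_restart`, the `.done` marker convention, and the step names in the demo are all hypothetical, and the real job would need to decide what a sensible restart unit is.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def run_with_restart(
    steps: Sequence[Tuple[str, Callable[[], None]]], marker_dir: Path
) -> List[str]:
    """Run steps in order, skipping any step whose marker file already exists.

    Returns the names of the steps actually executed in this call.
    A step that raises leaves no marker, so a rerun resumes from that step.
    """
    marker_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for name, action in steps:
        marker = marker_dir / f"{name}.done"
        if marker.exists():
            continue  # completed in an earlier attempt
        action()  # an exception here aborts the run before the marker is written
        marker.touch()
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Hypothetical step names, for illustration only.
    names = ["stage_1", "stage_2", "stage_3"]
    steps = [(n, lambda n=n: print(f"running {n}")) for n in names]
    with tempfile.TemporaryDirectory() as tmp:
        markers = Path(tmp) / "markers"
        print(run_with_restart(steps, markers))  # first attempt runs every step
        print(run_with_restart(steps, markers))  # rerun finds all markers, runs nothing
```

With this pattern, a retry after a mid-job failure only repeats the failed step and those after it, at the cost of having to keep each step's outputs intact between attempts.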
