Hotfix: Handle UNAVAILABLE rocoto status in Bash CI #2820
This all needs to be married with the scripts that use it in bash and the pipeline and tested before this goes to a PR.
In practice, the CI system flagged PRs that stayed in those UNKNOWN states for too long as Stalled, specifically because they could not advance. I'm not sure yet if this update would change that.
On closer inspection, handling UNAVAILABLE (which Rocoto only reports for PBS, BTW) is a good idea, but it is not specifically what produces the "fix": the code would have continued on to the STALL check anyway, and that is still a logically valid check. It seems to solve the issue because it simply gives the system more time to self-repair in this state. So this extra step to keep checking is indeed helpful and still valid.
After discussion with @TerrenceMcGuinness-NOAA, I will also add a check for |
@DavidHuber-NOAA what David means is that he is adding the "extra" wait in these cases of UNKNOWN and UNAVAILABLE to give the system extra time to self-repair with subsequent runs of |
I think they designate different things. UNKNOWN means the scheduler no longer has information on a job. UNAVAILABLE means the scheduler did not respond before the time-out.
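To make the distinction concrete, here is a minimal sketch of how a Bash CI helper might map a job state to an action, treating both UNKNOWN and UNAVAILABLE as "wait and re-check" rather than as failures. The function name `classify_rocoto_state` and the action strings are hypothetical and are not taken from this PR; which states count as terminal is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this PR): map a Rocoto job state to a CI action.
classify_rocoto_state() {
  local state="$1"
  case "${state}" in
    SUCCEEDED)           echo "done" ;;      # job finished normally
    DEAD)                echo "fail" ;;      # treated here as a terminal failure (assumption)
    UNKNOWN|UNAVAILABLE) echo "wait" ;;      # scheduler lost the job or timed out: re-check later
    *)                   echo "continue" ;;  # any other state: keep cycling as usual
  esac
}

classify_rocoto_state UNAVAILABLE   # prints "wait"
```

Grouping the two states keeps the later STALL check untouched while giving the scheduler more cycles to recover.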
…e_rocoto
* origin/develop:
  Jenkins Pipeline Updates (NOAA-EMC#2815)
  Add Gaea C5 to CI (NOAA-EMC#2814)
  Add support for forecast-only runs on AWS (NOAA-EMC#2711)
  Add fixes to products for when REPLAY IC's are used (NOAA-EMC#2755)
  Add capability to run forecast in segments (NOAA-EMC#2795)
7e7fbc6
Nice update, tested it and the side effects are all valid and work within the framework
Description
From time to time, PBS Pro cannot return a `qstat` response within a given time limit set by `rocoto` (default is 45 seconds). If that happens, then an `UNAVAILABLE` status will be returned for the given job. This PR adds checking for this status to allow CI processing to continue.

Refs #2755 christopherwharrop/rocoto#110
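As an illustration only, the check described above could be sketched as counting how many jobs report a given state in captured status output. The function `count_state`, the sample text, the task names, and the assumption that the state sits in the fourth whitespace-separated column are all hypothetical; the actual CI scripts in this PR may parse the status differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this PR): count the jobs in a given state.
# Assumes one job per line with the state in the 4th whitespace-separated column.
count_state() {
  local state="$1"
  awk -v s="${state}" '$4 == s { n++ } END { print n + 0 }'
}

# Made-up sample status text, for illustration only.
sample_output='202101010000 task_a 12345 RUNNING - 0 0.0
202101010000 task_b 12346 UNAVAILABLE - 0 0.0
202101010000 task_c 12347 UNKNOWN - 0 0.0'

num_unavailable=$(printf '%s\n' "${sample_output}" | count_state UNAVAILABLE)
if [ "${num_unavailable}" -gt 0 ]; then
  # The scheduler timed out for at least one job: keep the case open and re-check later.
  echo "UNAVAILABLE jobs: ${num_unavailable}; will re-check on the next pass"
fi
```

A non-zero count is treated here as a reason to wait for the next pass, not as a failure, which matches the intent of letting CI processing continue.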
Type of change
Change characteristics
How has this been tested?
Visual inspection
Checklist