Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: ensure make unslurmcluster always runs in CI #9420

Merged
merged 1 commit into from
May 24, 2024

Conversation

stoksc
Copy link
Contributor

@stoksc stoksc commented May 24, 2024

Ticket

Description

When cluster creation fails halfway, pkill determined-mast will exit with 1. CircleCI runs these inline script with set -e, so that failure causes make unslurmcluster to not run and leave stray TF resources behind.

Test Plan

Is a test.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@stoksc stoksc requested a review from a team as a code owner May 24, 2024 01:40
@stoksc stoksc requested a review from JComins000 May 24, 2024 01:40
@cla-bot cla-bot bot added the cla-signed label May 24, 2024
Copy link

netlify bot commented May 24, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit 907a7e7
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/664ff075aa57e60007239f2b

Copy link

codecov bot commented May 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.81%. Comparing base (ba31f03) to head (907a7e7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9420      +/-   ##
==========================================
- Coverage   51.24%   45.81%   -5.43%     
==========================================
  Files         746      718      -28     
  Lines      110663   108876    -1787     
  Branches     2778     2778              
==========================================
- Hits        56713    49886    -6827     
- Misses      53776    58816    +5040     
  Partials      174      174              
Flag Coverage Δ
backend 34.23% <ø> (-0.05%) ⬇️
harness 49.12% <ø> (-14.90%) ⬇️
web 44.39% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 165 files with indirect coverage changes

@stoksc stoksc force-pushed the stoksc/ensure-unslurmcluster-always-runs branch from 3756b32 to 907a7e7 Compare May 24, 2024 01:42
@stoksc stoksc requested a review from dannysauer May 24, 2024 01:42
@stoksc
Copy link
Contributor Author

stoksc commented May 24, 2024

Added you as a reviewer @dannysauer since I told you I'd do this on a thread yesterday.

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.61%. Comparing base (ba31f03) to head (907a7e7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9420      +/-   ##
==========================================
- Coverage   51.24%   48.61%   -2.64%     
==========================================
  Files         746     1233     +487     
  Lines      110663   158972   +48309     
  Branches     2778     2778              
==========================================
+ Hits        56713    77283   +20570     
- Misses      53776    81515   +27739     
  Partials      174      174              
Flag Coverage Δ
backend 42.17% <ø> (+7.89%) ⬆️
harness 64.02% <ø> (ø)
web 44.39% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 488 files with indirect coverage changes

Copy link
Member

@dannysauer dannysauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still gon' leave resources around if the job is interrupted by a second PR or otherwise canceled (because CircleCI has no finalizer), but this is an improvement either way. :)

@stoksc
Copy link
Contributor Author

stoksc commented May 24, 2024

Yeah, I agree, this probably needs to get added to the terminator script at some point. But this at least fixes the specific issue that caused us to have an explosion of vpcs.

@stoksc stoksc enabled auto-merge (squash) May 24, 2024 01:58
@stoksc stoksc disabled auto-merge May 24, 2024 01:58
@stoksc
Copy link
Contributor Author

stoksc commented May 24, 2024

merging over some unrelated failures (known flakes on main)

@stoksc stoksc merged commit cdd7a82 into main May 24, 2024
75 of 96 checks passed
@stoksc stoksc deleted the stoksc/ensure-unslurmcluster-always-runs branch May 24, 2024 01:59
stoksc added a commit that referenced this pull request May 24, 2024
When cluster creation fails halfway, pkill determined-mast will
exit with 1. CircleCI runs these inline script with set -e, so that
failure causes make unslurmcluster to no run, leaving stray
resources behind.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants