fix: count determined-system pods as det pods [RM-148] #9148

Merged
merged 3 commits into from
Apr 17, 2024

Conversation

carolinaecalderon
Contributor

@carolinaecalderon carolinaecalderon commented Apr 11, 2024

Ticket

RM-148

Description

If a pod has a "determined system", "determined preemption", or "determined" label, don't count it as a "nonDet" pod. This fixes a bug where the db and master pods were counted twice, once as det pods and once as nonDet pods, and the master logs gave a "too many pods mapping to node" warning.
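
To make the rule above concrete, here is a minimal sketch in Go of classifying a pod by its labels. It is not the code from this PR: the `pod` type, the `isDeterminedPod` and `countPods` helpers, and the exact label keys (`determined`, `determined-preemption`, `determined-system`) are assumptions made for illustration only.

```go
package main

import "fmt"

// Assumed label keys; the real keys used in pods.go may be spelled differently.
const (
	determinedLabel           = "determined"
	determinedPreemptionLabel = "determined-preemption"
	determinedSystemLabel     = "determined-system"
)

// pod is a hypothetical stand-in for a Kubernetes pod; only labels matter here.
type pod struct {
	Name   string
	Labels map[string]string
}

// isDeterminedPod reports whether the pod carries any of the three labels.
func isDeterminedPod(p pod) bool {
	keys := []string{determinedLabel, determinedPreemptionLabel, determinedSystemLabel}
	for _, key := range keys {
		if _, ok := p.Labels[key]; ok {
			return true
		}
	}
	return false
}

// countPods puts every pod into exactly one bucket, so a labeled pod
// can never be counted as both a det pod and a nonDet pod.
func countPods(pods []pod) (det, nonDet int) {
	for _, p := range pods {
		if isDeterminedPod(p) {
			det++
		} else {
			nonDet++
		}
	}
	return det, nonDet
}

func main() {
	pods := []pod{
		{Name: "master", Labels: map[string]string{determinedSystemLabel: "master"}},
		{Name: "db", Labels: map[string]string{determinedSystemLabel: "db"}},
		{Name: "trial", Labels: map[string]string{determinedLabel: "trial"}},
		{Name: "other-workload", Labels: map[string]string{"app": "nginx"}},
	}
	det, nonDet := countPods(pods)
	fmt.Printf("det=%d nonDet=%d\n", det, nonDet)
}
```

Checking all three labels in one predicate keeps the det and nonDet buckets mutually exclusive, which is the property the description asks for.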

Test Plan

Spin up your own small CPU cluster (or use mine; Slack me for the IP), and check that the "default RM" has (total number of slots - 2) slots left. This means that the db and master pods are each being counted just once (correct). If fewer slots are available than expected, or no slots are free, the pods are being double counted.
Additionally, try spinning up your own experiment and make sure you don't get the "too many pods" warning in your master logs.
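
The slot arithmetic in the check above can be sketched as follows. This is a simplified model, not the resource manager's actual accounting: `freeSlots` is a hypothetical helper, the 8-slot cluster size is made up, and the model assumes every counted pod occupies exactly one slot.

```go
package main

import "fmt"

// freeSlots is a hypothetical helper that assumes each counted pod
// (det or nonDet) consumes exactly one slot. It never returns a negative value.
func freeSlots(totalSlots, detPods, nonDetPods int) int {
	free := totalSlots - detPods - nonDetPods
	if free < 0 {
		return 0
	}
	return free
}

func main() {
	const totalSlots = 8 // made-up cluster size for illustration

	// Expected behavior: master and db are counted once, as det pods only.
	fmt.Println("counted once:", freeSlots(totalSlots, 2, 0))

	// Buggy behavior: the same two pods are also counted as nonDet pods.
	fmt.Println("double counted:", freeSlots(totalSlots, 2, 2))
}
```

Under these assumptions, a correct count leaves two fewer slots than the total, while double counting leaves four fewer, which is the discrepancy the test plan looks for.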

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@cla-bot cla-bot bot added the cla-signed label Apr 11, 2024

netlify bot commented Apr 11, 2024

Deploy Preview for determined-ui ready!

Name Link
🔨 Latest commit 0b027c1
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/661ec191aea42a00083881d5
😎 Deploy Preview https://deploy-preview-9148--determined-ui.netlify.app

codecov bot commented Apr 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.48%. Comparing base (5541e54) to head (0b027c1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9148      +/-   ##
==========================================
+ Coverage   45.46%   45.48%   +0.02%     
==========================================
  Files        1197     1197              
  Lines      147556   147558       +2     
  Branches     2438     2437       -1     
==========================================
+ Hits        67092    67123      +31     
+ Misses      80232    80203      -29     
  Partials      232      232              
Flag Coverage Δ
backend 43.75% <100.00%> (+0.06%) ⬆️
harness 64.02% <ø> (ø)
web 35.41% <ø> (ø)

Files Coverage Δ
master/internal/rm/kubernetesrm/pods.go 21.23% <100.00%> (+0.16%) ⬆️

... and 5 files with indirect coverage changes

@carolinaecalderon carolinaecalderon marked this pull request as ready for review April 11, 2024 16:06
@carolinaecalderon carolinaecalderon requested a review from a team as a code owner April 11, 2024 16:07
Contributor

@amandavialva01 amandavialva01 left a comment

LGTM!

Contributor

@stoksc stoksc left a comment

minor feedback

master/internal/rm/kubernetesrm/pods.go (outdated, resolved)
master/internal/rm/kubernetesrm/pods.go (outdated, resolved)
Contributor

stoksc commented Apr 16, 2024

can we add a test for this?

@carolinaecalderon carolinaecalderon merged commit 1cc9cd7 into main Apr 17, 2024
72 of 84 checks passed
@carolinaecalderon carolinaecalderon deleted the carolinac/rm-148 branch April 17, 2024 15:45
4 participants