Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update default scheduler to priority for agentrm #9385

Merged
merged 5 commits into from
May 21, 2024

Conversation

kkunapuli
Copy link
Contributor

@kkunapuli kkunapuli commented May 16, 2024

Ticket

RM-254

Description

For agent resource managers, use the priority scheduler by default.

Test Plan

Start a new instance of Determined, e.g. by running devcluster, with no configuration. On the cluster overview page, the resource pool description should say Scheduler Type Priority.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@cla-bot cla-bot bot added the cla-signed label May 16, 2024
@determined-ci determined-ci added the documentation Improvements or additions to documentation label May 16, 2024
@determined-ci determined-ci requested a review from a team May 16, 2024 20:31
Copy link

netlify bot commented May 16, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit f257aeb
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/664d06fac8c66d00081aa974

Comment on lines -201 to +204
if rmExisted || rpsExisted {
if !oldRMConfig {
// Use configMap if RMs are not defined at all, or if they are defined using the new schema.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings on this change. I think it's the right thing to do, but I'm suspicious of unintended consequences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree on both fronts. i would just run a cluster before and after this change, call det master config for both, and diff the results. then maybe post it here if there is anything concerning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! There's only one line different between the configs with/without this specific change. The "new" / proposed version has the addition of this line:
max_cpu_containers_per_agent: -1

Which looks like it's probably fine to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the -1 sentinel value seems right to me.

Copy link

codecov bot commented May 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.03%. Comparing base (31bc08a) to head (b64bbda).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9385      +/-   ##
==========================================
+ Coverage   46.02%   46.03%   +0.01%     
==========================================
  Files        1228     1228              
  Lines      155889   155895       +6     
  Branches     2440     2440              
==========================================
+ Hits        71749    71773      +24     
+ Misses      83949    83931      -18     
  Partials      191      191              
Flag Coverage Δ
backend 41.88% <100.00%> (+0.04%) ⬆️
harness 64.10% <ø> (ø)
web 38.24% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
master/cmd/determined-master/root.go 49.69% <100.00%> (+0.61%) ⬆️
master/internal/config/scheduler_config.go 68.18% <100.00%> (+5.68%) ⬆️

... and 6 files with indirect coverage changes

@kkunapuli kkunapuli changed the title Kunapuli/priority default2 feat: update default scheduler to priority for agentrm May 16, 2024
@kkunapuli kkunapuli marked this pull request as ready for review May 16, 2024 20:52
@kkunapuli kkunapuli requested a review from a team as a code owner May 16, 2024 20:52
@kkunapuli kkunapuli requested review from amandavialva01, a team and ShreyaLnuHpe and removed request for a team May 16, 2024 20:52
- Job Scheduling: Switching to ``priority`` for the default scheduler. Support for round-robin and
fair share schedulers is discontinued. We recommend using the priority scheduler, as it meets
most scheduling needs and simplifies configuration. Moving a job will require adjusting its
priority; jobs cannot be shifted within the same priority group.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Job Scheduling: The default scheduler is now priority. Support for the round-robin and fair share schedulers has been discontinued. We recommend using the priority scheduler, as it meets
    most scheduling needs and simplifies configuration. To move a job, you will need to adjust its priority; jobs cannot be shifted within the same priority group.

@determined-ci determined-ci requested a review from a team May 21, 2024 14:49
@kkunapuli kkunapuli merged commit 047580c into main May 21, 2024
75 of 93 checks passed
@kkunapuli kkunapuli deleted the kunapuli/priority-default2 branch May 21, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants