
feat: remove round robin scheduler for agentrm #9493

Merged: 17 commits, merged Jun 17, 2024


@kkunapuli (Contributor) commented Jun 10, 2024

Ticket

RM-322

Description

Remove the round robin scheduler for agent resource managers. Its deprecation was announced in 0.33.0.

Test Plan

None needed.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@cla-bot added the cla-signed label Jun 10, 2024
@determined-ci added the documentation label Jun 10, 2024
@determined-ci requested a review from a team June 10, 2024 18:15
netlify bot commented Jun 10, 2024

Deploy Preview for determined-ui canceled.

🔨 Latest commit: 1f3c2bf
🔍 Latest deploy log: https://app.netlify.com/sites/determined-ui/deploys/666b3ea19aa75300080491f6

@@ -428,7 +428,7 @@ Here is an example master configuration illustrating the potential problem.

        resource_manager:
          type: agent
          scheduler:
            type: round_robin
Contributor Author:

Updating the example to use only schedulers that still exist.

codecov bot commented Jun 10, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 48.99%. Comparing base (de03909) to head (1f3c2bf).
Report is 16 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9493   +/-   ##
=======================================
  Coverage   48.99%   48.99%           
=======================================
  Files        1235     1234    -1     
  Lines      160191   160153   -38     
  Branches     2780     2781    +1     
=======================================
- Hits        78482    78474    -8     
+ Misses      81534    81504   -30     
  Partials      175      175           
Flag (Coverage Δ)
backend: 43.83% <95.00%> (+0.01%) ⬆️
harness: 63.96% <ø> (ø)
web: 44.12% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files (Coverage Δ)
master/internal/config/config.go: 71.37% <100.00%> (+0.72%) ⬆️
master/internal/config/scheduler_config.go: 84.09% <100.00%> (+15.90%) ⬆️
master/internal/rm/agentrm/agents.go: 63.30% <ø> (ø)
master/internal/rm/agentrm/scheduler.go: 83.33% <100.00%> (+27.77%) ⬆️
...ster/internal/rm/agentrm/agent_resource_manager.go: 49.22% <75.00%> (+0.11%) ⬆️

... and 2 files with indirect coverage changes

@kkunapuli force-pushed the kunapuli/remove-round-robin branch from f76b2de to e5d29d0 June 11, 2024 18:50
@kkunapuli changed the title from "Kunapuli/remove round robin" to "feat: remove round robin scheduler for agentrm" Jun 11, 2024
@kkunapuli marked this pull request as ready for review June 12, 2024 16:19
@kkunapuli requested review from a team as code owners June 12, 2024 16:19
@ShreyaLnuHpe (Contributor) left a comment:

LGTM

@tara-det-ai (Member) left a comment:

LGTM

@determined-ci requested a review from a team June 13, 2024 18:19

@NicholasBlaskey (Contributor) left a comment:

One comment, but other than that, looks good to me.

Comment on lines -331 to +342

        - if r.ResourceManager.AgentRM != nil && r.ResourceManager.AgentRM.Scheduler == nil {
        -     r.ResourceManager.AgentRM.Scheduler = DefaultSchedulerConfig()
        + if r.ResourceManager.AgentRM != nil {
        +     if r.ResourceManager.AgentRM.Scheduler == nil {
        +         r.ResourceManager.AgentRM.Scheduler = DefaultSchedulerConfig()
        +     }
        +     if r.ResourceManager.AgentRM.Scheduler.GetType() == FairShareScheduling {
        +         log.Warn("Fair-Share Scheduler has been deprecated, please update master config to use Priority Scheduler.")
        +     }
        +     if r.ResourceManager.AgentRM.Scheduler.GetType() == RoundRobinScheduling {
        +         log.Error("Round Robin Scheduler has been removed, please update master config to use Priority Scheduler.")
        +         log.Info("Priority Scheduler with all priorities equal will have the same behavior as a Round Robin Scheduler.")
        +         return fmt.Errorf("scheduler not available")
        +     }
Contributor:

I think the changes in MakeScheduler should take care of this, right?

I think this would log two warnings if you specified fair_share at the resource manager level.

Contributor Author:

The changes in MakeScheduler do take care of it, for the most part.

If the resource manager is configured to use round_robin but every resource pool is configured to use another scheduler (as below), then MakeScheduler will not complain. Do we care about that (admittedly unlikely) case?

        resource_manager:
          type: agent
          scheduler:
            type: round_robin
            fitting_policy: best
        resource_pools:
        - pool_name: pool1
          scheduler:
            type: fair_share
            fitting_policy: best
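The edge case follows from how pool-level settings are resolved: a pool with its own scheduler block uses it, and only pools without one inherit the resource manager's scheduler. Assuming that precedence (the types and helpers below are hypothetical), a short sketch shows that in the configuration above no pool resolves to round_robin, so a per-pool check such as the one in MakeScheduler never sees it.

```go
package main

import "fmt"

// Hypothetical, simplified config types; only the scheduler type matters here.
type SchedulerConfig struct{ Type string }

type ResourcePoolConfig struct {
	PoolName  string
	Scheduler *SchedulerConfig // nil means the pool inherits the resource manager's scheduler
}

// effectiveScheduler returns the pool's own scheduler if set,
// otherwise the resource manager's default.
func effectiveScheduler(rmDefault *SchedulerConfig, pool ResourcePoolConfig) *SchedulerConfig {
	if pool.Scheduler != nil {
		return pool.Scheduler
	}
	return rmDefault
}

// unusedRMScheduler reports whether every pool overrides the resource manager's
// scheduler, so that a per-pool check would never encounter it.
// Note: it also returns true for an empty pool list.
func unusedRMScheduler(pools []ResourcePoolConfig) bool {
	for _, p := range pools {
		if p.Scheduler == nil {
			return false
		}
	}
	return true
}

func main() {
	rm := &SchedulerConfig{Type: "round_robin"}
	pools := []ResourcePoolConfig{
		{PoolName: "pool1", Scheduler: &SchedulerConfig{Type: "fair_share"}},
	}
	for _, p := range pools {
		fmt.Printf("%s -> %s\n", p.PoolName, effectiveScheduler(rm, p).Type)
	}
	fmt.Println("resource manager scheduler unused:", unusedRMScheduler(pools))
}
```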

Contributor:

I think it is probably fine not to handle that case.

Though if we do want to handle it, it would be nice to do so without logging more warnings than expected.

@@ -205,6 +205,7 @@ func (a *agents) createAgent(

        var poolConfig *config.ResourcePoolConfig
        for _, pc := range a.poolConfigs {
            // The address of a loop variable is always the same. Use a temporary variable to capture the address.
Contributor:

This is no longer the case in Go 1.22.

https://go.dev/blog/loopvar-preview
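Whether pc := pc is needed depends on the language version the module declares: since Go 1.22 each loop iteration gets a fresh variable, but only in modules whose go.mod declares go 1.22 or later. The sketch below, with hypothetical names, shows the copy-based lookup and an index-based alternative that does not depend on loop-variable semantics at all. That the index form also satisfies the lint rules run by make -C master check is an assumption.

```go
package main

import "fmt"

type ResourcePoolConfig struct {
	PoolName string
}

// findByCopy scans all pools and returns a pointer to a copy of the last match.
// The explicit `pc := pc` is required before Go 1.22 (one shared loop variable)
// and redundant, but harmless, in modules that declare go 1.22 or later.
func findByCopy(pools []ResourcePoolConfig, name string) *ResourcePoolConfig {
	var found *ResourcePoolConfig
	for _, pc := range pools {
		pc := pc
		if pc.PoolName == name {
			found = &pc
		}
	}
	return found
}

// findByIndex takes the address of the slice element itself, so it behaves the
// same under every Go version; the result aliases the slice rather than a copy.
func findByIndex(pools []ResourcePoolConfig, name string) *ResourcePoolConfig {
	for i := range pools {
		if pools[i].PoolName == name {
			return &pools[i]
		}
	}
	return nil
}

func main() {
	pools := []ResourcePoolConfig{{PoolName: "default"}, {PoolName: "gpu"}, {PoolName: "cpu"}}
	a := findByCopy(pools, "gpu")
	b := findByIndex(pools, "gpu")
	fmt.Println(a.PoolName, b.PoolName, a == &pools[1], b == &pools[1])
}
```

Choosing between the two comes down to whether the caller should see later mutations of the slice element (index form) or hold an independent snapshot (copy form).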

Contributor Author:

make -C master check complained when I removed pc := pc. I can remove the comment, though.

@NicholasBlaskey (Contributor) left a comment:

I think the changes are fine as is.


@kkunapuli merged commit 10667f1 into main Jun 17, 2024 (89 of 102 checks passed)
@kkunapuli deleted the kunapuli/remove-round-robin branch June 17, 2024 14:11
Labels: cla-signed, documentation

5 participants