feat: update default scheduler to priority for agentrm (#9385)
kkunapuli committed May 21, 2024
1 parent ce70c00 commit 047580c
Showing 5 changed files with 25 additions and 13 deletions.
2 changes: 1 addition & 1 deletion docs/reference/deploy/master-config-reference.rst
@@ -284,7 +284,7 @@ behavior specified here. For more on scheduling behavior in Determined, see :ref
 ^^^^^^^^

 The scheduling policy to use when allocating resources between different tasks (experiments,
-notebooks, etc.). Defaults to ``fair_share``.
+notebooks, etc.). Defaults to ``priority``.

 - ``fair_share``: Tasks receive a proportional amount of the available resources depending on
 the resource they require and their weight.
17 changes: 11 additions & 6 deletions docs/release-notes/feature-clean-up.rst
@@ -8,13 +8,18 @@ out-of-the-box with minimal customization required.

 Agent Resource Manager:

-- Container Runtimes: We will limit support to Docker for Agent Resource Managers.
+- Container Runtimes: Due to limited usage, we will limit supported container runtimes to Docker
+for the Agent Resource Manager. This does not impact Kubernetes, Slurm or PBS environments.

-- Job Scheduling: Moving a job will require adjusting its priority; jobs cannot be shifted within
-the same priority group. Support for round-robin and fair share schedulers is discontinued. We
-recommend using the priority scheduler, as it meets most scheduling needs and simplifies
-configuration.
+- Job Scheduling: The default scheduler is now ``priority``. Support for round-robin and fair share
+schedulers has been discontinued. We recommend using the priority scheduler, as it meets most
+scheduling needs and simplifies configuration. To move a job, you will need to adjust its
+priority; jobs cannot be shifted within the same priority group.

-- AMD GPUs: Support will continue only for Nvidia GPUs.
+- AMD GPUs: Due to limited usage, we will limit supported accelerators to NVIDIA GPUs. If you have
+a use case requiring AMD GPU support with the Agent Resource Manager, please reach out to us via
+a `GitHub Issue <https://github.com/determined-ai/determined/issues>`__ or `community slack
+<https://join.slack.com/t/determined-community/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`__!
+This does not impact Kubernetes or Slurm environments.

 Machine Architectures: PPC64/POWER builds across all environments are no longer supported.
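
The Job Scheduling note above says that a job can only be moved by changing its priority, not by reordering it inside its priority group. The following is a minimal sketch of that idea and is not Determined's scheduler. The `job` type, the `queueOrder` helper, the "lower number runs first" convention, and the submission-order tie-break are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// job is a hypothetical stand-in for a queued task.
type job struct {
	Name      string
	Priority  int // assumption: a lower number is scheduled first
	Submitted int // hypothetical submission sequence number
}

// queueOrder returns job names in scheduling order. Within one priority
// group the order is fixed by submission sequence (an assumed tie-break),
// so the only way to move a job is to change its Priority.
func queueOrder(jobs []job) []string {
	sorted := append([]job(nil), jobs...) // copy so the caller's slice is untouched
	sort.SliceStable(sorted, func(a, b int) bool {
		if sorted[a].Priority != sorted[b].Priority {
			return sorted[a].Priority < sorted[b].Priority
		}
		return sorted[a].Submitted < sorted[b].Submitted
	})
	names := make([]string, len(sorted))
	for i, jb := range sorted {
		names[i] = jb.Name
	}
	return names
}

func main() {
	jobs := []job{{"a", 42, 1}, {"b", 42, 2}, {"c", 42, 3}}
	fmt.Println(queueOrder(jobs)) // all jobs share one priority group

	jobs[2].Priority = 10 // "move" job c by adjusting its priority
	fmt.Println(queueOrder(jobs))
}
```

In this sketch, job `c` can jump ahead of `a` and `b` only because its priority value changed; no operation reorders `a` and `b` relative to each other.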
9 changes: 6 additions & 3 deletions master/cmd/determined-master/root.go
@@ -193,18 +193,21 @@ func applyBackwardsCompatibility(configMap map[string]interface{}) (map[string]i
 vProvisioner, provisionerExisted := configMap["provisioner"]

 // Ensure we use either the old schema or the new one.
-if (rmExisted || rpsExisted) && (schedulerExisted || provisionerExisted) {
+oldRMConfig := schedulerExisted || provisionerExisted
+newRMConfig := rmExisted || rpsExisted
+if newRMConfig && oldRMConfig {
 return nil, errors.New(
 "cannot use the old and the new configuration schema at the same time",
 )
 }
-if rmExisted || rpsExisted {
+if !oldRMConfig {
+// Use configMap if RMs are not defined at all, or if they are defined using the new schema.
 return configMap, nil
 }

 // If use the old schema, convert it to the new one.
 newScheduler := map[string]interface{}{
-"type": "fair_share",
+"type": "priority",
 "fitting_policy": "best",
 }
 newRM := map[string]interface{}{
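
The branching in the hunk above can be read in isolation as the sketch below. `schemaDecision` is a hypothetical helper, not a function from the commit, and the key names `resource_manager`, `resource_pools`, and `scheduler` are assumptions; only `provisioner` is visible in the diff. One consequence that follows from the changed condition: a config with none of these keys is now returned unchanged, whereas the previous `rmExisted || rpsExisted` check would have fallen through to the old-schema conversion.

```go
package main

import (
	"errors"
	"fmt"
)

// schemaDecision is a hypothetical helper that mirrors only the branching of
// the schema check. It reports whether the old schema would need converting.
// Assumed key names: "resource_manager", "resource_pools", "scheduler".
func schemaDecision(configMap map[string]interface{}) (convert bool, err error) {
	_, rmExisted := configMap["resource_manager"]
	_, rpsExisted := configMap["resource_pools"]
	_, schedulerExisted := configMap["scheduler"]
	_, provisionerExisted := configMap["provisioner"]

	oldRMConfig := schedulerExisted || provisionerExisted
	newRMConfig := rmExisted || rpsExisted
	if newRMConfig && oldRMConfig {
		return false, errors.New(
			"cannot use the old and the new configuration schema at the same time",
		)
	}
	// Convert only when old-schema keys are present; an empty config or a
	// new-schema config is used as-is.
	return oldRMConfig, nil
}

func main() {
	cases := map[string]map[string]interface{}{
		"empty":      {},
		"new schema": {"resource_manager": map[string]interface{}{}},
		"old schema": {"scheduler": map[string]interface{}{}},
		"mixed":      {"resource_pools": []interface{}{}, "provisioner": map[string]interface{}{}},
	}
	for name, cfg := range cases {
		convert, err := schemaDecision(cfg)
		fmt.Printf("%s: convert=%v err=%v\n", name, convert, err)
	}
}
```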
4 changes: 2 additions & 2 deletions master/cmd/determined-master/root_test.go
@@ -147,7 +147,7 @@ func TestApplyBackwardsCompatibility(t *testing.T) {
 "type": "agent",
 "scheduler": map[string]interface{}{
 "fitting_policy": "best",
-"type": "fair_share",
+"type": "priority",
 },
 "default_cpu_resource_pool": "default",
 "default_gpu_resource_pool": "default",
@@ -178,7 +178,7 @@ func TestApplyBackwardsCompatibility(t *testing.T) {
 "type": "kubernetes",
 "scheduler": map[string]interface{}{
 "fitting_policy": "best",
-"type": "fair_share",
+"type": "priority",
 },
 "master_service_name": "k8s-det",
 },
6 changes: 5 additions & 1 deletion master/internal/config/scheduler_config.go
@@ -26,8 +26,12 @@ const (

 // DefaultSchedulerConfig returns the default fair share configuration for the scheduler.
 func DefaultSchedulerConfig() *SchedulerConfig {
+tmp := DefaultSchedulingPriority
 return &SchedulerConfig{
-FairShare: &FairShareSchedulerConfig{},
+Priority: &PrioritySchedulerConfig{
+Preemption: false,
+DefaultPriority: &tmp,
+},
 FittingPolicy: defaultFitPolicy,
 }
 }
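
The `tmp` variable in the hunk above is needed because Go does not allow taking the address of a constant, so the value is first copied into a local variable whose address can be stored in the pointer field. The sketch below reproduces the pattern with stand-in definitions. The struct layouts, the `*int` type of `DefaultPriority`, and the values `42` and `"best"` are assumptions for illustration; the commit does not show them.

```go
package main

import "fmt"

// Stand-in definitions; the values and field types are assumed, not taken
// from the commit.
const (
	DefaultSchedulingPriority = 42     // assumed placeholder value
	defaultFitPolicy          = "best" // assumed placeholder value
)

type FairShareSchedulerConfig struct{}

type PrioritySchedulerConfig struct {
	Preemption      bool
	DefaultPriority *int // assumed to be a pointer so that "unset" is representable
}

type SchedulerConfig struct {
	FairShare     *FairShareSchedulerConfig
	Priority      *PrioritySchedulerConfig
	FittingPolicy string
}

// DefaultSchedulerConfig follows the shape of the new function body: the
// priority scheduler is populated and the fair share field is left nil.
func DefaultSchedulerConfig() *SchedulerConfig {
	// &DefaultSchedulingPriority would not compile, because constants are
	// not addressable. Copy into a variable first.
	tmp := DefaultSchedulingPriority
	return &SchedulerConfig{
		Priority: &PrioritySchedulerConfig{
			Preemption:      false,
			DefaultPriority: &tmp,
		},
		FittingPolicy: defaultFitPolicy,
	}
}

func main() {
	c := DefaultSchedulerConfig()
	fmt.Println("fair share set:", c.FairShare != nil)
	fmt.Println("priority set:", c.Priority != nil)
	fmt.Println("default priority:", *c.Priority.DefaultPriority)
}
```

Because `tmp` is declared inside the function, each call returns a config with its own `DefaultPriority` pointer, so changing one config's default does not affect another.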
