
Allow partial admission of jobs #420

Closed
3 tasks done
Tracked by #636
ahg-g opened this issue Oct 18, 2022 · 19 comments · Fixed by #763 or #771
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@ahg-g
Contributor

ahg-g commented Oct 18, 2022

What would you like to be added:

A mode of operation that allows partial admission of jobs: for example, if parallelism is x and there is only quota available for y < x pods, the job should still be admitted.

In this mode, the job needs to specify some minimum number of pods that is less than parallelism. To force the job controller to start only what was allowed, we need to reset parallelism to match the partial assignment; we can do that in the same update request that unsuspends the job.
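
To make the mechanics concrete, here is a minimal sketch (not Kueue's actual code; the helper name and `admittedCount` are assumptions) of lowering a batch/v1 Job's parallelism to the admitted count in the same update that clears suspend:

```go
package example

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/utils/pointer"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// unsuspendWithPartialAdmission is a hypothetical helper: it lowers the Job's
// parallelism to the partially admitted pod count and clears suspend in a
// single update, so the job controller never creates more pods than admitted.
func unsuspendWithPartialAdmission(ctx context.Context, c client.Client, job *batchv1.Job, admittedCount int32) error {
	job.Spec.Parallelism = pointer.Int32(admittedCount) // admittedCount <= originally requested parallelism
	job.Spec.Suspend = pointer.Bool(false)
	return c.Update(ctx, job)
}
```

Doing both changes in one update avoids a window where the job controller could start creating more pods than were admitted.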

Why is this needed:

Not all jobs require all-or-nothing.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@ahg-g added the kind/feature label Oct 18, 2022
@kannon92
Contributor

kannon92 commented Jan 3, 2023

I'm interested in tackling this feature. If there is no objection, I will assign it to myself and attempt a design doc first.

/assign @kannon92

@kannon92
Contributor

kannon92 commented Jan 3, 2023

A mode of operation that allows partial admission of jobs, for example if parallelism is x and there is only quota available for y < x, then jobs should still be admitted.

So I am reading this as we probably want some kueue configuration option to allow this globally?

I know that we probably want to implement this without "hard-coding" anything on the Job API, since we want Kueue to be a general solution for workloads beyond the Job API. So I am a bit confused about where this item should live. A user will submit a Job with a CQ annotation, so I don't think it should be an API field on the Job API.

Should a user opt in to this via a CQ setting? Or do we want a global setting in the Kueue Configuration that allows this behavior if an admin enables it?

@alculquicondor
Contributor

So I am reading this as we probably want some kueue configuration option to allow this globally?

No, each workload should dictate whether it can handle lower parallelism.

It probably needs to be an annotation for now.

I think we might be able to justify a field upstream, if we offer some behavior for it. For example, you could have the job controller stop retrying pod creations if it has reached the minParallelism and the last pod creation failed because of lack of ResourceQuota. But I'm just thinking out loud.
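
A minimal sketch of what an annotation-based opt-in could look like, assuming a hypothetical annotation name (the final name, if any, would be settled in the design doc):

```go
package example

import (
	"strconv"

	batchv1 "k8s.io/api/batch/v1"
)

// Hypothetical annotation; the name is illustrative, not a committed API.
const minParallelismAnnotation = "kueue.x-k8s.io/job-min-parallelism"

// minParallelism returns the lowest parallelism the job owner is willing to
// accept, falling back to the full parallelism when the annotation is absent
// or not a smaller valid number.
func minParallelism(job *batchv1.Job) int32 {
	full := int32(1)
	if job.Spec.Parallelism != nil {
		full = *job.Spec.Parallelism
	}
	if v, ok := job.Annotations[minParallelismAnnotation]; ok {
		if n, err := strconv.ParseInt(v, 10, 32); err == nil && n > 0 && int32(n) < full {
			return int32(n)
		}
	}
	return full
}
```

With something like this, the opt-in stays on the job object itself, so no ClusterQueue or global configuration change would be needed.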

@kannon92
Contributor

kannon92 commented Jan 9, 2023

I'm afraid I may not have the time to tackle this in the short term. /unassign @kannon92

@kannon92 removed their assignment Jan 9, 2023
@alculquicondor
Contributor

MPIJob has a parameter like this, although it is not properly supported: kubeflow/mpi-operator#518

@ahaysx

ahaysx commented Feb 17, 2023

A few questions

  1. Should a min-parallelism annotation be hooked into the all-or-nothing scheduling feature? For example, if Kueue has waitForPodsReady enabled, should the annotation replace parallelism when determining PodsReady? Or are you just asking to consider this annotation when admitting for quota?
  2. Do you want this feature to be gated at the Kueue level? Or should the annotation always be respected?
  3. By design doc, are you looking for a KEP or something more informal?

@kerthcet
Contributor

Should a min-parallelism annotation be hooked into the all-or-nothing scheduling feature?

I think it's independent of waitForPodsReady. Currently we only admit the job iff there's enough quota for parallelism (or completions) pods; with this feature enabled, quota for min-parallelism pods would be enough. But we may also want to treat the job as ready when min-parallelism is reached, so we would also need to change the waitForPodsReady code.

Do you want this feature to be gated at the Kueue level? Or should the annotation always be respected?

I think we can have a flag like waitForPodsReady.

By design doc, are you looking for a KEP or something more informal?

Yes, see https://github.com/kubernetes-sigs/kueue/tree/main/keps

Some thoughts: I think we should add the field to the Workload, maybe inside the PodSet, so that Kueue is aware of it. One more thing: should we also record the original parallelism?
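
A rough sketch of what that could look like, using a simplified stand-in for the PodSet (field names here are illustrative, not a committed API):

```go
package example

import corev1 "k8s.io/api/core/v1"

// PodSet is a simplified stand-in for Kueue's PodSet, extended with a
// hypothetical minimum count.
type PodSet struct {
	Name     string
	Template corev1.PodTemplateSpec
	// Count is the number of pods originally requested by the job.
	Count int32
	// MinCount, if set, is the smallest number of pods the job can run with;
	// Kueue may admit the PodSet with any count in [MinCount, Count].
	MinCount *int32
}
```

Since Count would keep the originally requested number, the original parallelism would not need to be recorded anywhere else.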

@alculquicondor
Contributor

alculquicondor commented Feb 17, 2023

There are two things at play here:

  • How to define and support min-parallelism in an abstract way. This usually implies changes to the Workload API and the admission logic (see the sketch below).
  • How to apply the selected parallelism to the (custom) job API. In the case of a Kubernetes Job, we probably need to update the parallelism field itself before setting suspend=false.
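
For the admission side, a deliberately simplified sketch of the decision, assuming the quota check can be reduced to a single "fitting" pod count (the real flavor assignment is per flavor and per resource):

```go
package example

// pickAdmittedCount is an illustrative reduction of the admission decision:
// admit with the full requested count when it fits, otherwise with the largest
// count that still fits, and refuse admission below the minimum.
func pickAdmittedCount(requested, minimum, fitting int32) (int32, bool) {
	if fitting >= requested {
		return requested, true
	}
	if fitting >= minimum {
		return fitting, true
	}
	return 0, false
}
```

The shape of the outcome (an admitted count plus a yes/no) is the part the Workload API would need to carry back to the user.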

Do you want this feature to be gated at the Kueue level? Or should the annotation always be respected?

Since the annotation is opt-in, I don't see why we need a gate.

@tenzen-y
Member

tenzen-y commented Mar 22, 2023

Probably, we want to add this feature for the MPIJob with Elastic Training (Elastic Horovod). Also, PyTorchJob has an Elastic Training mode.

@alculquicondor
Contributor

alculquicondor commented Mar 22, 2023

This is simpler than elastic. Here we want to start a job with Y pods (where Y>=X), and keep the number at Y during the life of the job.

@tenzen-y
Member

This is simpler than elastic. Here we want to start a job with Y pods (where Y>=X), and keep the number at Y during the life of the job.

Ah, I see. Thanks for clarifying.

@trasc
Contributor

trasc commented May 10, 2023

/assign

@trasc
Contributor

trasc commented May 10, 2023

Do we need a KEP for this?

The way I see it:

After #756 we will have a way of knowing the number of pods for which the admission was done:

https://github.com/epam/kubernetes-kueue/blob/0d840050ad4a09f36b00db2988552955db46898b/apis/kueue/v1beta1/workload_types.go#LL67C1-L84C2

What's still needed:

  • core
    • a way to mark that a podSet accepts partial admission (likely an API change)
    • if that's the case, try the flavor assignment with a lower pod count
    • prepare the job for partial execution; this should be done similarly to the way node selectors are handled (likely with a restore mechanism)
  • batch/job
    • a way to mark that a job supports partial admission (likely an annotation)
    • implement set/restore of parallelism for partial admission (see the sketch after this list)
    • change the job vs. workload equivalence check to account for partial admission
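
For the set/restore item, one possible approach is sketched below, assuming (purely for illustration) that the original parallelism is stashed in an annotation while the Job runs with a reduced count:

```go
package example

import (
	"strconv"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/utils/pointer"
)

// Hypothetical annotation used only in this sketch to remember the original
// parallelism while the Job runs with a reduced one.
const originalParallelismAnnotation = "kueue.x-k8s.io/original-parallelism"

// setPartialParallelism records the original parallelism and applies the
// admitted (reduced) count.
func setPartialParallelism(job *batchv1.Job, admitted int32) {
	if job.Annotations == nil {
		job.Annotations = map[string]string{}
	}
	if job.Spec.Parallelism != nil {
		job.Annotations[originalParallelismAnnotation] = strconv.Itoa(int(*job.Spec.Parallelism))
	}
	job.Spec.Parallelism = pointer.Int32(admitted)
}

// restoreParallelism puts the original parallelism back, e.g. when the
// workload is suspended again after an eviction.
func restoreParallelism(job *batchv1.Job) {
	v, ok := job.Annotations[originalParallelismAnnotation]
	if !ok {
		return
	}
	if n, err := strconv.Atoi(v); err == nil {
		job.Spec.Parallelism = pointer.Int32(int32(n))
	}
	delete(job.Annotations, originalParallelismAnnotation)
}
```

The actual mechanism could just as well rebuild the original value from the Workload's pod set count instead of an annotation; the point is only that the set and restore paths stay symmetric.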

@tenzen-y
Member

tenzen-y commented May 10, 2023

If we don't need to modify any APIs (adding new APIs), I'm ok with no KEP.
@alculquicondor WDYT?

@alculquicondor
Contributor

We need to add new APIs. Otherwise, how do users communicate that they are OK with partial admission? And how do we communicate back to the user what decision was taken?

And the implementation might not be trivial. So a KEP helps put everything in perspective. As the project grows, we might be requiring more KEPs.

@tenzen-y
Member

And the implementation might not be trivial. So a KEP helps put everything in perspective. As the project grows, we might be requiring more KEPs.

That's right.

@alculquicondor
Contributor

/reopn

#763 was only the high-level design.

@kerthcet
Contributor

/reopen

@k8s-ci-robot
Contributor

@kerthcet: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot reopened this May 26, 2023