
Account for terminating pods when doing preemption #510

Closed
2 of 3 tasks
Tracked by #636
alculquicondor opened this issue Jan 11, 2023 · 9 comments · Fixed by #692
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@alculquicondor
Contributor

What would you like to be added:

Account for the quota in use by pods that are still terminating when doing preemptions.

Why is this needed:

We issue preemptions by setting Workload.spec.admission=nil and immediately consider these resources freed. But in reality, pods take time to terminate.

We will need to keep the old admission somewhere for the calculations, and bubble up information about running pods from the Job. Maybe this will require improvements to the job controller.
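To illustrate the accounting problem, here is a minimal sketch (with hypothetical, simplified types; the real Kueue Admission struct records per-resource flavor assignments) of quota accounting that keeps counting a preempted workload's admission until its pods are gone:

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only.
type Admission struct {
	ClusterQueue string
	Usage        int64 // resources granted at admission time
}

type Workload struct {
	Name       string
	Admission  *Admission // nil once the workload holds no resources
	ActivePods int32      // pods still running or terminating
}

// usedQuota sums the usage of every workload that may still hold
// resources. A preempted workload keeps its Admission until its pods
// finish terminating, so its quota is not handed out twice.
func usedQuota(workloads []Workload) int64 {
	var total int64
	for _, w := range workloads {
		if w.Admission != nil {
			total += w.Admission.Usage
		}
	}
	return total
}

func main() {
	wls := []Workload{
		// Preempted, but its 2 pods are still terminating.
		{Name: "preempted", Admission: &Admission{ClusterQueue: "cq", Usage: 4}, ActivePods: 2},
		{Name: "running", Admission: &Admission{ClusterQueue: "cq", Usage: 2}, ActivePods: 2},
	}
	fmt.Println(usedQuota(wls)) // 6: the preempted workload still counts
}
```

If the preempted workload's admission were cleared immediately, the scheduler would see 4 units as free while the terminating pods still occupy them.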

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@trasc
Contributor

trasc commented Apr 6, 2023

/assign

@kannon92
Contributor

kannon92 commented Apr 6, 2023

Hey @trasc,

@alculquicondor suggested I look into this issue as part of kubernetes/enhancements#3940.

I think it’s good that you are looking into it, because I’m a little confused about what Kueue wants when accounting for terminating pods.

@alculquicondor
Contributor Author

There are 2 parts to this problem:

  1. In Kueue, we need to preserve the information about how the job was admitted (in which queue and using which flavors).
  2. In Job status, we need the information about how many pods are still terminating.

@trasc
Contributor

trasc commented Apr 6, 2023

As I see this:

  1. In Kueue, we need to preserve the information about how the job was admitted (in which queue and using which flavors).

#599 will add a new "Evicted" condition that is set upon preemption. We can set this condition without clearing the admission (keeping both the Admitted condition and the struct), and only reset the admission when job.IsActive() starts returning false. That way the workload is considered "active" until we are "sure" that no resources are blocked by it, which should help avoid over-provisioning.

2. In Job status, we need the information about how many pods are still terminating.

In my opinion this is the job of job.IsActive() and we can think of ways of making the implementation of it more accurate (maybe walk some kind of object ownership tree).

@alculquicondor
Contributor Author

In my opinion this is the job of job.IsActive() and we can think of ways of making the implementation of it more accurate (maybe walk some kind of object ownership tree).

Right, we can always implement the logic in Kueue by watching pods. But ideally Kueue doesn't need to know about pods, for separation of concerns.

For the purpose of making progress, we can start with 1. and we can wait to see if we can leverage kubernetes/enhancements#3940

@trasc
Contributor

trasc commented Apr 6, 2023

@kannon92 in the implementation of IsActive() for batch/job

func (j *Job) IsActive() bool {
	return j.Status.Active != 0
}

we are relying on j.Status.Active to determine if a job is blocking any resources.

@kannon92
Contributor

kannon92 commented Apr 6, 2023

Yeah, so the issue is that Active doesn't include terminating pods. That KEP is about including terminating pods in Active, but I realize that there were some design decisions around terminating pods being considered failed.

If PodFailurePolicy is on, the job marks a terminating pod as failed only once it is fully terminated. If PodFailurePolicy is off, the job immediately counts the pod as failed even though it still holds resources.

I think that we will go with what @alculquicondor has suggested and probably have a status field for terminating so we can catch this intermittent state without changing the behavior.

https://github.com/kubernetes/enhancements/blob/1a9513382b0338026a2524baa4159951f66924b0/keps/sig-apps/3939-include-terminating-pods-as-active/README.md#open-questions-on-job-controller

Does this mean that if you want this feature for other workloads (MPI, etc) then they should include terminating pods in their status?

@alculquicondor
Contributor Author

Let's leave job conversations to k/k :)

@trasc, let's start by keeping the .status.admission field when evicting/preempting and just updating the Admitted condition.
We can add the Evicted condition in a follow up.

@alculquicondor
Contributor Author

Synced with @trasc offline to understand his suggestion better.

So the idea is to add the Evicted condition without changing the Admitted condition or clearing the .status.admission field. Once we are "certain" that the job doesn't have running pods, we actually clear .status.admission and update the Admitted condition.

This works better and is backwards-compatible.
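The agreed flow can be sketched as follows (a simplified model with boolean conditions and a placeholder admission value; real Kueue uses metav1.Condition and a structured Admission):

```go
package main

import "fmt"

// Illustrative workload state for the agreed flow.
type Workload struct {
	Admission *string // non-nil while quota is still reserved
	Evicted   bool
	Admitted  bool
}

// evict marks the workload Evicted but keeps the Admission and the
// Admitted condition, so its quota stays accounted for.
func evict(w *Workload) {
	w.Evicted = true
}

// reconcile clears the Admission and the Admitted condition only once
// no pods of the job are running or terminating (jobActive == false).
func reconcile(w *Workload, jobActive bool) {
	if w.Evicted && !jobActive {
		w.Admission = nil
		w.Admitted = false
	}
}

func main() {
	adm := "cluster-queue/flavor"
	w := Workload{Admission: &adm, Admitted: true}
	evict(&w)
	reconcile(&w, true) // pods still terminating: nothing released yet
	fmt.Println(w.Admission != nil, w.Admitted)
	reconcile(&w, false) // all pods gone: quota released
	fmt.Println(w.Admission != nil, w.Admitted)
}
```

Because the admission is only ever cleared later, never earlier, than before, existing consumers of the field see no new states, which is what makes the change backwards-compatible.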
