
affinity assistant will schedule onto nodes with insufficient resources to run a pod in the pipeline; it should not #8015

Open
doctorpangloss opened this issue Jun 4, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


Expected Behavior

Affinity assistant should schedule onto nodes that have enough resources to run all the steps in the pipeline.

This isn't simple to resolve.

The affinity assistant pod itself cannot carry the resource requests the whole time, because then it would eat up those resources even while no task pod needs them.

Kubernetes 1.27 and later support mutable (in-place) pod resource requests, so you could reduce the placeholder's requests once the task's pod is scheduled; but nothing stops the scheduler from placing more workloads onto the node afterwards, so the resource "reservation" is later lost. A sketch of that approach follows.
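For illustration only, a minimal sketch of the resize idea, assuming the InPlacePodVerticalScaling feature gate (alpha in 1.27) is enabled; the pod, container, and namespace names are hypothetical:

# Hypothetical names; assumes the InPlacePodVerticalScaling feature gate is enabled.
# Shrink the placeholder's memory request once the task pod has landed on the node.
# Note: newer Kubernetes releases route this through `kubectl patch --subresource resize`.
kubectl patch pod affinity-assistant-0 -n my-namespace --patch \
  '{"spec":{"containers":[{"name":"affinity-assistant","resources":{"requests":{"memory":"64Mi"}}}]}}'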

You could taint the node as a form of lock while moving resource requests back and forth between the affinity assistant and the task's pods, as sketched below. But if the cluster is running something like descheduler, which watches the cluster for taint changes and evicts workloads, the consequences can be chaotic.
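A sketch of the taint-as-lock idea; the taint key here is made up:

# Hypothetical taint key, used as a scheduling lock on the chosen node.
kubectl taint nodes node-2 example.com/affinity-lock=true:NoSchedule
# ...shift the requests between the affinity assistant and the task pod...
# Release the lock by removing the taint (note the trailing "-").
kubectl taint nodes node-2 example.com/affinity-lock=true:NoSchedule-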

The underlying issue is detaching volumes from completed pods. In our infrastructure the PVCs can be attached anywhere and we do not need parallel task behavior; but the pods in the PipelineRun are kept around after completing, and therefore the RWO PVs stay attached.
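For our situation, a workaround sketch would be deleting the succeeded pods so their RWO volumes can detach; the namespace and PipelineRun name below are hypothetical, and the selector relies on the standard tekton.dev/pipelineRun label that Tekton puts on TaskRun pods:

# Delete the succeeded pods of a PipelineRun so their RWO PVs can detach.
# Namespace and PipelineRun name are hypothetical.
kubectl delete pods -n my-namespace \
  -l tekton.dev/pipelineRun=my-pipelinerun \
  --field-selector=status.phase==Succeeded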

Actual Behavior

Affinity assistant will sometimes schedule onto a node that does not have sufficient resources to run one of the pods in the pipeline.

Steps to Reproduce the Problem

  1. Create a cluster with 2 nodes, Node 1 with 16Gi of RAM and Node 2 with 32Gi of RAM.
  2. Create a pipelinerun which will eventually try to schedule a pod that requests 20Gi of RAM (a sketch of such a PipelineRun follows this list).
  3. Observe that Affinity assistant will sometimes schedule on Node 1, all else being equal.
  4. Observe that the step whose pod requires 20Gi of RAM will never schedule, which is fatal for the run.
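A minimal PipelineRun along the lines of step 2, as a sketch; the names and images are placeholders, and it assumes the default affinity assistant (coschedule workspaces) behavior with an RWO volumeClaimTemplate workspace:

# Placeholder names/images; a two-task pipeline where only the second task needs 20Gi.
kubectl create -f - <<'EOF'
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: affinity-repro-
spec:
  workspaces:
    - name: shared
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi
  pipelineSpec:
    workspaces:
      - name: shared
    tasks:
      - name: small
        workspaces:
          - name: shared
            workspace: shared
        taskSpec:
          workspaces:
            - name: shared
          steps:
            - name: noop
              image: busybox
              script: "true"
      - name: big
        runAfter: ["small"]
        workspaces:
          - name: shared
            workspace: shared
        taskSpec:
          workspaces:
            - name: shared
          steps:
            - name: heavy
              image: busybox
              script: "true"
              computeResources:
                requests:
                  memory: 20Gi
EOF

The affinity assistant pod is created for the PVC before the scheduler ever sees the 20Gi request, so on the two-node cluster above it can land on Node 1 and pin the run there.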

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.14+k0s

  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}':

v0.58.0