Expected Behavior
The Affinity Assistant should schedule onto nodes that have enough resources to run all the steps in the pipeline.
This isn't simple to resolve.
The Affinity Assistant itself cannot carry the tasks' resource requests the whole time, because it would then hold resources that are never actually used.
Kubernetes 1.27 and later support mutable pod resource requests (in-place pod resize), so you could lower the assistant's requests once a task pod is scheduled, but nothing stops the scheduler from placing more workloads on the node in the meantime, so the resource "reservation" can be lost.
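For illustration, a rough sketch of that in-place shrink; the pod, container, and namespace names here are made up, and on 1.27 this still needs the alpha InPlacePodVerticalScaling feature gate:

```shell
# Shrink the placeholder's memory request in place once the task pod is
# running. Pod/container/namespace names are hypothetical; newer Kubernetes
# releases moved resizing behind a dedicated "resize" subresource.
kubectl patch pod affinity-assistant-0 -n my-namespace \
  -p '{"spec":{"containers":[{"name":"affinity-assistant","resources":{"requests":{"memory":"64Mi"}}}]}}'
```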
You could taint a node as a form of lock while you move the resource requests back and forth between the Affinity Assistant and the tasks' pods. But if the cluster runs something like descheduler, which watches the cluster for taint changes and evicts workloads, the consequences can be chaotic.
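Roughly, that lock would look like the following; the taint key and node name are made up:

```shell
# Taint the node as a crude lock while the requests are handed off
# (taint key and node name are hypothetical).
kubectl taint nodes node-1 example.com/resource-handoff=locked:NoSchedule
# ...shift the requests from the affinity assistant to the task pod,
# then release the lock by removing the taint (note the trailing "-").
kubectl taint nodes node-1 example.com/resource-handoff=locked:NoSchedule-
```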
The underlying issue is detaching volumes from completed pods. In our infrastructure the PVCs can be used from any node and we do not need the parallel-task behavior, but the pods in the PipelineRun remain in a Completed state, so their ReadWriteOnce (RWO) PVs stay attached.
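The lingering attachments are easy to see, for example:

```shell
# List which PVs are attached to which nodes. RWO volumes used by
# completed-but-not-deleted pipeline pods keep showing up here.
kubectl get volumeattachments \
  -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName'
```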
Actual Behavior
The Affinity Assistant will sometimes schedule onto a node that does not have sufficient resources to run one of the pods in the pipeline.
Steps to Reproduce the Problem
1. Create a cluster with two nodes: Node 1 with 16Gi of RAM and Node 2 with 32Gi of RAM.
2. Create a PipelineRun that will eventually try to schedule a pod requesting 20Gi of RAM (a minimal sketch follows below).
3. Observe that the Affinity Assistant will sometimes schedule on Node 1, all else being equal.
4. Observe that the step whose pod requests 20Gi of RAM never schedules, which is fatal to the run.
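A minimal sketch of such a PipelineRun, assuming the Affinity Assistant is in its default per-workspace mode; all names, the StorageClass defaults, and the step body are placeholders:

```shell
# Hypothetical PipelineRun: one RWO workspace PVC (which triggers the
# affinity assistant) plus a task whose pod requests 20Gi of memory,
# so it can only fit on the 32Gi node.
kubectl create -f - <<'EOF'
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: affinity-repro-
spec:
  pipelineSpec:
    workspaces:
      - name: shared
    tasks:
      - name: big-step
        workspaces:
          - name: shared
            workspace: shared
        taskSpec:
          workspaces:
            - name: shared
          steps:
            - name: use-memory
              image: busybox
              computeResources:
                requests:
                  memory: 20Gi
              script: |
                echo "needs more memory than Node 1 has"
  workspaces:
    - name: shared
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
EOF
```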
Additional Info
Kubernetes version:

Output of `kubectl version`:

```
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.14+k0s
```

Tekton Pipeline version:

Output of `tkn version` or `kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'`:

```
v0.58.0
```