Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve calculation of NodeDrainTimeout & NodeVolumeDetachTimeout exceeded #11126

Open
sbueringer opened this issue Sep 2, 2024 · 7 comments · May be fixed by #11166
Open

Improve calculation of NodeDrainTimeout & NodeVolumeDetachTimeout exceeded #11126

sbueringer opened this issue Sep 2, 2024 · 7 comments · May be fixed by #11166
Assignees
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@sbueringer
Copy link
Member

Today, we check if the NodeDrainTimeout & NodeVolumeDetachTimeout are exceeded by comparing time.Now with the LastTransitionTime of the corresponding conditions.

This is fragile for multiple reasons:

It would be already a good improvement if the following is implemented:

existing conditions utils changes lastTranistionTime whenever something changes in the condition, while future/upstream aligned conditions utils are changing lastTranistionTime only when status change
#11033 (comment)

But I think instead of relying on the LastTransitionTime we should have additional status fields tracking when we started to drain & when we started to wait for volume detach and then calculate if the timeouts are exceeded based on that.

(Note: this is not relevant for NodeDeletionTimeout because for this one we use the deletionTimestamp field)

@sbueringer sbueringer added this to the v1.9 milestone Sep 2, 2024
@k8s-ci-robot k8s-ci-robot added needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 2, 2024
@sbueringer sbueringer added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Sep 2, 2024
@k8s-ci-robot k8s-ci-robot removed needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 2, 2024
@sbueringer
Copy link
Member Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 2, 2024
@sbueringer
Copy link
Member Author

@fabriziopandini
Copy link
Member

fabriziopandini commented Sep 3, 2024

But I think instead of relying on the LastTransitionTime we should have additional status fields tracking when we started to drain & when we started to wait for volume detach and then calculate if the timeouts are exceeded based on that.

+1
I think that we should add a status.Termination struct with NodeDrainStartTime & NodeVolumeDetachStartTime fields. This will be somehow "consistent" with the Initialization struct proposed by #10897.

@chrischdi
Copy link
Member

/assign

@sbueringer
Copy link
Member Author

sbueringer commented Sep 9, 2024

One small note, wondering if we should call it status.deletion instead of termination. I think in CAPI we usually always talk about "deletion"

+ NodeVolumeDetachStartTime should be WaitForNodeVolumeDetachStartTime

@fabriziopandini
Copy link
Member

No strong opinions about the name.
Termination makes a good pair to Initialization, it also reminds of stuff like gracetermination period etc etc

@sbueringer
Copy link
Member Author

sbueringer commented Sep 10, 2024

Yup, we just use deletion in a lot of places and we don't have things like terminationGracePeriod for Machines (but we have a deletionTimestamp)

(we're also always talking about Machine deletion, never really used/heard Machine termination)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
4 participants