KEP-3066: subsecond probes #3067

mikebrow · 2021-11-30T20:53:23Z

Signed-off-by: Mike Brown brownwm@us.ibm.com

One-line PR description: Adding new KEP to support Subsecond Probes.
Issue link: Sub-second / More granular probes #3066
Other comments: wip

k8s-ci-robot · 2021-11-30T20:53:25Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

sftim

Some late thoughts on the discussion / early feedback on the draft.

sftim · 2021-12-20T17:59:09Z

keps/sig-node/3066-subsecond-probes/README.md

+Allow for 0 second Probe, with setting millisecond.
+
+Cons:
+Loosens validation.


This sounds similar to tv_sec and tv_nsec from POSIX time structs. If so, we could mention that analogy.

sftim · 2021-12-20T17:59:58Z

keps/sig-node/3066-subsecond-probes/README.md

+<!--
+What other approaches did you consider, and why did you rule them out? These do
+not need to be as detailed as the proposal, but should include enough
+information to express the idea and why it was not acceptable.
+-->


We should mention microseconds and nanoseconds too (as these are plausible alternatives to milliseconds).

Allowing microseconds might be useful if we support future healthcheck mechanisms.

sftim · 2021-12-20T18:03:48Z

keps/sig-node/3066-subsecond-probes/README.md

+### v2 api for probe.
+
+I think this means a v2 API for Container therefore Pod.
+This seems too invasive.
+
+All Seconds fields become resource.Quantity instead of int32. This supports subdivision in a single field.


The bigger problem is how to support round-tripping to the (still GA) v1 API. Even if we did a v2 API, we'd need a way to represent the subsecond timeout in the v1 API, right?

(I'm personally all for adopting a v2 Pod and v2 PodTemplate API, but lots of clients would expect to still create v1 Deployments, etc).

sftim · 2021-12-20T18:07:53Z

keps/sig-node/3066-subsecond-probes/README.md

+
+V2 API for existing objects.
+Converting fields from int32 to resource.Quantity.
+Subsecond resolution less than one millisecond.


I'd omit this non-goal; it sounds more like a design choice than a goal we're not pursuing.

sftim · 2021-12-20T18:10:02Z

keps/sig-node/3066-subsecond-probes/README.md

+If the Milliseconds variant of a field is set,
+*use it in preference* to the existing Seconds field,
+and **completely ignore** the value of the existing field.
+This behavior makes it opt-in on setting a non-zero Milliseconds field.


If there is an existing admission control mechanism that enforces constraints on timeoutSeconds, does this (draft) change break any API contract that the admission control mechanism is entitled to assume?

@ehashman recommended in the sig node call that we forgo the alternate fields idea and move to adding a new flag to indicate seconds should be processed as milliseconds.. etc.

great point on we will need to check for admission control impacts..

keps/sig-node/3066-subsecond-probes/README.md

keps/sig-node/3066-subsecond-probes/kep.yaml

keps/sig-node/3066-subsecond-probes/README.md

sftim · 2022-02-14T16:46:54Z

Overall, I think my concerns about these API changes are around changes we might want to make a few years in the future.

If we specify probe timeouts in microseconds, that allows for two things:

future health check mechanisms that are really fast (maybe they establish a connection and then the subsequent checks are super low latency)
admission restrictions or other mechanisms to require a minimum timeout (eg, any microsecond duration > 1000μs).

If we ever have to support second-granularity, millisecond-granularity and microsecond-granularity probes, I hope the API for that won't look as ugly as I fear it might.

The approach I'd still promote is to have an extra field to specify “early failure”: an integer quantity of milliseconds (or microseconds - we should leave room for that). That early failure offset is something that legacy clients can ignore without a big problem and would fully support round-tripping to a future Pod v2 API that unifies these fields.

Using a negative offset makes the distinction between (1s - 995ms) more obvious to a legacy client that's unaware of the new fields. A positive offset (0s + 5ms, or maybe 0s + 5000μs) could get interpreted quite differently.

If we do that, we should absolutely disallow a negative offset ≥ 1.0s seconds. Otherwise we have a challenge around unambiguous round-tripping (eg from a future Pod v2 API back to Pod v1).

sftim · 2022-02-14T16:51:30Z

For periodSeconds, do we want to support sub-second periods? If we do then a similar mechanism to what I proposed above (early failure) would work, but it would need a different name. We can bikeshed that name if needed, and we might want admission controls to enforce a minimum check interval (eg 0.1s - just speculating).

sftim · 2022-02-14T16:59:13Z

keps/sig-node/3066-subsecond-probes/README.md

+    New Field:      int32 PeriodOffsetMilliseconds
+
+    If I want to set to 0.5 seconds, 500 milliseconds,
+    PeriodSeconds <= 10 & PeriodOffsetmilliseconds <= -9500


This is an example of a pair of values that could be ambiguous for round-tripping, forcing future APIs to store and track both pieces of data.

By contrast, if we had:

.... periodSeconds: 1 earlyRecheckOffset: - unit: Millisecond value: 500 ...

and the v1 Pod API doesn't allow ambiguous values, then a future v2 Pod API can store a single field value behind the scenes
(eg: periodMicroseconds = 500000u64)

Another option, still clamped to less than 1s:

... periodSeconds: 0 extraRecheckOffset: - unit: Millisecond value: 500 ...

keps/sig-node/3066-subsecond-probes/README.md

psschwei · 2022-03-04T20:47:18Z

keps/sig-node/3066-subsecond-probes/README.md

+
+All Seconds fields become resource.Quantity instead of int32. This supports subdivision in a single field.
+
+### OffsetMilliseconds


@mikebrow can remove this section from the alternatives since it's the main proposal now

nod let's focus in on the offset milliseconds option since and fix the naming above..

keps/sig-node/3066-subsecond-probes/README.md

johnbelamaric · 2022-03-08T18:16:00Z

keps/sig-node/3066-subsecond-probes/README.md

+Allow for 0 second Probe, with setting millisecond. Similar to what is done with `tv_sec` and `tv_nsec` from POSIX time structs.
+
+Cons:
+Loosens validation.


So this proposal would just add a new field for each existing *Seconds field, like PeriodSeconds and PeriodMilliseconds and then the actual time is the sum? This seems like a way more intuitive and understandable API to me. It's also simpler to enter - just one new field instead of a new struct. And follows the existing *Seconds pattern. I am not seeing the downside? Validation can be done on the total duration as well as the individual fields, can't it?

Clients that don't recognise the new field would be liable to trigger much too early, right? How do we avoid that?

Also, how do we avoid setting a precedent that this riskier approach is the one to take.

Yeah, that's true. The lack of versioning at this granularity is a problem. Of course the "offset" has a similar problem (inaccurate interval from old clients). Here's another (maybe bad) idea:

on write:

if *Milliseconds is set, then set *Seconds = ceil(*Milliseconds/1000)

otherwise, set *Milliseconds = *Seconds * 1000

Then, on read, both will be set, and old clients will have a slightly longer interval.

johnbelamaric · 2022-03-08T18:20:09Z

keps/sig-node/3066-subsecond-probes/README.md

+
+Pros:
+Cons:
+


Another alternative is to use "Duration", like PeriodDuration: 1.5s or PeriodDuration: 1500ms or PeriodDuration: 1s500ms. But AFAIK we don't do that anywhere else, it would be a new type.

The existing API guidelines say

Duration fields must be represented as integer fields with units being part of the field name

so if for some of those options, if we adopt them I imagine that we also need to change the API guidelines.

johnbelamaric · 2022-03-08T18:21:43Z

keps/sig-node/3066-subsecond-probes/README.md

+    ```yaml
+    ...
+    periodSeconds: 1
+    periodSecondsOffset:


I find this very unintuitive and verbose as well.

psschwei · 2022-03-09T20:20:01Z

Subsecond Probe Options

Thanks @sftim and @johnbelamaric for your reviews! Trying to summarize the options in one place (since it's a bit messy in the KEP at the moment), here's four options that have been discussed for configuring a probe to run on a subsecond interval:

(edit: given that this discussion is still ongoing, probably better to use hackmd rather than having to continue to edit a comment in a PR...)

psschwei · 2022-03-09T20:28:03Z

(I've got the above in a hackmd ... if it's easier to edit there vs. doing this in issue comments, let me know and I'll add you both to it)

sftim · 2022-03-10T09:00:34Z

@psschwei I suggested using early*Offset as an extra field in the current API. I definitely wouldn't call that negative offset periodMilliseconds as that's surely liable to confuse people (early in the field name should be a massive clue).
See #3067 (comment) for more details.

I saw my proposal as distinct because it did fully avoid the inability to round-trip with a combined field.

psschwei · 2022-03-10T16:05:12Z

I suggested using early*Offset ...

You're right, let me add it to the hackmd now.

Sorry for the oversight; that's why I thought it would be good to put all the options in a single place...

MadhavJivrajani · 2022-03-14T16:01:04Z

This KEP was read and discussed at this month's session of the KEP reading club, commenting here for discoverability purposes. There is a slack thread on #sig-architecture with questions/comments about this KEP that can be found here, thanks!

k8s-triage-robot · 2022-07-24T21:31:52Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

psschwei · 2022-07-25T18:34:26Z

/remove-lifecycle stale

k8s-ci-robot · 2024-03-27T10:24:23Z

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Reopen this PR with /reopen

Mark this PR as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mikebrow · 2024-03-27T14:28:58Z

/reopen

k8s-ci-robot · 2024-03-27T14:29:04Z

@mikebrow: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2024-03-27T14:32:08Z

@mikebrow: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-enhancements-verify	`c1de314`	link	true	`/test pull-enhancements-verify`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

bart0sh · 2024-04-05T11:02:23Z

/remove-lifecycle rotten
/cc @kannon92
/easycla

bart0sh · 2024-04-05T11:03:44Z

@mikebrow Please sign the CLA and fix failing CI tests, thanks.

bart0sh · 2024-04-05T11:04:07Z

/cc

thockin · 2024-06-08T00:27:38Z

Is this KEP still alive?

Nibelheims · 2024-06-25T09:25:22Z

Hi.
I see that a check failed because the README's TOC is outdated. Would the following fix it? If I am not mistaken, external contributions to this PR are not allowed. Thanks.

--- a/keps/sig-node/3066-subsecond-probes/README.md
+++ b/keps/sig-node/3066-subsecond-probes/README.md
@@ -48,10 +48,10 @@ Probe timeouts are limited to seconds and that does NOT work well for clients lo
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
-  - [<code>early*Offset</code>](#)
-  - [<code>*Duration</code>](#-1)
-  - [A new struct, <code>ProbeOffset</code>](#a-new-struct-)
-  - [Setting the time units in a different field, <code>ReadSecondsAs</code>](#setting-the-time-units-in-a-different-field-)
+  - [<code>early*Offset</code>](#earlyoffset)
+  - [<code>*Duration</code>](#duration)
+  - [A new struct, <code>ProbeOffset</code>](#a-new-struct-probeoffset)
+  - [Setting the time units in a different field, <code>ReadSecondsAs</code>](#setting-the-time-units-in-a-different-field-readsecondsas)
   - [v2 api for probe.](#v2-api-for-probe)
   - [OffsetMilliseconds](#offsetmilliseconds)
   - [Reconcile seconds field to nearest whole second.](#reconcile-seconds-field-to-nearest-whole-second)

mikebrow · 2024-07-01T13:38:06Z

Is this KEP still alive?

was paused for 1.31 will bring it back to life for 1.32!

thockin

I have a feeling this may end up in the same place that fast-restart are approaching -- some node-level "budget" for probes, with an HTB to provide some "fairness", such that the probe's period is a request which will be satisfied when possible, but may be delayed when the node is under contention. E.g. if every pod asks for sub-second probes, we could end up doing hundreds of QPS of traffic just on probing.

thockin · 2024-07-04T23:31:36Z

keps/sig-node/3066-subsecond-probes/README.md

+
+There are a few corner cases around default (effective period) values:
+
+`periodSeconds: 0` and `periodMilliseconds: 500` would be 10.5s / 10500ms (as 0 => 10 for `periodSeconds` via [defaults.go](https://github.com/kubernetes/kubernetes/blob/3b13e9445a3bf86c94781c898f224e6690399178/pkg/apis/core/v1/defaults.go#L213-L215))


I think we should fail in that case, no? Sorry, you can't use MS unless you also specify S. Any existing pod will still be OK, right?

nod will update.. should fail, if one wants mills offset must say from any int except 0, because...

thockin · 2024-07-04T23:33:11Z

keps/sig-node/3066-subsecond-probes/README.md

+
+`periodSeconds: 2` and `periodMilliseconds: -500` would be 1.5s / 1500ms.
+
+`periodSeconds: 1` and `periodMilliseconds: -500` would be 0.5s / 500ms. (*** To reduce un-necessary resource usage, because periodMilliseconds is used to reduce the period to less than a second the offset is only used until success is reached then no longer becoming 1second after success.***)


Can we draw out the state-machine for probes and consider how this changes that, specifically in the context of startup, readiness, and liveness actually being different?

In the past we have added fields to probes that don't apply to all modes and it's gross.

psschwei · 2024-07-12T16:49:16Z

I think a budget makes sense... we had proposed a type of throttling as a mitigation, but should be able to change that to a node-level budget

thockin · 2024-07-12T18:36:33Z

I think it makes sense to have both a sane lower-bound (which can depend on probe type) and an overall budget. I think that if we can describe a "budget" semantic that balances requirements, we can use that same pattern in other places. I think requirements include:

Protect the node and the kubelet from DoS
Bound the resources consumed by probes in a predictable way
Provide some notion of fairness across different requestors (definition of "fair" is TBD, but may include priority, aggressiveness, QoS class, size)
Service requests as vigorously as possible within the constraints

@lauralorenz we had a very similar discussion wrt restarts.

lauralorenz · 2024-08-20T21:11:31Z

Hi @psschwei @mikebrow and any other authors -- yes as @thockin mentions, I'm trending towards needing a budget and something that can admit a restart request that is determined in budget for CrashLoopBackOff (#4603) longer term. The restart I'm trying to budget for would be queuing against kubelet directly as the operation is done inline of the kubelet runtime's SyncPod loop. Yours is a little different in that the restart of interest is queueing into to the probe worker's runtime, or perhaps earlier as to whether to allocate a probe worker in the first place (?). Either way it's still taking up resources from the kubelet component so I think we would use somewhat the same kubelet stats to hand out budget and should work together to define it! Let's hang

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 30, 2021

k8s-ci-robot requested review from dchen1107 and derekwaynecarr November 30, 2021 20:53

k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Nov 30, 2021

sftim reviewed Dec 20, 2021

View reviewed changes

psschwei reviewed Jan 31, 2022

View reviewed changes

sftim reviewed Feb 14, 2022

View reviewed changes

psschwei reviewed Mar 4, 2022

View reviewed changes

psschwei reviewed Mar 7, 2022

View reviewed changes

johnbelamaric reviewed Mar 8, 2022

View reviewed changes

mikebrow force-pushed the subsecond-probes branch from 9b71650 to 513660c Compare April 25, 2022 21:23

mikebrow mentioned this pull request Apr 26, 2022

Sub-second / More granular probes #3066

Open

4 tasks

mikebrow mentioned this pull request May 27, 2022

REQUEST: New membership for Paul Schweigert kubernetes/org#3447

Closed

9 tasks

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 24, 2022

k8s-ci-robot reopened this Mar 27, 2024

k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 30, 2024

soltysh mentioned this pull request Apr 3, 2024

Add soltysh to prod-readiness-approvers #4566

Merged

k8s-ci-robot requested a review from kannon92 April 5, 2024 11:02

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 5, 2024

k8s-ci-robot requested a review from bart0sh April 5, 2024 11:04

k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 8, 2024

aojea mentioned this pull request May 30, 2024

KEP-4559 Redesigning Kubelet probes #4558

Open

thockin self-assigned this Jun 8, 2024

thockin reviewed Jul 4, 2024

View reviewed changes


		All Seconds fields become resource.Quantity instead of int32. This supports subdivision in a single field.

		### OffsetMilliseconds


		There are a few corner cases around default (effective period) values:

		`periodSeconds: 0` and `periodMilliseconds: 500` would be 10.5s / 10500ms (as 0 => 10 for `periodSeconds` via [defaults.go](https://github.com/kubernetes/kubernetes/blob/3b13e9445a3bf86c94781c898f224e6690399178/pkg/apis/core/v1/defaults.go#L213-L215))


		`periodSeconds: 2` and `periodMilliseconds: -500` would be 1.5s / 1500ms.

		`periodSeconds: 1` and `periodMilliseconds: -500` would be 0.5s / 500ms. (* To reduce un-necessary resource usage, because periodMilliseconds is used to reduce the period to less than a second the offset is only used until success is reached then no longer becoming 1second after success.*)


		Pros:
		Cons:

KEP-3066: subsecond probes #3067

Are you sure you want to change the base?

KEP-3066: subsecond probes #3067

Conversation

mikebrow commented Nov 30, 2021

k8s-ci-robot commented Nov 30, 2021

sftim left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sftim commented Feb 14, 2022 • edited Loading

sftim commented Feb 14, 2022

Choose a reason for hiding this comment

sftim Feb 14, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

psschwei commented Mar 9, 2022 • edited Loading

Subsecond Probe Options

psschwei commented Mar 9, 2022 • edited Loading

sftim commented Mar 10, 2022

psschwei commented Mar 10, 2022

MadhavJivrajani commented Mar 14, 2022

k8s-triage-robot commented Jul 24, 2022

psschwei commented Jul 25, 2022

k8s-ci-robot commented Mar 27, 2024

mikebrow commented Mar 27, 2024

k8s-ci-robot commented Mar 27, 2024

k8s-ci-robot commented Mar 27, 2024

bart0sh commented Apr 5, 2024

bart0sh commented Apr 5, 2024

bart0sh commented Apr 5, 2024

thockin commented Jun 8, 2024

Nibelheims commented Jun 25, 2024 • edited Loading

mikebrow commented Jul 1, 2024

thockin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

psschwei commented Jul 12, 2024

thockin commented Jul 12, 2024

lauralorenz commented Aug 20, 2024

sftim commented Feb 14, 2022 •

edited

Loading

sftim Feb 14, 2022 •

edited

Loading

psschwei commented Mar 9, 2022 •

edited

Loading

psschwei commented Mar 9, 2022 •

edited

Loading

Nibelheims commented Jun 25, 2024 •

edited

Loading