Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KEP-2400] Node swap updates, GA criterias and clarifications #4701

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

iholder101
Copy link
Contributor

@iholder101 iholder101 commented Jun 6, 2024

  • One-line PR description:
    Add updates, GA criterias and clarifications
  • Other comments:

This PR updates the KEP in the following ways:

Emphasize that this KEP is about basic swap enablement
The original KEP indicated that pod-level swap APIs are out of scope:

- Allocating swap on a per-workload basis with accounting (e.g. pod-level
specification of swap). If desired, this should be designed and implemented
as part of a follow-up KEP. This KEP is a prerequisite for that work. Hence,
swap will be an overcommitted resource in the context of this KEP.

This KEP will be limited in scope to the first two scenarios. The third can be
addressed in a follow-up KEP. The enablement work that is in scope for this KEP
will be necessary to implement the third scenario.

However, the lack of APIs and the implicit nature of the current implementation sometimes brings suggestions to extend the API under this KEP.

This KEP focuses on a basic swap enablement. Follow-up KEPs regarding several topics (e.g. customization, zram/zswap suport, and more) will be introduced in the near future, in which we would be able to design and implement each extension in a focused way.

To ensure we're on the same page, this topic was recently raised in a sig-node meeting. In this meeting there was a very broad consensus that this approach makes sense, especially since the NodeSwap feature is important to many different parties which want it to "just work".

This PR updates the KEP to emphasize this approach.

GA criterias
The PR adds GA criterias, alongside the intent to GA in version 1.32.

Make sure PRR is ready

Updates
Since the last KEP updates, many improvements were made and many concerns were addressed. For example:

  • Memory-backed volumes
  • Added metrics
  • Kubelet Configuration examples
  • more

This PR updates the KEP to reflect these updates.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 6, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iholder101
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 6, 2024
@iholder101
Copy link
Contributor Author

@iholder101 iholder101 force-pushed the kep2400/post_beta2 branch 2 times, most recently from 16c4878 to b3f9708 Compare June 9, 2024 09:17
@deads2k
Copy link
Contributor

deads2k commented Jun 10, 2024

Please update https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/sig-node/2400.yaml and update missing bits of the PRR questionaire.

@iholder101
Copy link
Contributor Author

Thanks @deads2k!

Please update master/keps/prod-readiness/sig-node/2400.yaml

I see you're the assigned approver for alpha/beta.
Is it OK to also assign you as the approver for GA?

and update missing bits of the PRR questionaire.

Done! PTAL :)

@iholder101 iholder101 force-pushed the kep2400/post_beta2 branch 2 times, most recently from f4af444 to 4124649 Compare June 18, 2024 10:55
@sftim
Copy link
Contributor

sftim commented Jun 24, 2024

/retitle [KEP-2400] Node swap ppdates, GA criterias and clarifications

@k8s-ci-robot k8s-ci-robot changed the title [KEP-2400] Updates, GA criterias and clarifications [KEP-2400] Node swap ppdates, GA criterias and clarifications Jun 24, 2024
@sftim
Copy link
Contributor

sftim commented Jun 24, 2024

D'oh

/retitle [KEP-2400] Node swap updates, GA criterias and clarifications

@k8s-ci-robot k8s-ci-robot changed the title [KEP-2400] Node swap ppdates, GA criterias and clarifications [KEP-2400] Node swap updates, GA criterias and clarifications Jun 24, 2024
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Here's a mix of feedback; I hope it is all useful.

keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/kep.yaml Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
@@ -33,7 +33,7 @@ latest-milestone: "v1.30"
milestone:
alpha: "v1.22"
beta: "v1.30"
stable: "TBD"
stable: "v1.32"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
stable: "v1.32"
#stable: "v1.32"

(AIUI)
We can put in a suggestion, but until we're planning that release, it's not a commitment. If we are planning that release, we should change latest-milestone and stage I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should indeed plan it for next release.

The latest-milestone comments read:

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.

The milestone's documentation reads:

# The milestone at which this feature was, or is targeted to be, at each stage.

Since we target 1.32, miltstone seemed like the right place. I've added it now to latest-milestone.

What's important to me is that we indeed plan and commmit to releasing in 1.32, which there's a broad consensus of doing. I've even volunteered to not pursue GAing in 1.31 to reduce pressure from the sig-node approvers and ensure we do so in 1.32 with no pressure.

@iholder101 iholder101 force-pushed the kep2400/post_beta2 branch 2 times, most recently from 8e73c9e to 9f8f8b8 Compare July 10, 2024 14:19
* If the kernel version equals or is above 6.4, the tmpfs noswap option is being used when necessary.
* Else, kubelet would try to mount a dummy volume with the tmpfs noswap option to understand whether the option is
backported. If the mount succeeds, the tmpfs noswap option is being used when necessary.
* Else, kubelet would raise a warning log entry about the option not being supported and the possible risk.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it will mount it anyway, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK kubelet will fail to mount if the kernel does not support this feature. So it won’t be mounted in the “correct” way due to a limitation of the kernel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it will mount it anyway, right?

Not sure I understand, mount what?
This "dummy mount" will fail if the tmpfs noswap is not supported. It is basically a method to determine whether the noswap option is supported or not. If it is supported, it'll be used to mount memory-backed volumes. If not, a warning log entry will be raised, and memory-backed volumes will be mounted without the noswap option.

Hope this answers your question, let me know if you need more info

@iholder101
Copy link
Contributor Author

@sftim @deads2k @haircommander @SergeyKanzhelev @mrunalp

Would you kindly have another look at this? Is this anything missing? I believe that most of this PR has already been agreed upon in previous sig-node calls. Thanks!

@jabdoa2
Copy link

jabdoa2 commented Jul 26, 2024

We have been using this feature for a while in production and it generally performs well. We still need to set --experimental-allocatable-ignore-eviction on kubelet to allow setting memory eviction threshold to 0. Otherwise, kubelet might evict pods before swapping them out. With that option it has been pretty solid.

@haircommander
Copy link
Contributor

/milestone 1.32

@k8s-ci-robot
Copy link
Contributor

@haircommander: The provided milestone is not valid for this repository. Milestones in this repository: [v1.25, v1.27, v1.28, v1.29, v1.30, v1.31, v1.32, v1.33, v1.34]

Use /milestone clear to clear the milestone.

In response to this:

/milestone 1.32

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@haircommander
Copy link
Contributor

/milestone v1.32

@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Aug 20, 2024
Signed-off-by: Itamar Holder <iholder@redhat.com>
…avior

Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
@iholder101
Copy link
Contributor Author

I wish to update that we agreed [1] that NFD support for swap brings enough debugability for this KEP. We'll definitely circle back to improving this as part of the follow-up API extension KEP. Thanks @kannon92 for bringing swap support to NFD, and thanks @dchen1107 for helping pushing the discussion forward!

With this, I believe that this PR is entirely ready to get in.
@sftim @deads2k @haircommander @SergeyKanzhelev @mrunalp - can you please have another look so we can push this forward? Thanks in advance!

[1] The discussion started at the sig-node meeting, then continued to this comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Needs Reviewer
Development

Successfully merging this pull request may close these issues.

None yet

9 participants