Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4742: Expose Node Labels via Downward API #4747

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

docandrew
Copy link

  • Adding new KEP 4742: Expose Node Labels via Downward API

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 1, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @docandrew!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jul 1, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @docandrew. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: docandrew
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 1, 2024
@pacoxu
Copy link
Member

pacoxu commented Jul 5, 2024

/cc @kerthcet

@ArangoGutierrez
Copy link

As discussed at kubernetes/kubernetes#40610
this is open a lot of concerns around security

@cmwylie19
Copy link

As discussed at kubernetes/kubernetes#40610 this is open a lot of concerns around security

I think we can do this in a safe way. Let me know if the reply makes sense. Happy to work through this and give examples.

@ArangoGutierrez
Copy link

As discussed at kubernetes/kubernetes#40610 this is open a lot of concerns around security

I think we can do this in a safe way. Let me know if the reply makes sense. Happy to work through this and give examples.

After reading the entire conversation on the KEP issue and related links, I am ok with this now. As NFD maintainer I am eager to help here, as NFD creates labels for the underlying infra

@SergeyKanzhelev
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 9, 2024
@SergeyKanzhelev
Copy link
Member

SergeyKanzhelev commented Jul 9, 2024

/sig auth

need opinion on security and potential abuse

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Jul 9, 2024
@SergeyKanzhelev
Copy link
Member

some concerns to address: kubernetes/kubernetes#40610 (comment)

keps/sig-node/4742-node-labels-downward/README.md Outdated Show resolved Hide resolved

The initial design includes:

* Being able to pass node labels to volumes with fieldPath of `node.metadata.labels` where only labels containing `topology.k8s.io/` are passed through
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this overloading the namespace of topology.k8s.io? Are we encouraging users to overload it with their own labels (toplogy.k8s.io/hostuefipassword=foo)? Will users accidentally see side effects because they use the same fields inside the topology.k8s.io namespace for another purpose?

Maybe it would be better to either use a config-controlled allow-list within the kubelet(?) config or a limited allow-list of well-known labels plus a new namespace like alpha.exposednodelabels.k8s.io/* (feel free to come up with a better name).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think "topology.k8s.io" in an extensibility mechanism, we need to spec that very clearly.

e.g. x.topology.k8s.io/<anything> is for extensions, while topology.k8s.io/<anything> is reserved for k8s use.

And even then we need to define "local". Can vendors define topology extensions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking to update this a KEP a bit, but i think I am a little lost. So we need to strictly document and define through code what the <anything> flag can be? Is this the ask?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag as in part of the label that would be applied to the pod

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking for the KEP to say SOMETHING LIKE the below. Note that this is a suggested approach, we should consider alternatives:

"""
In KEP 1659, the following labels are defined:
* topology.kubernetes.io/region
* topology.kubernetes.io/zone

In addition to those, KEP 1659 declares the entire topology.kubernetes.io prefix space as reserved for use by the Kubernetes project.

This KEP expands upon KEP1659 in the following ways:

  1. The x.topology.kubernetes.io prefix is allocated for use by end users. The kubernetes project itself will not define any standard labels with that prefix.
  2. The <domain>.x.topology.kubernetes.io prefix is likewise allocated for use by end users or third-parties. The <domain> portion is treated the same as a "normal" label prefix. For example, example.com.x.topology.k8s.io/label-name.
  3. All labels using the topology.kubernetes.io or *.topology.kubernetes.io prefix spaces are considered "safe" for workloads. A workload may be exposed to the values of these labels which directly apply to the workload. For example, a pod may learn the topology of the node on which it is running.
    """

Then this KEP can describe how it will expose those labels from nodes to pods (is it a literal copy or a virtual one, who wins in case of conflict, etc).

If we think there is a need for ANOTHER prefix which is "safe" but isn't "topology", we can discuss doing a similar carve-out for another prefix. But I'd argue it stops at 2.

@thockin thockin self-assigned this Jul 9, 2024
keps/sig-node/4742-node-labels-downward/README.md Outdated Show resolved Hide resolved

Workarounds today typically involve using an initContainer to query the
Kubernetes API and then pass data via shared volume to other containers within
the same pod. This adds additional demand on the API server and is burdensome
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be really clear about my position on this:

  • DownwardAPI for anything inside the Pod object seems OK (we opened that door a long time ago).
  • DownwardAPI for things outside of the Pod object is not OK unless it follows authz rules (pod's SA).
  • The fact that workarounds exist means we should NOT consider this an extraoridnary need and should NOT be bending rules.
  • Arguments based on reducing apiserver load need to show proof.


The initial design includes:

* Being able to pass node labels to volumes with fieldPath of `node.metadata.labels` where only labels containing `topology.k8s.io/` are passed through
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think "topology.k8s.io" in an extensibility mechanism, we need to spec that very clearly.

e.g. x.topology.k8s.io/<anything> is for extensions, while topology.k8s.io/<anything> is reserved for k8s use.

And even then we need to define "local". Can vendors define topology extensions?


The initial design includes:

* Being able to pass node labels to volumes with fieldPath of `node.metadata.labels` where only labels containing `topology.k8s.io/` are passed through
Copy link
Member

@thockin thockin Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fieldpaths are supposed to be paths to fields, not magic words.

What I proposed in kubernetes/kubernetes#40610 (comment) was to copy (literally copy the data) from node to pod, and define the behavior of that.

Then we don't need a new fieldpath at all. We just need to decide WHO copies that data, and whether it is literally (and pushed back to apiserver) or virtually copied by kubelet (and kept local, like service link env vars).

I think this will be net simpler than having new magic words.

attained through a configmap or secret mounted to a pod, being passed on
creation but not guaranteed to be immutable and thus should be treated as so.

* Can potentially be featuregated to make this feature strictly opt-in, if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature gates are not long-term flags -- if you need a mechanism for admisn to disable this entirely, that's not a feature gate (and I would argue it indicates a deeper problem)

## Motivation

We’d like to change the runtime behavior of containers based on node labels.
In our case, we’re using a CNI with DaemonSets to perform network setup, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For very specific cases, we have workarounds like sidecars. If these DaemonSets need more info, that's their escape path.

@cmwylie19
Copy link

cmwylie19 commented Jul 9, 2024

Notes I got from the meeting so they can be fixed and address later when there is time:

  • Document behavior around restarts as it relates to fileMounts and env (values may change)
  • Elaborate/Fix use-cases (can't use vpn for example based on fixed values)

updated since we are not considering using kubelet flags

@thockin
Copy link
Member

thockin commented Jul 9, 2024

I am -2 on kubelet flags. In most managed providers, those flags are not accessible to users or admins, and differeing opinions between providers will make portability WORSE, not better.

We'd do better to define a small set of extension mechanisms and leave it fixed at that.

We’d like to change the runtime behavior of containers based on node labels.
In our case, we’re using a CNI with DaemonSets to perform network setup, and
would like to configure the network differently based on the presence of a node
label (e.g. `topology.k8s.io/vpn=true`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

topology.k8s.io/vpn is NOT a standard label. If you (abstract you) make up labels with "k8s.io" or "kubernetes.io" keys, you are asking to get broken. To the best of my knowledge, we have not declared that "topology.k8s.io" in an extensibility mechanism -- am I misremembering?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used this based off the suggestion made in kubernetes/kubernetes#40610 (comment) that these labels could be copied - but I think the comment there was addressing the very specific use case of things like being AZ-aware, and not the broader need to make certain node labels visible in pods.

Whatever the scheme, I think I'd like to stick with a convention for labels (e.g. downwardapi.k8s.io/* or shared.k8s.io/*) that get copied rather than specify which labels get copied via something like kubelet args or CRDs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we started this conversation, I assumed (I know, my mistake) that we would only be copying topology labels, and those topology labels are few in number and well defined by the project.

I am okay if we want to define a way for admins to extend that using our topology label space, as long as we make it clear what's an extension and what is standard. But we have to define that somewhere.

I am also okay, I guess, to define a different label space, which explicitly means that "these labels are safe to copy to pods". But again we have to define that somewhere.

I really really really do not want this to be something that is configurable, either through flags or crds or something else. Until and unless we can show that something simple and well defined absolutely cannot work, let's just do the simple thing.

@ArangoGutierrez
Copy link

Just as an after thought kubernetes-sigs/node-feature-discovery#1779

cmwylie19 and others added 4 commits July 25, 2024 14:57
Signed-off-by: Case Wylie <cmwylie19@defenseunicorns.com>
Signed-off-by: Case Wylie <cmwylie19@defenseunicorns.com>
@k8s-ci-robot
Copy link
Contributor

@docandrew: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-enhancements-verify d3ce8be link true /test pull-enhancements-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

-->

## Alternatives

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading @thockin comments about kubelet-sidecar. Maybe you can comment here about some alternatives that have been purusued or discussed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: Needs Triage
Status: Needs Reviewer
Development

Successfully merging this pull request may close these issues.

None yet

9 participants