diff --git a/contributors/design-proposals/node/limit-node-object-self-control.md b/contributors/design-proposals/node/limit-node-object-self-control.md
deleted file mode 100644
index 15248de752d..00000000000
--- a/contributors/design-proposals/node/limit-node-object-self-control.md
+++ /dev/null
@@ -1,164 +0,0 @@
-# Limiting Node Scope on the Node object
-
-### Author: Mike Danese (@mikedanese)
-
-## Background
-
-Today the node client has total authority over its own Node object. This ability
-is incredibly useful for the node auto-registration flow. Some examples of
-fields the kubelet self-reports in the early node object are:
-
-1. Labels (provided by kubelet commandline)
-1. Taints (provided by kubelet commandline)
-
-As well as others.
-
-## Problem
-
-While this distributed method of registration is convenient and expedient, it
-has two problems that a centralized approach would not have. As a minor issue, it makes
-management difficult. Instead of configuring labels and taints in a centralized
-place, we must configure `N` kubelet command lines. More significantly, the
-approach greatly compromises security. Below are two straightforward escalations
-on an initially compromised node that exhibit the attack vector.
-
-### Capturing Dedicated Workloads
-
-Suppose company `foo` needs to run an application that deals with PII on
-dedicated nodes to comply with government regulation. A common mechanism for
-implementing dedicated nodes in Kubernetes today is to set a label or taint
-(e.g. `foo/dedicated=customer-info-app`) on the node and to select these
-dedicated nodes in the workload controller running `customer-info-app`.
-
-Since the node self-reports labels upon registration, an intruder can easily
-register a compromised node with label `foo/dedicated=customer-info-app`. The
-scheduler will then bind `customer-info-app` to the compromised node, potentially
-giving the intruder easy access to the PII.
-
-This attack also extends to secrets.
Suppose company `foo` runs their outward
-facing nginx on dedicated nodes to reduce exposure to the company's publicly
-trusted server certificates. They use the secret mechanism to distribute the
-serving certificate key. An intruder captures the dedicated nginx workload in
-the same way and can now use the node certificate to read the company's serving
-certificate key.
-
-## Proposed Solution
-
-In many environments, we can improve the situation by centralizing reporting of
-sensitive node attributes to a more trusted source and disallowing reporting of
-these attributes from the kubelet.
-
-### Label And Taint Restriction
-
-An operator will configure a whitelist of taints and labels that nodes are
-allowed to set on themselves. This list should include the taints and labels
-that the kubelet is already setting on itself.
-
-Well-known taint keys:
-```
-node.cloudprovider.kubernetes.io/uninitialized
-```
-
-Well-known label keys:
-
-```
-kubernetes.io/hostname
-failure-domain.beta.kubernetes.io/zone
-failure-domain.beta.kubernetes.io/region
-beta.kubernetes.io/instance-type
-beta.kubernetes.io/os
-beta.kubernetes.io/arch
-```
-
-As well as any taints and labels that the operator is setting using:
-
-```
- --register-with-taints
- --node-labels
-```
-
-This whitelist is passed as a command-line flag to the apiserver.
-NodeRestriction admission control will then prevent setting and modification by
-nodes of all taints and labels with keys not in the whitelist.
-
-### NodeRestriction Config
-
-A new configuration API group will be created for the NodeRestriction admission
-controller with the name `noderestriction.admission.k8s.io`. It will contain one
-config object:
-
-```golang
-type Configuration struct {
- // AllowedLabels is a list of label keys a node is allowed to set on itself.
- // The list also supports whitelisting all label keys with a specific prefix
- // by adding an entry of the form `<prefix>*`.
- AllowedLabels []string
- // AllowedTaints is a list of taint keys a node is allowed to set on itself.
- // The list also supports whitelisting all taint keys with a specific prefix
- // by adding an entry of the form `<prefix>*`.
- AllowedTaints []string
-}
-```
-
-Labels and taints that are applied by the kubelet itself (and not by
---register-with configurations) do not need to appear in this config. They are
-allowed implicitly.
-
-### NodeRestriction Config Examples
-
-A configuration that allows all labels, all taints with the prefix `insecure.`,
-and the `foo` taint:
-
-```yaml
-apiVersion: noderestriction.admission.k8s.io/v1
-kind: Configuration
-allowedLabels:
-- "*"
-allowedTaints:
-- foo
-- insecure.*
-```
-
-A configuration that allows only labels for CSI plugins:
-
-```yaml
-apiVersion: noderestriction.admission.k8s.io/v1
-kind: Configuration
-allowedLabels:
-- csi.kubernetes.io.*
-```
-
-For backwards compatibility, the default config is equivalent to:
-
-```yaml
-apiVersion: noderestriction.admission.k8s.io/v1
-kind: Configuration
-allowedLabels:
-- "*"
-allowedTaints:
-- "*"
-```
-
-### Removing the Node's self-delete permission
-
-Currently a node has permission to delete itself. A node will only delete itself
-when its external name (inferred through the cloud provider) changes. This code
-path will never be executed on the majority of cloud providers, and this
-capability undermines the usage of taints as a strong exclusion primitive.
-
-For example, suppose an operator sets a taint `compromised` on a node that they
-believe has been compromised. Currently, the compromised node could delete and
-recreate itself, thereby removing the `compromised` taint.
-
-To prevent this, we will finish the removal of ExternalID, which has been
-deprecated since 1.1. This will allow us to remove the self-delete permission
-from the NodeAuthorizer.
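As a rough illustration of the whitelist semantics in the examples above (exact keys such as `foo`, prefix wildcards such as `insecure.*`, and a bare `*` that allows everything), a minimal matching helper might look like the following sketch. `allowedKey` is a hypothetical name for illustration, not part of the actual admission controller:

```go
package main

import (
	"fmt"
	"strings"
)

// allowedKey reports whether a label or taint key matches a whitelist.
// An entry is an exact key, a "prefix*" wildcard, or a bare "*" that
// matches every key (the backwards-compatible default).
func allowedKey(key string, whitelist []string) bool {
	for _, entry := range whitelist {
		switch {
		case entry == "*":
			return true
		case strings.HasSuffix(entry, "*"):
			if strings.HasPrefix(key, strings.TrimSuffix(entry, "*")) {
				return true
			}
		case entry == key:
			return true
		}
	}
	return false
}

func main() {
	allowedTaints := []string{"foo", "insecure.*"}
	fmt.Println(allowedKey("foo", allowedTaints))          // exact match: true
	fmt.Println(allowedKey("insecure.bar", allowedTaints)) // wildcard match: true
	fmt.Println(allowedKey("compromised", allowedTaints))  // not whitelisted: false
}
```

With this shape, NodeRestriction would reject any self-set taint or label whose key fails the check, while the implicit kubelet-applied keys are allowed before this check is ever consulted.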
-
-### Taints set by central controllers
-
-In many deployment environments, the sensitive attributes of a Node object
-discussed above ("labels", "taints") are discoverable by consulting a machine
-database (e.g. the GCE API). A centralized controller can register an
-initializer for the node object and build the sensitive fields by consulting the
-machine database. The `cloud-controller-manager` is an obvious candidate to
-house such a controller.
diff --git a/keps/sig-auth/0000-20170814-bounding-self-labeling-kubelets.md b/keps/sig-auth/0000-20170814-bounding-self-labeling-kubelets.md
new file mode 100644
index 00000000000..73b3344ae0a
--- /dev/null
+++ b/keps/sig-auth/0000-20170814-bounding-self-labeling-kubelets.md
@@ -0,0 +1,141 @@
+---
+kep-number: 0
+title: Bounding Self-Labeling Kubelets
+authors:
+ - "@mikedanese"
+ - "@liggitt"
+owning-sig: sig-auth
+participating-sigs:
+ - sig-node
+ - sig-storage
+reviewers:
+ - "@saad-ali"
+ - "@tallclair"
+approvers:
+ - "@thockin"
+ - "@smarterclayton"
+creation-date: 2017-08-14
+last-updated: 2018-10-31
+status: implementable
+---
+
+# Bounding Self-Labeling Kubelets
+
+## Motivation
+
+Today the node client has total authority over its own Node labels.
+This ability is incredibly useful for the node auto-registration flow.
+The kubelet reports a set of well-known labels, as well as additional
+labels specified on the command line with `--node-labels`.
+
+While this distributed method of registration is convenient and expedient, it
+has two problems that a centralized approach would not have. As a minor issue, it makes
+management difficult. Instead of configuring labels in a centralized
+place, we must configure `N` kubelet command lines. More significantly, the
+approach greatly compromises security. Below are two straightforward escalations
+on an initially compromised node that exhibit the attack vector.
+
+### Capturing Dedicated Workloads
+
+Suppose company `foo` needs to run an application that deals with PII on
+dedicated nodes to comply with government regulation. A common mechanism for
+implementing dedicated nodes in Kubernetes today is to set a label or taint
+(e.g. `foo/dedicated=customer-info-app`) on the node and to select these
+dedicated nodes in the workload controller running `customer-info-app`.
+
+Since the node self-reports labels upon registration, an intruder can easily
+register a compromised node with label `foo/dedicated=customer-info-app`. The
+scheduler will then bind `customer-info-app` to the compromised node, potentially
+giving the intruder easy access to the PII.
+
+This attack also extends to secrets. Suppose company `foo` runs their outward
+facing nginx on dedicated nodes to reduce exposure to the company's publicly
+trusted server certificates. They use the secret mechanism to distribute the
+serving certificate key. An intruder captures the dedicated nginx workload in
+the same way and can now use the node certificate to read the company's serving
+certificate key.
+
+## Proposal
+
+1. Modify the `NodeRestriction` admission plugin to prevent Kubelets from self-setting labels
+within the `k8s.io` and `kubernetes.io` namespaces *except for these specifically allowed labels/prefixes*:
+
+   ```
+   kubernetes.io/hostname
+   kubernetes.io/instance-type
+   kubernetes.io/os
+   kubernetes.io/arch
+
+   beta.kubernetes.io/instance-type
+   beta.kubernetes.io/os
+   beta.kubernetes.io/arch
+
+   failure-domain.beta.kubernetes.io/zone
+   failure-domain.beta.kubernetes.io/region
+
+   failure-domain.kubernetes.io/zone
+   failure-domain.kubernetes.io/region
+
+   [*.]kubelet.kubernetes.io/*
+   [*.]node.kubernetes.io/*
+   ```
+
+2. Reserve and document the `node-restriction.kubernetes.io/*` label prefix for cluster administrators
+who want to label their `Node` objects centrally for isolation purposes.
+
+   > The `node-restriction.kubernetes.io/*` label prefix is reserved for cluster administrators
+   > to isolate nodes. These labels cannot be self-set by kubelets when the `NodeRestriction`
+   > admission plugin is enabled.
+
+This accomplishes the following goals:
+
+- continues allowing people to use arbitrary labels under their own namespaces any way they wish
+- supports legacy labels kubelets are already adding
+- provides a place under the `kubernetes.io` label namespace for node isolation labeling
+- provides a place under the `kubernetes.io` label namespace for kubelets to self-label with kubelet- and node-specific labels
+
+## Implementation Timeline
+
+v1.13:
+
+* Kubelet deprecates setting `kubernetes.io` or `k8s.io` labels via `--node-labels`,
+other than the specifically allowed labels/prefixes described above,
+and warns when invoked with `kubernetes.io` or `k8s.io` labels outside that set.
+* NodeRestriction admission prevents kubelets from adding/removing/modifying `[*.]node-restriction.kubernetes.io/*` labels on Node *create* and *update*
+* NodeRestriction admission prevents kubelets from adding/removing/modifying `kubernetes.io` or `k8s.io`
+labels other than the specifically allowed labels/prefixes described above on Node *update* only
+
+v1.15:
+
+* Kubelet removes the ability to set `kubernetes.io` or `k8s.io` labels via `--node-labels`
+other than the specifically allowed labels/prefixes described above (deprecation period
+of 6 months for CLI elements of admin-facing components is complete)
+
+v1.17:
+
+* NodeRestriction admission prevents kubelets from adding/removing/modifying `kubernetes.io` or `k8s.io`
+labels other than the specifically allowed labels/prefixes described above on Node *update* and *create*
+(oldest supported kubelet running against a v1.17 apiserver is v1.15)
+
+## Alternatives Considered
+
+### File or flag-based configuration of the apiserver to allow specifying allowed labels
+
+* A fixed set of labels and label prefixes
is simpler to reason about, and makes every cluster behave consistently
+* File-based config isn't easily inspectable to be able to verify enforced labels
+* File-based config isn't easily kept in sync in HA apiserver setups
+
+### API-based configuration of the apiserver to allow specifying allowed labels
+
+* A fixed set of labels and label prefixes is simpler to reason about, and makes every cluster behave consistently
+* An API object that controls the allowed labels is a potential escalation path for a compromised node
+
+### Allow kubelets to add any labels they wish, and add NoSchedule taints if disallowed labels are added
+
+* To be robust, this approach would also likely involve a controller to automatically inspect labels and remove the NoSchedule taint. This seemed overly complex. Additionally, it was difficult to come up with a tainting scheme that preserved information about which labels were the cause.
+
+### Forbid all labels regardless of namespace except for a specifically allowed set
+
+* This was much more disruptive to existing usage of `--node-labels`.
+* This was much more difficult to integrate with other systems allowing arbitrary topology labels like CSI.
+* This placed restrictions on how labels outside the `kubernetes.io` and `k8s.io` label namespaces could be used, which didn't seem proper.
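For readers comparing the alternatives, the fixed rule proposed above (labels outside `kubernetes.io`/`k8s.io` are unrestricted; inside those namespaces only the enumerated labels and the `[*.]kubelet.kubernetes.io/*` and `[*.]node.kubernetes.io/*` prefixes are self-settable) can be sketched roughly as follows. This is an illustrative approximation, not the NodeRestriction plugin's actual source; the function names are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// allowedExact mirrors the specifically allowed labels listed in the proposal.
var allowedExact = map[string]bool{
	"kubernetes.io/hostname":                   true,
	"kubernetes.io/instance-type":              true,
	"kubernetes.io/os":                         true,
	"kubernetes.io/arch":                       true,
	"beta.kubernetes.io/instance-type":         true,
	"beta.kubernetes.io/os":                    true,
	"beta.kubernetes.io/arch":                  true,
	"failure-domain.beta.kubernetes.io/zone":   true,
	"failure-domain.beta.kubernetes.io/region": true,
	"failure-domain.kubernetes.io/zone":        true,
	"failure-domain.kubernetes.io/region":      true,
}

// restrictedNamespace reports whether the label key's prefix falls in the
// kubernetes.io / k8s.io namespaces that the plugin restricts.
func restrictedNamespace(key string) bool {
	ns := strings.SplitN(key, "/", 2)[0]
	return ns == "kubernetes.io" || ns == "k8s.io" ||
		strings.HasSuffix(ns, ".kubernetes.io") || strings.HasSuffix(ns, ".k8s.io")
}

// kubeletMaySet reports whether a kubelet may self-set the given label key
// under the proposed rules.
func kubeletMaySet(key string) bool {
	if !restrictedNamespace(key) {
		return true // labels outside kubernetes.io / k8s.io stay unrestricted
	}
	if allowedExact[key] {
		return true
	}
	// [*.]kubelet.kubernetes.io/* and [*.]node.kubernetes.io/* are allowed.
	ns := strings.SplitN(key, "/", 2)[0]
	for _, allowed := range []string{"kubelet.kubernetes.io", "node.kubernetes.io"} {
		if ns == allowed || strings.HasSuffix(ns, "."+allowed) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(kubeletMaySet("kubernetes.io/hostname"))             // true
	fmt.Println(kubeletMaySet("feature.node.kubernetes.io/cpu-avx")) // true
	fmt.Println(kubeletMaySet("node-restriction.kubernetes.io/pii")) // false
	fmt.Println(kubeletMaySet("example.com/rack"))                   // true
}
```

Under these rules, `node-restriction.kubernetes.io/*` labels applied centrally by administrators can never be forged by a compromised kubelet, while arbitrary labels in operator-owned namespaces such as `example.com/rack` keep working unchanged.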