Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP 2763: Ambient capabilities in Kubernetes #2757

Merged
merged 8 commits into from
Oct 7, 2021

Conversation

vinayakankugoyal
Copy link
Contributor

@vinayakankugoyal vinayakankugoyal commented May 19, 2021

Hello! thank you for taking the time to review this KEP.

related issue kubernetes/kubernetes#56374

tracking issue in containerd: containerd/containerd#5644

KEP issue: #2763

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 19, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @vinayakankugoyal. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 19, 2021
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/security Categorizes an issue or PR as relevant to SIG Security. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 19, 2021
@vinayakankugoyal
Copy link
Contributor Author

/cc @tallclair

@vinayakankugoyal vinayakankugoyal force-pushed the implentable branch 3 times, most recently from 62ea37c to 10fa18e Compare May 20, 2021 15:27
Copy link
Member

@tallclair tallclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up! I'd really like to see this feature.

[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->

This KEP proposes that kubernetes provide a way to set ambient capabilities for containers through the Pod manifest. It also proposes changes that must be made to containerd to enable ambient capabilities end-to-end.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's a separate process for proposing features for containerd. It probably makes senes to come to an understanding of what we want in k8s first, but I'd like to see the containerd (and ideally cri-o) changes merge before this KEP is marked implementable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there runtimes other than containerd we should care about? How do we usually expand CRI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this KEP focusses on containerd. I think we should expand this to other runtimes in the future. Coordinating this across multiple runtimes at once would be extremely challenging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tallclair

but I'd like to see the containerd (and ideally cri-o) changes merge before this KEP is marked implementable

but wouldn't we have to first update the CRI API for them to start reading the AddAmbientCapabilities field?
I think that the order of operation should be:

  1. Update CRI API
  2. Update containerd and CRI-O
  3. Update k8s core APIs

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. Can we just make sure a containerd & cri-o maintainer approves this KEP? I recommend @mrunalp for cri-O, I'm not sure who's active on containerd these days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup we have volunteers for reviewing/approving the KEP from
CRI API - @SergeyKanzhelev
containerd - @mikebrow
cri-O - @mrunalp

keps/sig-security/NNNN-ambient-capabilities/README.md Outdated Show resolved Hide resolved
Capabilities *Capabilities `json:"capabilities,omitempty" protobuf:"bytes,1,opt,name=capabilities"`
// The ambient capabilities to add when running containers as non-root.
// +optional
AmbientCapabilities []Capability `json:"ambientCapabilities,omitempty" protobuf:"bytes,1,rep,name=ambientCapabilities,casttype=[]Capability"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with a separate field, I suggest putting it in the Capabilities struct. E.g. Ambient works like add, but also makes it ambient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I agree we should do this as well.


### Changes to kubernetes API (https://pkg.go.dev/k8s.io/api/core/v1)

Reusing the existing capabilities field in the securityContext might cause confusion and we propose adding a new field in the SecurityContext called ambientCapabilities.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be open to reusing the add field, as long as the default set is still not included in the ambient set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree. We inherited the API shape from docker here, and it's not ideal. If we are working in the confines of today's shape, it seems plausible that add: CAP_* on a non-root pod actually means the spec author wanted the process to have CAP_*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC what you are suggesting is that add will add the explicitly added capability to the ambient set and the default ones would still be added like before? What I envisioned was that if you are adding a capability using addAmbient then under the hood we drop all the default capabilities and only add the explicitly added capability to the ambient set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. IIRC, the way Add works today is it adds it to the bounding & effective sets, the same for the default capabilities. What I'm suggesting is that the default capabilities are still only added to bounding & effective (as is proposed), and Add adds to those sets, but also to Ambient. Does that make sense?

Copy link
Contributor Author

@vinayakankugoyal vinayakankugoyal Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default and explicitly added capabilities are added to inherited, permitted, bounding and effective sets. See this for the code location where this is done.

What in think you are suggesting:

  • when someone adds a capability explicitly - that capability should be added to all sets i.e. inherited, permitted, bounding, effective and ambient.
  • the default capabilities should only be added to inherited. permitted,bounding and effective sets.
  • The advantage of your approach is that:
    • we would not require change to kubernetes API, we would still require changes to the CRI API so that we can distinguish which ones are added to the ambient set vs which ones are not.
  • The disadvantage of your approach is:
    • we are changing the behavior of an existing field. (not saying this is bad, I think this shouldn't break anyone)

Copy link
Contributor Author

@vinayakankugoyal vinayakankugoyal Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think kubernetes/kubernetes#56374 (comment) talks about why we would want to add a new field.
Adding a new field is a simple add and would make it easier to implement the feature. (We wouldn't have to worry about breaking something as the old behavior would remain untouched)

keps/sig-security/NNNN-ambient-capabilities/README.md Outdated Show resolved Hide resolved
```


Notes about how ambientCapabilities would work:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should forbid adding certain capabilities if the container sets runAsNonRoot? For example, runAsNonRoot is probably not adding much protection if you add CAP_SYS_ADMIN (or even CAP_DAC_OVERRIDE, in the default set) to the ambient set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking the same! Great call out!

DropCapabilities []string `protobuf:"bytes,2,rep,name=drop_capabilities,json=dropCapabilities,proto3" json:"drop_capabilities,omitempty"`
XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_sizecache int32 `json:"-"`
// List of ambient capabilities to add.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should AddAmbientCapabilities also add them to the inherited & bound sets? I think so?

Copy link
Contributor Author

@vinayakankugoyal vinayakankugoyal Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would need them to be added to inheritable and permitted set. See https://lwn.net/Articles/636533/.
TLDR;
AmbientCapabilities obey the invariant that no bit can ever be set in ambient set if it is
not set in both permitted and inheritable.

XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_sizecache int32 `json:"-"`
// List of ambient capabilities to add.
AddAmbientCapabilities []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Even if we end up reusing the existing Add capabilities field in the k8s API, I agree that it should be kept separate at the CRI level.

@tallclair
Copy link
Member

/cc @mrunalp

@k8s-ci-robot k8s-ci-robot requested a review from mrunalp May 20, 2021 16:42
@vinayakankugoyal vinayakankugoyal changed the title Initial KEP for ambient capabilities in kubernetes. KEP for ambient capabilities in kubernetes. May 21, 2021
Drop []Capability `json:"drop,omitempty" protobuf:"bytes,2,rep,name=drop,casttype=Capability"`
// Ambient capabilities to add
// +optional
Ambient []Capability `json:"ambient,omitempty" protobuf:"bytes,2,rep,name=ambient,casttype=Capability"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative could be a boolean for ambient that uses the default capabilties list.

Copy link
Contributor Author

@vinayakankugoyal vinayakankugoyal Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool does not work because containerd/docker add some capabilities by default and they will all become ambient for a non-root container, essentially making it root.

@vinayakankugoyal vinayakankugoyal force-pushed the implentable branch 2 times, most recently from 5ada203 to 57720d4 Compare May 22, 2021 01:11

### Changes to kubernetes API (https://pkg.go.dev/k8s.io/api/core/v1)

<<[UNRESOLVED should we reuse add instead of adding new field]>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on this note kubernetes/kubernetes#56374 (comment)
I don't think we should reuse add. I think having an additional slice in the Capabilities struct that is add-only is a clean addition

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my two cents about the k8s API design. (Talked with Vinayak offline already)

In one end of the spectrum, k8s API can expose the exact interfaces of the layer below it. In this case, it is Linux capabilities API. It means k8s API can just allow user directly interact with permitted, inherited, effective, bounding and ambient sets. This way gives user 100% control and flexibility as what Linux allows. This also means k8s API need to adapt with Linux capabilities API. For example, a new thing in Linux capabilities results in a new thing in k8s API.

In the other end, k8s API defines and abstracts its own way of interacting with capabilities. So k8s API can be simpler, at the cost of flexibility. And it should hide details of the underlying OS. The original design of securityContext.capabilities.add and drop are pretty concise. But it is unclear if it supports both root and non-root. It seems to support but from this k8s doc, it doesn't support non root.

Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this moment, we need to be backward-compatible, which is the only reason that I agree we need a new field. With the new field, I propose to clarify the spec, based on [kubernetes/kubernetes#56374 (comment)].

  • add: if root user, add capabilities; if non-root, behavior is unspecified
  • drop: if root user, drop capabilities; if non-root, behavior is unspecified
  • setForNonRoot (open to suggestion, I name it based on runAsNonRoot): if non-root user, set capabilities; if root, no ops

My goals are:

  1. clarify the root vs non-root cases
  2. clarify how the new field work with existing ones
  3. hide linux details

Copy link

@qiutongs qiutongs Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some New thoughts after talked with Vinayak today. I prefer (1) > (3) > (2)

Option 1: Reuse add and drop

  • The capabilities will be added to ambient set, besides the current capability sets. Thus it will work for non-root user case without file capability.
  • It will not break existing scenarios.

Option 2: Add new field - ambient

  • Users need to understand all those Linux capability concept: https://lwn.net/Articles/636533/
  • We need to enhance the add and drop spec by saying permitted, inheritable, bounding and effective set.

Option 3: Add new field - addForNonRoot

  • add is for any process(parent or child) with root user
  • drop is for any process(parent or child) with root user
  • addForNonRoot is for any process(parent or child) with non-root user

@dchen1107
Copy link
Member

cc/ @SergeyKanzhelev @qiutongs

@vinayakankugoyal
Copy link
Contributor Author

/cc @SergeyKanzhelev @qiutongs

@tabbysable
Copy link
Member

/approve

@tabbysable
Copy link
Member

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 2, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tabbysable, vinayakankugoyal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 4, 2021
@tabbysable
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2021
@vinayakankugoyal
Copy link
Contributor Author

@tabbysable could you please LGTM again (sorry for the inconvenience). There was a typo in the kep metadata that was causing the verify-kep-metadata.sh test to fail. Those are the only changes since last LGTM. Thank you!

@tabbysable
Copy link
Member

we should probably bump the milestones to 1.24, since this is certainly not going into 1.23, just to avoid confusion

@tabbysable
Copy link
Member

alright everyone, buckle up:
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2021
@vinayakankugoyal
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 3133297 into kubernetes:master Oct 7, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 7, 2021
hh pushed a commit to ii/keps that referenced this pull request Dec 7, 2021
* KEP 2763: Ambient capabilities in Kubernetes

* KEP 2763: Add use cases and polish

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>

* update prod-readiness yaml. Change beta to alpha.

* Minor formatting fixes.

* Delete 2763.yaml because merging provisionally.

* Update kep.yaml

* Update kep.yaml

Co-authored-by: Alexey Perevalov <alexey.perevalov@huawei.com>
rikatz pushed a commit to rikatz/enhancements that referenced this pull request Feb 1, 2022
* KEP 2763: Ambient capabilities in Kubernetes

* KEP 2763: Add use cases and polish

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>

* update prod-readiness yaml. Change beta to alpha.

* Minor formatting fixes.

* Delete 2763.yaml because merging provisionally.

* Update kep.yaml

* Update kep.yaml

Co-authored-by: Alexey Perevalov <alexey.perevalov@huawei.com>
There are 2 options here:-
- **Option 1:** Reuse Add field in [Capabilities](https://pkg.go.dev/k8s.io/api/core/v1#Capabilities)

When a capability gets added explicitly to a non-root container it also gets added to the ambient set in addition to getting added to inheritable, permitted, bounding and effective sets. The default capabilities are not added to the ambient set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having capabilities being added to the inheritable set turned out to be a CVE affecting multiple runtimes (Moby, containerd, CRI-O, Podman...), more details in the Moby advisory: GHSA-2mm7-x5h6-5pvq

Having the capabilities added directly to the ambient set requires them to be present in the inheritable set as well, thus getting back the old behavior.

I think this option should not be considered.


#### Restricted ambient capabilities.

<<[UNRESOLVED what is the set of capabilities that we should allow to be made ambient]>>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously when this was discussed PSP's were deprecated and there wasn't alternative. But we now have a PodSecurity admission controller in beta and enabled by default as of 1.23.

So I think if there are any restrictions it should be done in the PodSecurity admission controller. Most should probably be under the 'Privileged' category.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jc2k are you thinking of a change to the pod security standards? These are what you can enforce with Pod Security admission.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends.

For context, let me stress again that when this section of the KEP were first discussed, the admission controller wasn't in beta (if it existed at all) so it wasn't considered a valid solution to this UNRESOLVED part of the KEP.

In the case where we re-use securityContext.capabilities.add, the existing standards look like they are valid for root and non-root containers. Neither would be allowed to use CAP_SYS_ADMIN or CAP_DAC_OVERRIDE etc unless they were using the privileged policy level. This is good.

If we are not reusing securityContext.capabilities.add, then I think the standards should be updated anyway. They currently cover all the ways you can add capabilities to containers. If there is a new way to add a capabitility to a container, it should be covered by the standards.

@Jc2k
Copy link

Jc2k commented Apr 26, 2022

I'm aware of at least one popular CNI that now has a non-root mode. It does this through suid binaries. I'd much prefer a DaemonSet with an explicit and limited capability allow list to a container image using suid binaries, even if both ultimately have powerful permission sets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/security Categorizes an issue or PR as relevant to SIG Security. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.