Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for sysctl #303

Merged
merged 1 commit into from
Feb 25, 2016

Conversation

mrunalp
Copy link
Contributor

@mrunalp mrunalp commented Sep 25, 2015

/proc/sys isn't completely namespaced and only some properties are allowed
per linux namespace.

The list has been compiled from namespaces(7)

Signed-off-by: Mrunal Patel mrunalp@gmail.com

@mrunalp
Copy link
Contributor Author

mrunalp commented Sep 25, 2015

@crosbymichael @LK4D4 ping
cc: @rhatdan

/proc/sys isn't completely namespaced and only some properties are allowed
per linux namespace.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
@crosbymichael
Copy link
Member

is it even worth having this field when only a few are valid?

@mrunalp
Copy link
Contributor Author

mrunalp commented Sep 25, 2015

@crosbymichael We have a lot of customers asking for this. net itself gives a lot of options.

@crosbymichael
Copy link
Member

oh, right, i see the prefixes now

@mrunalp
Copy link
Contributor Author

mrunalp commented Sep 25, 2015

Also, more options will be made configurable over time from what I gathered talking to kernel engineers.

@rhatdan
Copy link
Contributor

rhatdan commented Sep 25, 2015

@jeremyeder PTAL

Are all of these network sysctls actually Namespace aware?

@rhatdan
Copy link
Contributor

rhatdan commented Sep 25, 2015

Would we want to make this available to docker as a public interface so that it could catch these early in the parse commands?

@mrunalp
Copy link
Contributor Author

mrunalp commented Sep 25, 2015

@rhatdan Yes, we could do that and docker could validate early. Also, @jeder's input will be useful to make sure this list is sane. I am going to look at kernel source code to see if I missed any settings. There is no other way unfortunately.

@rhatdan
Copy link
Contributor

rhatdan commented Sep 25, 2015

I think jeder should be @jeremyeder.

@jeremyeder
Copy link

@Mrunal No, we can't just pass in the whole "net" sysctl space. Even though some might actually work, there's some strange side-effects; for example you can adjust net.ipv4.tcp_mem in a container but the stack actually respects what's set on the host's kernel.

@rhatdan
Copy link
Contributor

rhatdan commented Sep 28, 2015

vm.nr_hugepages

Does not blow up, however not sure it worked correctly.

docker run --privileged -ti --sysctl vm.nr_hugepages=2000 fedora sh
cat /proc/sys/vm/nr_hugepages
285
docker run --privileged -ti --sysctl vm.nr_hugepages=5000 fedora sh
# cat /proc/sys/vm/nr_hugepages
310

@mrunalp
Copy link
Contributor Author

mrunalp commented Sep 28, 2015

@jeremyeder Thanks for your input. I will get input from the right folks to create a better list for net and things that may be missing here.

@jeder
Copy link

jeder commented Sep 28, 2015

I'm not @jeremyeder, so if he is not responding, you may have an address
wrong.

-John Eder

On Tue, Sep 29, 2015 at 4:43 AM, Mrunal Patel notifications@github.com
wrote:

@jeremyeder https://github.com/jeremyeder Thanks for your input. I will
get input from the right folks to create a better list for net and things
that may be missing here.


Reply to this email directly or view it on GitHub
#303 (comment).

J. Eder

+1 619.847.5579 | +81 80.8903.2817 | j@eder.ws

@crosbymichael
Copy link
Member

Should we validate at all? If you asked for something we just try to do it?

@rhatdan
Copy link
Contributor

rhatdan commented Oct 13, 2015

I don't think we should validate. We should leave his up to the user to validate whether it works or not, as we explain in the man pages and docs. Perhaps it would be helpful to have a page where we list sysctls that we know work with a particular kernel version.

Going forward this check is going to be very kernel specific, so verification is often going to be wrong.

@mrunalp
Copy link
Contributor Author

mrunalp commented Oct 13, 2015

@crosbymichael @rhatdan Sounds good. If we find a better way we could add it in the future.

@mrunalp mrunalp closed this Oct 13, 2015
@mrunalp mrunalp reopened this Jan 7, 2016
@mrunalp
Copy link
Contributor Author

mrunalp commented Jan 7, 2016

Re-opening to see if there is interest in merging this so sysctl could be enabled safely in docker

@mrunalp
Copy link
Contributor Author

mrunalp commented Jan 8, 2016

@crosbymichael @LK4D4 WDYT? There are quite a few use cases that are blocked by not enabling this feature in docker.

@vishh
Copy link
Contributor

vishh commented Jan 14, 2016

From a community sense, it makes sense to curate a list of safe procfs tunables as part of runc. Until we have a complete list, we could possibly allow a backdoor.

@crosbymichael
Copy link
Member

Is there a way were we can move this out of the specs and into hooks/ create/ start lifecycle?

@mrunalp
Copy link
Contributor Author

mrunalp commented Jan 14, 2016

It would be possible but a very roundabout way. Any concerns with doing it like in this PR? We could go by the man page and support what is documented.

@rhatdan
Copy link
Contributor

rhatdan commented Jan 15, 2016

@crosbymichael Why do you think it would be better in one of those?

@crosbymichael
Copy link
Member

@mrunalp do you just want to do the validation at the higher level so that we don't have to recompile or update libcontainer/runc when something changes?

@crosbymichael crosbymichael added this to the 0.0.9 milestone Feb 18, 2016
@rhatdan
Copy link
Contributor

rhatdan commented Feb 18, 2016

Not sure you want to do this, if we want to prevent users from shooting themselves in the foot, when doing runc/oci type apps. If we want to protect them, we should protect them in all tools.

One thing we are doing in runc is also looking at whether they disabled the host or ipc namespace, which is pretty difficult to do in the docker cli.

@crosbymichael
Copy link
Member

LGTM

@mrunalp
Copy link
Contributor Author

mrunalp commented Feb 19, 2016

If we do run into an issue where we have to frequently modify this list then we could add configuration for it. I think we can merge this and then see how it goes.

@hqhq
Copy link
Contributor

hqhq commented Feb 19, 2016

So after this, container will never be as privileged as host, isn't this gonna break some users who use container to hack with host, like rancherOS? ping @ibuildthecloud

The validation looks reasonable to me, just think maybe we should make it optional.

@mrunalp
Copy link
Contributor Author

mrunalp commented Feb 19, 2016

We could make it optional in privileged mode as the file system is made read only in non privileged anyways.

Sent from my iPhone

On Feb 19, 2016, at 9:36 AM, Qiang Huang notifications@github.com wrote:

So after this, container will never be as privileged as host, isn't this gonna break some users who use container to hack with host, like rancherOS? ping @ibuildthecloud

The validation looks reasonable to me, just think maybe we should make it optional.


Reply to this email directly or view it on GitHub.

@rhatdan
Copy link
Contributor

rhatdan commented Feb 19, 2016

Of course in --privileged mode you can just execute sysctl directly.

@mrunalp
Copy link
Contributor Author

mrunalp commented Feb 22, 2016

I think we don't need to modify this PR. One can go and set /proc/sys when in privileged mode and there is no harm in the validation when using the CLI so one distinguish between what is safe in container vs being set on the host.

@rhatdan
Copy link
Contributor

rhatdan commented Feb 22, 2016

I agree. We are only allowing what we know of as "Namespaced sysctls"

@hqhq
Copy link
Contributor

hqhq commented Feb 23, 2016

@mrunalp Do you mean we can make the sysctl validation optional now without any furture modification? Can you explain how to do that?

@mrunalp
Copy link
Contributor Author

mrunalp commented Feb 23, 2016

I mean that this patch is sufficient. No special handling of privileged is necessary as /proc/sys is writeable under privileged and allows the user to set sysctls that are not container specific.

Sent from my iPhone

On Feb 23, 2016, at 1:46 PM, Qiang Huang notifications@github.com wrote:

@mrunalp Do you mean we can make the sysctl validation optional now without any furture modification? Can you explain how to do that?


Reply to this email directly or view it on GitHub.

@hqhq
Copy link
Contributor

hqhq commented Feb 23, 2016

@mrunalp Thanks for clarification, I'm worry about people might want use sysctl with CLI in --privileged mode which is forbidden by the validation (these non-namespaced sysctls). Anyway we can do that in follow PRs if it do bother some users.

LGTM.

@rhatdan
Copy link
Contributor

rhatdan commented Feb 23, 2016

@hqhq I think the current assumption is that --privileged does not change other behaviour in the system. We were discussing similar behaviour with UserNamespace and it was decided that --privileged does not effect userNamespace but a --userns=host is being added to turn off user namespace. I think that precedent would follow here. In that you can do priv ops in the container, but we will continue to block you doing something accidentally.

crosbymichael added a commit that referenced this pull request Feb 25, 2016
@crosbymichael crosbymichael merged commit fc8c8ed into opencontainers:master Feb 25, 2016
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
narrative cleanup in support of Base Config Compatibility opencontainers#303
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants