Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Console path resolution is done in host mount namespace #814

Closed
cyphar opened this issue May 11, 2016 · 16 comments · Fixed by #1018
Closed

Console path resolution is done in host mount namespace #814

cyphar opened this issue May 11, 2016 · 16 comments · Fixed by #1018
Labels
Milestone

Comments

@cyphar
Copy link
Member

cyphar commented May 11, 2016

[I realised this while trying to get the test suite to run for #774.]

The main issue is that we set up the console in the parent's mount namespace. This breaks quite a few things. In addition, --console resolution done in the parent causes things like su to not work in containers (because glibc is broken). If you have a config.json like this:

{
    "ociVersion": "0.6.0-dev",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "process": {
        "terminal": true,
        "user": {},
        "args": [
            "sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": [
            "CAP_AUDIT_WRITE",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "noNewPrivileges": true
    },
    "root": {
        "path": "rootfs",
        "readonly": true
    },
    "hostname": "runc",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "none",
            "source": "/sys",
            "options": [
                "rbind",
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        }
    ],
    "hooks": {},
    "linux": {
        "uidMappings": [
            {
                "hostID": 1000,
                "containerID": 0,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "hostID": 100,
                "containerID": 0,
                "size": 1
            }
        ],
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "user"
            }
        ],
        "maskedPaths": [
            "/proc/kcore",
            "/proc/latency_stats",
            "/proc/timer_stats",
            "/proc/sched_debug"
        ],
        "readonlyPaths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ]
    }
}

/cc @crosbymichael

@cyphar cyphar mentioned this issue May 11, 2016
46 tasks
@cyphar
Copy link
Member Author

cyphar commented May 11, 2016

This is probably a good reason for us to start adding integration tests for user namespaces.

@cyphar
Copy link
Member Author

cyphar commented May 11, 2016

This happens inside (*linuxConsole).dupStdio(), because /dev/pts/ptmx is being opened there. And since it's being opened in the host, we get screwed by the mode of /dev/pts/ptmx in the host's devpts instance -- we should fix this. According to the docs, we should probably be using /dev/pts/ptmx inside the container. However, this means that we'll have to mess around with things.

@cyphar cyphar added this to the 0.2.0 milestone May 13, 2016
@crosbymichael
Copy link
Member

The tricky part is getting the pty master back outside of the container after opening it.

@cyphar
Copy link
Member Author

cyphar commented May 29, 2016

This also links to several old issues with libcontainer with regard to sudo not working in containers -- it all links back to the fact that we are creating the terminal in the host. I'm trying to figure out a way to make this unnecessary.

@cyphar
Copy link
Member Author

cyphar commented Jun 1, 2016

Maybe we could use unix sockets to send the fd? I'll take another look at this next week.

@crosbymichael
Copy link
Member

i don't like unix socket sending fds. i really want to avoid these super containers with live connections going in and out.

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

I have two ideas currently:

  1. We use a unix socket, which requires setting up a unix socket just during setup. This is going to be messy because ancillary data requires a lot of C finesse. But it will work.
  2. We could try forking a process that shares the set of files (~CLONE_FILES) which then uses setns(2) to join the mount (and user if available) namespace of the container to then open either /proc/1/fd/0 or /dev/pts/<number>. The issue with this is that it might not work properly (I'm trying to remember what the minimum set of clone flags are for namespaces -- I'm fairly sure you need CLONE_FLAGS). We also need to add another nsenter-like reexec thing.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 07:37:47PM -0700, Aleksa Sarai wrote:

  1. We use a unix socket, which requires setting up a unix socket
    just during setup. This is going to be messy because ancillary data
    requires a lot of C finesse. But it will work.

It's not that difficult; ~50 lines for sendfd and recvfd wrappers in
ccon, based on an example in cmsg(3).

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

I said messy and finesse. It's not that it's hard, it's that it'll be a bit of C code we can't touch.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 07:47:35PM -0700, Aleksa Sarai wrote:

It's not that it's hard, it's that it'll be a bit of C code we can't
touch.

Why “can't touch”? I'm not even sure what that means :p.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 04:41:47PM -0700, Michael Crosby wrote:

i really want to avoid these super containers with live connections
going in and out.

Ccon uses an anonymous Unix socket (socketpair(2)) 1 the way runC
apparently uses a pipe 2. The host ↔ container socket gets closed
before the container process calls execve(2) (or similar) to launch
the user code. So there shouldn't be any more “live connections” than
you already have.

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

I'm going to be honest, I didn't know about socketpair(2). This would make the code much simpler. As for "can't touch" it just means that we'll have to mix some Go and C code that has odd semantics and we have to copy from the man page for cmsg and touching it would probably cause something to blow up.

@wking
Copy link
Contributor

wking commented Jun 2, 2016

On Wed, Jun 01, 2016 at 09:11:02PM -0700, Aleksa Sarai wrote:

… touching it would probably cause something to blow up.

That's true of most things, and what test suites are for ;).

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

If @crosbymichael is okay with using socketpair(2) in place of pipe(2) for the parent <-> child communication, I'd be fine with implementing the code this way. But there's a question of whether or not we'll get similar issues as with the whole ECONNRESET fiasco.

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

Wait hang on. We already use socketpair -- look at newPipe() at libcontainer/container_linux.go! So this should be a fairly simple extension.

@crosbymichael crosbymichael removed this from the 1.0.0 milestone Jun 3, 2016
@cyphar cyphar changed the title --console doesn't work with user namespaces Console path resolution is done in host mount namespace Jun 4, 2016
@cyphar cyphar added this to the 1.1.0 milestone Jun 6, 2016
@cyphar cyphar mentioned this issue Aug 15, 2016
2 tasks
@cyphar
Copy link
Member Author

cyphar commented Aug 17, 2016

Currently working on the design plan. https://gist.github.com/cyphar/8c6b9db84fc1f2cc2d037ef07942ca83

Here's @crosbymichael's mockup of how you could implement things in a simple C program. https://gist.github.com/crosbymichael/d3045070f0e2615814aaa31e8991d7fd

stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
Through f4d221c (Merge pull request opencontainers#880 from
dqminh/wking-linux-only-capabilities-again, 2017-07-05).  The rc6
release picked up an earlier version of these notes, and those entries
are mostly unchanged except for:

* The credentialSpec entry, which was opencontainers#814 for credentialspec and now
  also includes opencontainers#859 for credentialSpec.

* The root(.path) Hyper-V entry, which was opencontainers#820 for root.path and now
  also includes opencontainers#838 for root.  I also moved this into the "breaking
  changes" section, because rc5 Hyper-V configs required root to be
  set, and rc6 Hyper-V configs require it to not be set.  Although
  whether rc5 allowed Hyper-V configs at all is not clear to me.

* Fixed indenting for the typo-fixes entry, as well as a number of
  more recent typo-fix PRs.

Signed-off-by: W. Trevor King <wking@tremily.us>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants