TestInitJoinNetworkAndUser failed on RHEL 7 #915

Open
haiyanmeng opened this issue Jun 14, 2016 · 19 comments

Comments

@haiyanmeng
Contributor

I ran make test for runc after enabling user namespaces on my RHEL 7 system. All the bats tests succeeded.
However, one unit test failed, and it left behind a dead Docker container.

=== RUN   TestInitJoinNetworkAndUser
time="2016-06-14T14:31:04Z" level=warning msg="os: process already finished"
--- FAIL: TestInitJoinNetworkAndUser (0.31s)
    utils_test.go:51: exec_test.go:1597: unexpected error: process_linux.go:245: running exec setns process for init caused "exit status 1"                                                  

Here is the configuration of my RHEL 7 VM, which runs via virt-manager.

[root@localhost runc]# uname -a
Linux localhost.localdomain 3.10.0-327.18.2.el7.x86_64 #1 SMP Fri Apr 8 05:09:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost runc]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@localhost runc]# go version
go version go1.6.2 linux/amd64  
[root@localhost runc]# git branch
* master
[root@localhost runc]$ git log
commit 42dfd606437b538ffde4f0640d433916bee928e3
Merge: c046127 394610a
Author: Qiang Huang <h.huangqiang@huawei.com>
Date:   Tue Jun 14 14:19:20 2016 +0800

    Merge pull request #904 from euank/fix-cgroup-parsing-err
@haiyanmeng
Contributor Author

@mrunalp , @rhatdan , PTAL.

@mrunalp
Contributor

mrunalp commented Jun 14, 2016

Could you paste the errors captured with strace -f?

@haiyanmeng
Contributor Author

Here is the log of strace -f -o test.log make test:
https://github.com/raw/hmeng-19/logs/master/test.log

@mrunalp
Contributor

mrunalp commented Jun 14, 2016

It would be better to recreate the test using a runc config and just get an strace of that.


@haiyanmeng
Contributor Author

Okay. Let me give it a try.

@haiyanmeng
Contributor Author

@mrunalp , I was not sure how to set up the runc config you mentioned.

Here is what I did:
I modified the go test command for the localunittest target in the Makefile:

localunittest: all
        go test -timeout 3m -tags "$(BUILDTAGS)" ${TESTFLAGS} -v -run TestInitJoinNetworkAndUser ./libcontainer/integration

Then I ran the localunittest target by invoking docker run directly:

strace -f -o log docker run -e TESTFLAGS -ti --privileged --rm -v /home/hmeng/go/src/github.com/hmeng-19/runc:/go/src/github.com/opencontainers/runc runc_test make localunittest

The log file still has 2338 lines:

[root@localhost runc]# wc -l log
2338 log

You can find the log file here: https://github.com/hmeng-19/logs/blob/master/log

@haiyanmeng
Contributor Author

Here is the log of running the following command:

strace -s 4096 -f -o log docker run -e TESTFLAGS -ti --privileged --rm -v /home/hmeng/go/src/github.com/hmeng-19/runc:/go/src/github.com/opencontainers/runc runc_test make localunittest

https://github.com/hmeng-19/logs/blob/master/log1

@cyphar
Member

cyphar commented Jun 14, 2016

@hmeng-19 You'll want to run strace inside the container (docker run strace ...). Tracing the Docker client won't work, since runC isn't a child of the client (it's a child of the daemon).

@haiyanmeng
Contributor Author

@cyphar , thanks for the help.

I modified the go test command for the localunittest target in the Makefile:

localunittest: all
        apt-get install strace
        strace -f -o log3 go test -timeout 3m -tags "$(BUILDTAGS)" ${TESTFLAGS} -v -run TestInitJoinNetworkAndUser ./libcontainer/integration

Then I ran make test:

[root@localhost runc]# make test

The log file, log3, is here: https://github.com/raw/hmeng-19/logs/master/log3

@haiyanmeng
Contributor Author

@mrunalp , @rhatdan , it seems that the clone syscall in libcontainer/nsenter/nsexec.c failed. Here is the failure from the strace log:

98    clone(child_stack=0x7ffc83481340, flags=CLONE_PARENT|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWPID|SIGCHLD) = -1 EPERM (Operation not permitted)
84    <... select resumed> )            = 0 (Timeout)
98    write(2, "nsenter: Unable to fork: Operati"..., 49 <unfinished ...>

Here is the code snippet where the failure happened:

static int clone_parent(jmp_buf *env, int flags)
{
    struct clone_arg ca;
    int      child;

    ca.env = env;
    child  = clone(child_func, ca.stack_ptr, CLONE_PARENT | SIGCHLD | flags,
              &ca);

Since clone failed with EPERM, I tried sudo make test, but the same error still occurred.

@wking
Contributor

wking commented Jun 27, 2016

On Mon, Jun 27, 2016 at 06:50:19AM -0700, hmeng-19 wrote:

98    clone(child_stack=0x7ffc83481340, flags=CLONE_PARENT|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWPID|SIGCHLD) = -1 EPERM (Operation not permitted)


Considering the error of clone is EPERM, I tried sudo make test, however, the same error still happened.

clone(2) lists a number of other situations in which the kernel
returns EPERM besides insufficient caller privilege. None of them jump out at
me as “likely the case in a RHEL 7 VM” though…

@rhatdan
Contributor

rhatdan commented Jun 28, 2016

This could be caused by SELinux or a lack of capabilities, although running with sudo might have fixed that. seccomp could also block this, although I don't think that is the case here.

Try the test with setenforce 0.

@haiyanmeng
Contributor Author

@rhatdan , I ran the test as root with SELinux in permissive mode, and the same error still occurred:

[root@rhel7 runc]# setenforce 0

[root@rhel7 runc]# sestatus 
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

[root@rhel7 runc]# make 

[root@rhel7 runc]# make test
=== RUN   TestInitJoinNetworkAndUser
time="2016-06-28T13:35:22Z" level=warning msg="os: process already finished"
--- FAIL: TestInitJoinNetworkAndUser (0.35s)
    utils_test.go:51: exec_test.go:1597: unexpected error: process_linux.go:245: running exec setns process for init caused "exit status 1"

@mrunalp
Contributor

mrunalp commented Jun 28, 2016

Have you enabled user namespaces in the kernel?


@haiyanmeng
Contributor Author

@mrunalp , I ran the grubby command you shared with me, and the grub.cfg file shows that user_namespace is enabled on my machine:

menuentry 'Red Hat Enterprise Linux Server (3.10.0-327.18.2.el7.x86_64) 7.2 (Maipo)' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-327.el7.x86_64-advanced-cadd8c32-75d6-4ff1-8af2-26ad68351b5d' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod xfs
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  07838131-21d0-4856-b470-fff5e2481eab
        else
          search --no-floppy --fs-uuid --set=root 07838131-21d0-4856-b470-fff5e2481eab
        fi
        linux16 /vmlinuz-3.10.0-327.18.2.el7.x86_64 root=/dev/mapper/rhel-root ro rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8 user_namespace.enable=1
        initrd16 /initramfs-3.10.0-327.18.2.el7.x86_64.img
}
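
The grub.cfg entry shows what is configured for the next boot. Whether the running kernel was actually booted with the parameter can be checked against /proc/cmdline. The following is a minimal sketch of such a check, not something posted in the thread; the helper names cmdline_has_param and running_kernel_has_param are hypothetical, and the sketch only looks for the literal token user_namespace.enable=1 quoted above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if c ends or separates tokens on a kernel command line. */
static bool is_token_boundary(char c)
{
    return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: return true if `param` appears in `cmdline`
 * as a whole whitespace-delimited token (so "x=1" does not match "x=10").
 */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return false;

    while ((p = strstr(p, param)) != NULL) {
        bool starts_token = (p == cmdline) || is_token_boundary(p[-1]);
        bool ends_token = is_token_boundary(p[plen]);

        if (starts_token && ends_token)
            return true;
        p++; /* partial match only; keep scanning */
    }
    return false;
}

/*
 * Hypothetical helper: check the command line of the running kernel.
 * Returns 1 if present, 0 if absent, -1 if /proc/cmdline is unreadable.
 */
int running_kernel_has_param(const char *param)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return cmdline_has_param(buf, param) ? 1 : 0;
}
```

Calling running_kernel_has_param("user_namespace.enable=1") after a reboot would confirm that the setting took effect for the current boot. It says nothing about which namespace combinations the kernel then permits.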

@haiyanmeng haiyanmeng mentioned this issue Aug 1, 2016
2 tasks
@haiyanmeng
Contributor Author

I recently did more exploration on this problem.

It seems that, on RHEL 7, once a process joins another process's user namespace, it cannot
call clone with the CLONE_NEWNS flag.

To illustrate this, I wrote two C test programs: demo_userns.c and join.c.

The two programs above use clone. Versions that use unshare instead are also available: demo_userns.c and join.c.

demo_userns.c clones a process into six new namespaces: PID, mount, user, IPC, UTS, and network.
You can compile demo_userns.c with the following command:

gcc -o demo_userns demo_userns.c -lcap

join.c joins the user namespace of the process cloned by demo_userns, and then clones a child
process with additional namespaces.
You can compile join.c with the following command:

gcc -o join join.c -lcap
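
The sequence that join.c is described as performing can be approximated with setns followed by unshare. The sketch below is not the join.c linked above (that file is not reproduced in this thread). It follows the unshare variant rather than clone, and the function name join_userns_then_unshare is hypothetical. It returns the errno of the first failing step, so an EPERM from the unshare call can be told apart from a failure to join the user namespace.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: join the user namespace of `pid`, then try to
 * unshare the namespaces in `flags` (for example CLONE_NEWNS).
 *
 * Returns 0 on success, or the errno value of the first step that failed.
 * On success the calling process itself has changed namespaces, so this
 * is best called from a forked, single-threaded child.
 */
int join_userns_then_unshare(pid_t pid, int flags)
{
    char path[64];
    int fd, saved;

    snprintf(path, sizeof(path), "/proc/%ld/ns/user", (long)pid);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return errno;

    /* Step 1: enter the target process's user namespace. */
    if (setns(fd, CLONE_NEWUSER) < 0) {
        saved = errno;
        close(fd);
        return saved;
    }
    close(fd);

    /* Step 2: create the additional namespaces from inside it. */
    if (unshare(flags) < 0)
        return errno;

    return 0;
}
```

If the observation above holds, calling this with CLONE_NEWNS and the pid printed by demo_userns would be expected to return EPERM on the affected RHEL 7 kernel. That outcome is inferred from the report and has not been verified here.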

To test the two programs, run this in one terminal:

# this will print the pid of the child process, which can be used by the `join` binary. 
./demo_userns

Then open another terminal:

./join [options] <the child pid returned from ./demo_userns>

**** the usage of join ****

Usage: ./join [options] <pid>

Create a child process that joins the user namespace of another process <pid>, and executes `sh` in a new mount|uts|pid|ipc|network namespace,
or possibly a combination of these five namespaces.

Options can be:

    -i          New IPC namespace
    -m          New mount namespace
    -n          New network namespace
    -p          New PID namespace
    -u          New UTS namespace
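
The option letters listed above map naturally onto the CLONE_NEW* flags passed to clone or unshare. The following is a sketch of that mapping under the assumption that each letter selects exactly one flag. The real option handling in join.c is not shown in this thread, and the function names here are hypothetical. Note that -u selects the UTS namespace, not the user namespace.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: map one option letter from the usage text
 * to its CLONE_NEW* flag. Returns 0 for an unknown letter.
 */
int ns_flag_for_option(char opt)
{
    switch (opt) {
    case 'i': return CLONE_NEWIPC;  /* -i  new IPC namespace */
    case 'm': return CLONE_NEWNS;   /* -m  new mount namespace */
    case 'n': return CLONE_NEWNET;  /* -n  new network namespace */
    case 'p': return CLONE_NEWPID;  /* -p  new PID namespace */
    case 'u': return CLONE_NEWUTS;  /* -u  new UTS namespace */
    default:  return 0;
    }
}

/*
 * Hypothetical helper: OR together the flags for every letter in `opts`
 * (for example "mp"). Returns -1 if any letter is not recognised.
 */
int ns_flags_for_options(const char *opts)
{
    int flags = 0;

    for (; *opts != '\0'; opts++) {
        int f = ns_flag_for_option(*opts);

        if (f == 0)
            return -1;
        flags |= f;
    }
    return flags;
}
```

For instance, ns_flags_for_options("m") yields CLONE_NEWNS, the flag singled out earlier in this comment as the one that fails after joining a user namespace.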

@cyphar
Member

cyphar commented Sep 11, 2016

@mrunalp Will this be fixed by #975?

@mrunalp
Contributor

mrunalp commented Sep 11, 2016

@cyphar I will test it and let you know.

stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
config: Add a trailing period to the "cannot be mapped" rlimits line
@vikaschoudhary16
Contributor

@mrunalp @cyphar @hmeng-19
Any updates on this? Today I hit this same issue on RHEL 7 while running make test.
