Creating a sysbox container with systemd >= 247.2 may fail #273

Closed
ctalledo opened this issue Apr 23, 2021 · 9 comments

@ctalledo

This problem was reported by @AlexTalker in a comment in issue #269. I am creating this dedicated issue to track it as it's a different problem than #269.

Basically, creating a sysbox container with the latest manjarolinux/base image fails:

$ docker run --runtime=sysbox-runc -it --rm --entrypoint="/sbin/init" manjarolinux/base
systemd 247.3-1-manjaro running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.
Detected first boot.

Welcome to Manjaro Linux!

Set hostname to <459c3293a364>.
Initializing machine ID from random generator.
Failed to create /init.scope control group: Operation not permitted
Failed to allocate manager object: Operation not permitted
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

I dug a bit, and I can see the failure is due to the systemd instance inside the container getting an EPERM when executing a syscall:

syscall_0x1b7(0xffffff9c, 0x7fff83594860, 0, 0x100, 0x1, 0x7ff622148a60) = -1 EPERM (Operation not permitted)

This syscall is in fact the recently added faccessat2() system call in Linux.

This syscall, like its older sibling faccessat(), checks a user's permissions on a given file but supports an extra flags parameter. Apparently recent versions of glibc use this syscall to implement faccessat(), so binaries linked against them end up issuing it.

It's unclear however why the syscall returns EPERM and whether this has anything to do with the fact that Sysbox uses rootless containers via the Linux user-namespace. Unfortunately the strace version I have did not decode the syscall parameters to give us a better clue.
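Since strace printed the call raw, the arguments can be decoded by hand. The following is a minimal sketch, assuming the x86-64 syscall number for faccessat2 (439, i.e. 0x1b7) and the AT_* / access-mode constants from the Linux UAPI headers. The helper names are made up for illustration, and only the first four arguments are meaningful (the last two in the trace are leftover register contents).

```python
import ctypes

# Constants taken from the Linux UAPI headers (assumed, not from this thread).
SYS_FACCESSAT2 = 439  # 0x1b7 on x86-64
AT_FDCWD = -100
AT_FLAGS = {0x100: "AT_SYMLINK_NOFOLLOW", 0x200: "AT_EACCESS", 0x1000: "AT_EMPTY_PATH"}
MODE_BITS = {4: "R_OK", 2: "W_OK", 1: "X_OK"}


def decode_bits(value, names, zero_name):
    """Render a bitmask as NAME|NAME, or zero_name when no bit is set."""
    if value == 0:
        return zero_name
    parts = [name for bit, name in names.items() if value & bit]
    leftover = value & ~sum(names)  # bits we have no name for
    if leftover:
        parts.append(hex(leftover))
    return "|".join(parts)


def decode_faccessat2(nr, args):
    """Decode a raw faccessat2(dirfd, pathname, mode, flags) trace line."""
    if nr != SYS_FACCESSAT2:
        raise ValueError(f"syscall {nr:#x} is not faccessat2")
    dirfd = ctypes.c_int32(args[0]).value  # reinterpret as a signed 32-bit int
    dirfd_text = "AT_FDCWD" if dirfd == AT_FDCWD else str(dirfd)
    mode = decode_bits(args[2], MODE_BITS, "F_OK")
    flags = decode_bits(args[3], AT_FLAGS, "0")
    return f"faccessat2({dirfd_text}, {hex(args[1])}, {mode}, {flags})"


decoded = decode_faccessat2(
    0x1B7, [0xFFFFFF9C, 0x7FFF83594860, 0, 0x100, 0x1, 0x7FF622148A60]
)
print(decoded)  # faccessat2(AT_FDCWD, 0x7fff83594860, F_OK, AT_SYMLINK_NOFOLLOW)
```

Read this way, the call is a plain existence check relative to the current directory that does not follow symlinks, which by itself should not need any special privilege.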

The problem is not specific to Sysbox. I see the following recent mentions of it too:

https://serverfault.com/questions/1052963/pacman-doesnt-work-in-docker-image
https://bugzilla.redhat.com/show_bug.cgi?id=1869030

Per Alex's report in #269, this issue impacts containers with systemd versions > 247.2:

"So I pulled it, ran and it works too. I scratched my head a bit and found out that difference is down to minors - this image too has 247 Systemd but it's 247.2 and Manjaro is 247.6. So I ran pacman -Syu and it gave me 248, after which container does no longer starts"

We need to investigate further to see why the faccessat2() syscall fails and how we can overcome this.

@ctalledo ctalledo changed the title Creating a sysbox container with manjarolinux/base fails Creating a sysbox container with systemd >= 247.2 may fail Apr 23, 2021
@AlexTalker

AlexTalker commented Apr 23, 2021

@ctalledo Well, in any case, I had updated everything before I got to the glibc part, so that may well be the cause: the systemd (init) binary is linked against the same libc, and the common problem across commands after the update is that they can't see files that are there.

What is really confusing is why this isn't an issue with plain runc. Is there some architectural flaw in your solution (a.k.a. product) that I am missing?

@AlexTalker

Regarding faccessat2, I can find a man page for it on my beloved makier service, dated March 2021, but I cannot find it on my Focal. Is that the problem here? Might the kernel or glibc being unaware of it be the main issue?
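One way to tell "the kernel does not know the syscall" apart from "something is blocking it" is to issue the raw syscall and look at errno. This is a rough, Linux-only sketch: it assumes syscall number 439 for faccessat2 (true on x86-64 and arm64), and the classification labels are made up for illustration. ENOSYS would point at a kernel without faccessat2, while EPERM on a harmless path such as / suggests a filter (seccomp, for example) is rejecting the call.

```python
import ctypes
import errno
import os
import sys

SYS_FACCESSAT2 = 439  # assumed: x86-64 / arm64 syscall number
AT_FDCWD = -100


def classify(ret, err):
    """Map a raw syscall result and errno to an illustrative label."""
    if ret == 0:
        return "allowed"
    if err == errno.ENOSYS:
        return "kernel-too-old"
    if err == errno.EPERM:
        return "blocked"
    return "other:" + errno.errorcode.get(err, str(err))


def probe(path=b"/"):
    """Call faccessat2(AT_FDCWD, path, F_OK, 0) directly, bypassing libc wrappers."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(
        ctypes.c_long(SYS_FACCESSAT2),
        ctypes.c_int(AT_FDCWD),
        ctypes.c_char_p(path),
        ctypes.c_int(os.F_OK),
        ctypes.c_int(0),
    )
    return classify(ret, ctypes.get_errno())


if __name__ == "__main__" and sys.platform.startswith("linux"):
    print(probe())
```

Running it once on the host and once inside the container (with each runtime) should show whether the answer differs between environments.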

@AlexTalker

@ctalledo I should have mentioned earlier that I use userns to resolve the issue with CIFS, so my docker.conf is the following:

{
    "bip": "172.17.0.1/16",
    "default-address-pools": [
        {
            "base": "172.31.0.0/16",
            "size": 24
        }
    ],
    "storage-driver": "overlay2",
    "runtimes": {
        "sysbox-runc": {
            "path": "/usr/local/sbin/sysbox-runc"
        }
    },
    "userns-remap": "sysbox"
}

But since I think you ran your tests without it, it probably does not matter much.
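For comparing setups, a small sketch that pulls the relevant keys out of a daemon config like the one above. The function name and the shape of the summary are made up for illustration; the sample text is the config from this comment.

```python
import json


def summarize_daemon_config(text):
    """Extract the registered runtimes, userns-remap and storage driver."""
    cfg = json.loads(text)
    runtimes = cfg.get("runtimes", {})
    return {
        "runtimes": {name: spec.get("path") for name, spec in runtimes.items()},
        "userns_remap": cfg.get("userns-remap"),
        "storage_driver": cfg.get("storage-driver"),
    }


SAMPLE = """
{
    "bip": "172.17.0.1/16",
    "default-address-pools": [
        {"base": "172.31.0.0/16", "size": 24}
    ],
    "storage-driver": "overlay2",
    "runtimes": {
        "sysbox-runc": {"path": "/usr/local/sbin/sysbox-runc"}
    },
    "userns-remap": "sysbox"
}
"""

summary = summarize_daemon_config(SAMPLE)
print(summary)
```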

@AlexTalker

AlexTalker commented Apr 26, 2021

I did a little bit of research. The syscall that triggers the issue was added in kernel 5.8, to which I updated (via the Ubuntu HWE kernel). Okay.
But that did not solve the problem, so I googled around a bit.

Turns out Docker already has a workaround for it, and I likely have a version installed that includes these fixes from half a year ago: moby/moby#41353

But then I looked at your documentation, and it states that you do not allow custom seccomp profiles to be passed in, so I cannot apply the same workaround myself.

I am really eager for this problem to get fixed, though.
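For reference, with plain runc the equivalent workaround would be a seccomp profile that allows the syscall. This is a minimal sketch, assuming the Docker seccomp profile JSON layout (a defaultAction plus a list of rules with names and action); the function name and the tiny sample profile are made up for illustration and are not Docker's real default profile.

```python
import json


def allow_syscalls(profile, names):
    """Return a copy of a Docker-style seccomp profile that also allows `names`."""
    updated = json.loads(json.dumps(profile))  # cheap deep copy of plain JSON data
    already = {
        name
        for rule in updated.get("syscalls", [])
        if rule.get("action") == "SCMP_ACT_ALLOW"
        for name in rule.get("names", [])
    }
    missing = [name for name in names if name not in already]
    if missing:
        updated.setdefault("syscalls", []).append(
            {"names": missing, "action": "SCMP_ACT_ALLOW"}
        )
    return updated


# Hypothetical, heavily trimmed profile used only to exercise the helper.
base_profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["access", "faccessat"], "action": "SCMP_ACT_ALLOW"}],
}

patched = allow_syscalls(base_profile, ["faccessat", "faccessat2"])
print(json.dumps(patched, indent=2))
```

The resulting file could then be passed with `docker run --security-opt seccomp=<file>` on a runtime that accepts custom profiles, which per the documentation mentioned above does not include Sysbox.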

@AlexTalker

@ctalledo I made a simple PR, would you kindly test if it works?

I think compatibility is a major concern unless you somehow bundle libseccomp yourselves (version 2.5.0 or so is sufficient, and that one is available on Focal).
Still, I think this issue is good motivation for you to support passing seccomp options.
You could implement some kind of validation that the required syscalls are included, or force them in,
but it is definitely better to be able to pass the bloody file than to go round and round.

@ctalledo

@ctalledo I made a simple PR, would you kindly test if it works?

Thanks; yes will give this a shot later today.

@ctalledo

ctalledo commented Apr 29, 2021

Hi @AlexTalker, I got a chance to look into this issue.

I can confirm that adding the faccessat2() syscall to the Sysbox seccomp lib and sysbox-runc syscall list will fix the issue. I'll be creating a PR tomorrow, hopefully it will be committed by the weekend (after code-review).
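A toy model of why allowing the syscall helps, written under two assumptions that are not stated in this thread: that the seccomp filter fails unlisted syscalls with EPERM, and that libc only falls back from faccessat2 to faccessat when it gets ENOSYS. All names here are hypothetical; nothing in this sketch touches the real kernel or Sysbox.

```python
import errno


def allowlist_filter(allowed, default_errno=errno.EPERM):
    """Model a seccomp allowlist: listed syscalls succeed, others fail."""
    def run(name):
        return 0 if name in allowed else -default_errno
    return run


def kernel_without(missing):
    """Model a kernel that simply does not implement some syscalls."""
    def run(name):
        return -errno.ENOSYS if name in missing else 0
    return run


def libc_faccessat(do_syscall):
    """Try faccessat2 first; fall back to faccessat only on ENOSYS."""
    ret = do_syscall("faccessat2")
    if ret != -errno.ENOSYS:
        return ret  # success, or an error such as EPERM that is passed through
    return do_syscall("faccessat")


old_kernel = libc_faccessat(kernel_without({"faccessat2"}))
filtered = libc_faccessat(allowlist_filter({"faccessat"}))
fixed = libc_faccessat(allowlist_filter({"faccessat", "faccessat2"}))
print(old_kernel, filtered, fixed)
```

In this model an old kernel is harmless because the fallback kicks in, while a filter that answers EPERM defeats the fallback, and adding faccessat2 to the list makes the first attempt succeed.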

I also noticed that even with the fix, deploying a sysbox container using the manjarolinux/base image caused systemd to generate a bunch of warnings. The reason is that several services it starts do not really apply to container environments. I had previously seen the same warnings when running a sysbox container using the archlinux base image a couple of months ago.

To overcome this, I will create a Dockerfile for a nestybox/manjarolinux-systemd image that is essentially the same as the nestybox/archlinux-systemd Dockerfile, except it inherits FROM manjarolinux/base. This Dockerfile basically removes some unneeded or unsupported systemd services inside the container.

Using a local image based on this nestybox/manjarolinux-systemd dockerfile, and with the fix, I can see things look good:

root@sysbox-test:~/nestybox/sysbox# docker run --runtime=sysbox-runc -it --rm nestybox/manjarolinux-systemd

Welcome to Manjaro Linux!

[  OK  ] Created slice system-getty.slice.
[  OK  ] Created slice system-modprobe.slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
...
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Multi-User System.

Manjaro Linux 5.4.0-71-generic  (b9fb54c16225) (console)

b9fb54c16225 login: admin
Password:

[admin@b9fb54c16225 ~]$ cat /etc/os-release
NAME="Manjaro Linux"
ID=manjaro
ID_LIKE=arch
BUILD_ID=rolling
PRETTY_NAME="Manjaro Linux"
ANSI_COLOR="32;1;24;144;200"
HOME_URL="https://manjaro.org/"
DOCUMENTATION_URL="https://wiki.manjaro.org/"
SUPPORT_URL="https://manjaro.org/"
BUG_REPORT_URL="https://bugs.manjaro.org/"
LOGO=manjarolinux

@ctalledo

The following PRs have the required fix:

nestybox/sysbox-runc#36
nestybox/libseccomp#5

And this additional PR has a couple of reference Dockerfiles to run manjaro linux inside sysbox containers:

nestybox/dockerfiles#12

@ctalledo

PRs have been merged, closing this issue. Please re-open if you see any problems.

@ctalledo ctalledo self-assigned this Apr 29, 2021
@ctalledo ctalledo added the bug Something isn't working label Apr 29, 2021