Cannot run apt-get update in rootless container with sufficient capabilities #2517

Closed
haampie opened this issue Jul 11, 2020 · 21 comments

haampie commented Jul 11, 2020

Similar to #1860. I can't seem to get a rootless ubuntu:18.04 container to run apt-get update, even with CAP_SYS_ADMIN.

$ mkdir rootfs
$ docker export $(docker create ubuntu:18.04) | tar -C rootfs -xf -
$ runc --version
runc version 1.0.0-rc91
spec: 1.0.2-dev
$ cat config.json
{
	"ociVersion": "1.0.2-dev",
	"process": {
		"terminal": false,
		"user": {
			"uid": 0,
			"gid": 0
		},
		"args": [
			"apt-get",
			"update"
		],
		"env": [
			"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
		],
		"cwd": "/",
		"capabilities": {
			"bounding": [
				"CAP_SYS_ADMIN"
			],
			"effective": [
				"CAP_SYS_ADMIN"
			],
			"inheritable": [
				"CAP_SYS_ADMIN"
			],
			"permitted": [
				"CAP_SYS_ADMIN"
			],
			"ambient": [
				"CAP_SYS_ADMIN"
			]
		},
		"rlimits": [
			{
				"type": "RLIMIT_NOFILE",
				"hard": 1024,
				"soft": 1024
			}
		],
		"noNewPrivileges": true
	},
	"root": {
		"path": "rootfs",
		"readonly": false
	},
	"hostname": "runc",
	"mounts": [
		{
			"destination": "/proc",
			"type": "proc",
			"source": "proc"
		},
		{
			"destination": "/dev",
			"type": "tmpfs",
			"source": "tmpfs",
			"options": [
				"nosuid",
				"strictatime",
				"mode=755",
				"size=65536k"
			]
		},
		{
			"destination": "/dev/pts",
			"type": "devpts",
			"source": "devpts",
			"options": [
				"nosuid",
				"noexec",
				"newinstance",
				"ptmxmode=0666",
				"mode=0620"
			]
		},
		{
			"destination": "/dev/shm",
			"type": "tmpfs",
			"source": "shm",
			"options": [
				"nosuid",
				"noexec",
				"nodev",
				"mode=1777",
				"size=65536k"
			]
		},
		{
			"destination": "/dev/mqueue",
			"type": "mqueue",
			"source": "mqueue",
			"options": [
				"nosuid",
				"noexec",
				"nodev"
			]
		},
		{
			"destination": "/sys",
			"type": "none",
			"source": "/sys",
			"options": [
				"rbind",
				"nosuid",
				"noexec",
				"nodev",
				"ro"
			]
		}
	],
	"linux": {
		"uidMappings": [
			{
				"containerID": 0,
				"hostID": 1000,
				"size": 1
			}
		],
		"gidMappings": [
			{
				"containerID": 0,
				"hostID": 1000,
				"size": 1
			}
		],
		"namespaces": [
			{
				"type": "pid"
			},
			{
				"type": "ipc"
			},
			{
				"type": "uts"
			},
			{
				"type": "mount"
			},
			{
				"type": "user"
			}
		],
		"maskedPaths": [
			"/proc/acpi",
			"/proc/asound",
			"/proc/kcore",
			"/proc/keys",
			"/proc/latency_stats",
			"/proc/timer_list",
			"/proc/timer_stats",
			"/proc/sched_debug",
			"/sys/firmware",
			"/proc/scsi"
		],
		"readonlyPaths": [
			"/proc/bus",
			"/proc/fs",
			"/proc/irq",
			"/proc/sys",
			"/proc/sysrq-trigger"
		]
	}
}

The error I'm getting is:

$ runc run example
E: setgroups 65534 failed - setgroups (1: Operation not permitted)
E: setegid 65534 failed - setegid (22: Invalid argument)
E: seteuid 100 failed - seteuid (22: Invalid argument)
E: setgroups 0 failed - setgroups (1: Operation not permitted)
Reading package lists...
W: chown to _apt:root of directory /var/lib/apt/lists/partial failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chown to _apt:root of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (22: Invalid argument)
E: setgroups 65534 failed - setgroups (1: Operation not permitted)
E: setegid 65534 failed - setegid (22: Invalid argument)
E: seteuid 100 failed - seteuid (22: Invalid argument)
E: setgroups 0 failed - setgroups (1: Operation not permitted)
E: Method gave invalid 400 URI Failure message: Failed to setgroups - setgroups (1: Operation not permitted)
E: Method http has died unexpectedly!
E: Sub-process http returned an error code (112)

With rootless docker I am able to actually run apt-get update. Any clue what's going wrong here?
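
One thing worth checking for the errors above: `seteuid 100` and `setegid 65534` fail with `22: Invalid argument`, which is what the kernel returns when the target ID has no mapping in the user namespace, and the posted config maps only container ID 0. A minimal sketch for checking this from inside the container follows. `id_is_mapped` is a hypothetical helper name (not part of runc or apt), and the IDs 100 and 65534 are simply the ones from the error output.

```shell
# Hypothetical helper: succeed if ID ($1) lies inside one of the ranges in a
# uid_map/gid_map file ($2). Each line of such a file holds three numbers:
# first ID inside the namespace, first ID outside, and the range length.
id_is_mapped() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { ok = 1 }
        END { exit !ok }
    ' "$2"
}

# Usage inside the container (IDs taken from the apt errors above):
#   id_is_mapped 100   /proc/self/uid_map || echo "uid 100 is not mapped"
#   id_is_mapped 65534 /proc/self/gid_map || echo "gid 65534 is not mapped"
```

With a single `0 1000 1` line in the map, both checks would report the ID as unmapped.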

haampie commented Jul 11, 2020

Same happens with the whole list:

"CAP_AUDIT_WRITE",
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FOWNER",
"CAP_FSETID",
"CAP_KILL",
"CAP_MKNOD",
"CAP_NET_BIND_SERVICE",
"CAP_NET_RAW",
"CAP_SETFCAP",
"CAP_SETGID",
"CAP_SETPCAP",
"CAP_SETUID",
"CAP_SYS_ADMIN",
"CAP_SYS_CHROOT"

haampie commented Jul 11, 2020

Compare runc

$ runc run hi
root@runc:/# cat /proc/self/setgroups  
deny

with rootless docker:

$ docker run --rm --cap-drop all --cap-add CAP_SETGID ubuntu:18.04 cat /proc/self/setgroups
allow

@haampie haampie closed this as completed Jul 11, 2020
@haampie haampie reopened this Jul 11, 2020
haampie commented Jul 11, 2020

Ah, I've figured it out. You need at least two entries in both uid and gid mappings. For anyone struggling with this, just check out how rootless docker is configured. E.g.:

runc $ docker run --rm --cap-drop all --cap-add CAP_SETGID ubuntu:18.04 cat /proc/self/gid_map
         0       1000          1
         1     100000      65536
runc $ docker run --rm --cap-drop all --cap-add CAP_SETGID ubuntu:18.04 cat /proc/self/uid_map
         0       1000          1
         1     100000      65536

means adding

		"uidMappings": [
			{
				"containerID": 0,
				"hostID": 1000,
				"size": 1
			},
			{
				"containerID": 1,
				"hostID": 100000,
				"size": 65536
			}
		],
		"gidMappings": [
			{
				"containerID": 0,
				"hostID": 1000,
				"size": 1
			},
			{
				"containerID": 1,
				"hostID": 100000,
				"size": 65536
			}
		],

This solves the problem. Also, please don't copy and paste all the Linux capabilities I posted above; that list was just for testing purposes.

Those mappings seem to come from /etc/subuid and /etc/subgid.
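
As a rough sketch of how the two-entry mapping could be derived from those files: the helper below reads a `user:start:count` line and prints a mapping array in the same shape as the config.json fragment above. `oci_mappings` is a hypothetical name, and the sketch assumes a single subordinate range per user. As far as I know, a rootless runtime also needs the `newuidmap`/`newgidmap` helpers installed to apply more than one range.

```shell
# Hypothetical helper: print an OCI-style mapping array.
#   container ID 0      -> the caller's own host ID
#   container IDs 1..N  -> the first subordinate range listed for the user
# Arguments: key (uidMappings|gidMappings), host ID, user name, subid file.
oci_mappings() {
    key=$1 host_id=$2 user=$3 file=$4
    awk -F: -v key="$key" -v host="$host_id" -v user="$user" '
        $1 == user && !found {
            found = 1
            printf "\"%s\": [\n", key
            printf "  { \"containerID\": 0, \"hostID\": %d, \"size\": 1 },\n", host
            printf "  { \"containerID\": 1, \"hostID\": %d, \"size\": %d }\n", $2, $3
            printf "]\n"
        }
        END { if (!found) exit 1 }   # no entry for this user
    ' "$file"
}

# Usage:
#   oci_mappings uidMappings "$(id -u)" "$(id -un)" /etc/subuid
#   oci_mappings gidMappings "$(id -g)" "$(id -un)" /etc/subgid
```

The function returns non-zero when the user has no entry, which is a hint that /etc/subuid or /etc/subgid still needs a line for that user.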

@ashwani29

In my case it shows this:

docker run --rm --cap-drop all --cap-add CAP_SETGID ubuntu cat /proc/self/uid_map
         0          0 4294967295

Same with gid_map.
What does this mean, and what should I add now?
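
For reference, a single line `0 0 4294967295` is the identity mapping the kernel reports for a process in the initial user namespace. That output therefore suggests this Docker daemon is not running in rootless mode (and has no user-namespace remapping), so it has no mappings to copy. A small sketch for detecting that case; `is_identity_map` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given map file consists of the single
# identity line "0 0 4294967295", as shown in the initial user namespace.
is_identity_map() {
    awk '
        { lines++ }
        $1 == 0 && $2 == 0 && $3 == 4294967295 { identity = 1 }
        END { exit !(lines == 1 && identity) }
    ' "$1"
}

# Usage:
#   is_identity_map /proc/self/uid_map && echo "not inside a user namespace mapping"
```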

rijenkii commented Nov 3, 2020

@ashwani29 you need to add the following to your system's /etc/subuid and /etc/subgid:

username:100000:65536

Replace username with your system username.

Then add the "uidMappings" and "gidMappings" from haampie's comment above to your config.json.

To make apt work I also needed to add the capabilities listed earlier in this thread to "bounding", "effective", "inheritable", "permitted" and "ambient".
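
A note on which capabilities matter here: the failing calls in the apt output are setgroups/setegid (which need CAP_SETGID), seteuid (CAP_SETUID) and chown (CAP_CHOWN), none of which is covered by CAP_SYS_ADMIN alone. The sketch below checks individual bits of a capability mask as printed in /proc/self/status. `has_cap` is a hypothetical helper, and the bit numbers are the ones from `<linux/capability.h>`.

```shell
# Hypothetical helper: succeed if bit number $2 is set in the hex mask $1
# (the format used by the Cap* lines of /proc/<pid>/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Bit numbers from <linux/capability.h>.
CAP_CHOWN=0
CAP_SETGID=6
CAP_SETUID=7

# Usage inside the container:
#   mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
#   has_cap "$mask" "$CAP_SETGID" || echo "CAP_SETGID missing"
#   has_cap "$mask" "$CAP_SETUID" || echo "CAP_SETUID missing"
#   has_cap "$mask" "$CAP_CHOWN"  || echo "CAP_CHOWN missing"
```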

@ashwani29

Thanks @rijenkii, I will surely try this, as I haven't been able to get it working yet. Just one question: do you know about runc.conf? I need help with it.

rijenkii commented Nov 3, 2020

Sorry, can't help you with that

@ashwani29

OK, no worries, thanks for replying.

rofl0r commented Jan 21, 2021

it seems the developers of apt went to great lengths to make it hard to use their software outside the context they envisioned. we can find this source code in apt-pkg/contrib/fileutl.cc (which i found after receiving those error messages after creating a ptrace hook to return fake success from SYS_chown):

   // enabled by default as all fakeroot-lookalikes should fake that accordingly
   if (VerifySandboxing == true || _config->FindB("APT::Sandbox::Verify::IDs", true) == true)
   {
      // Verify that gid, egid, uid, and euid changed
      if (getgid() != pw->pw_gid)
         return _error->Error("Could not switch group");
      if (getegid() != pw->pw_gid)
         return _error->Error("Could not switch effective group");
      if (getuid() != pw->pw_uid)
         return _error->Error("Could not switch user");
      if (geteuid() != pw->pw_uid)
         return _error->Error("Could not switch effective user");

i.e. the software refuses to work if you don't use exactly the sandboxing mechanism they invented.

the good news is that we can get it to work by simply removing the line with the username _apt from /etc/passwd, and suddenly we can use apt update even with the most simple unshare sandbox as regular user. there seems to be also a config option "APT::Sandbox::Verify" but i don't know how/where one can set it.
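
On where such options can be set: apt configuration items can be passed per invocation with `-o Name=Value`, or placed in a file under /etc/apt/apt.conf.d/. The sketch below uses `APT::Sandbox::User` (the option that names the user apt drops to, and the one used later in this thread) rather than the `Verify` items. `apt_get_unsandboxed` and the `APT_GET` override variable are hypothetical names added only for illustration.

```shell
# Hypothetical wrapper: run apt-get with the sandbox user set to root for this
# one invocation, so apt does not try to switch to the _apt user.
# APT_GET is an assumed override hook (e.g. for dry runs); it defaults to apt-get.
apt_get_unsandboxed() {
    "${APT_GET:-apt-get}" -o APT::Sandbox::User=root "$@"
}

# Usage:
#   apt_get_unsandboxed update
```

The effective value can be inspected with `apt-config dump`, which prints the merged configuration.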

i'm commenting here in the hope other users getting those error messages will find my solution, rather than having to mess around with capabilities to get it to work and generally wasting a lot of time.

@ashwani29

@rofl0r hi, thanks for this but it still requires sudo to download any package in rootfs

# apt update
Reading package lists... Done
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

and using sudo:

# sudo apt install python3
sh: 7: sudo: not found
# 

haampie commented Jan 21, 2021

Do we know of distros that are not so incredibly limiting w.r.t. the package manager?

I've had more luck with Alpine's apk. It still issues many warnings about ownership, but at least they are not fatal errors. The only issue is that Alpine is musl-based, and I'd rather have a glibc-based distro.

rofl0r commented Jan 21, 2021

@ashwani29

E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

maybe there's a file in your rootfs you really can't access. i didn't get this. try to run strace apt update to find out where it's happening and fix the permission problem (look for EACCES, i.e. errno 13 "Permission denied", or EPERM in strace's output - and yes, you need to use a statically linked strace binary from elsewhere as you can't use apt yet to install it...)
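
A minimal sketch of that strace approach, assuming a static strace binary has been copied into the rootfs. `perm_failures` and the log path are hypothetical; `-f` (follow child processes) and `-o` (write to a file) are standard strace options.

```shell
# Hypothetical helper: print the syscalls in a strace log that failed with a
# permission error. EACCES is errno 13 ("Permission denied"); EPERM is errno 1.
perm_failures() {
    grep -E '= -1 (EACCES|EPERM) ' "$1"
}

# Usage inside the rootfs:
#   strace -f -o /tmp/apt.trace apt update
#   perm_failures /tmp/apt.trace
```

The path shown in the first matching line is usually the file or directory whose ownership or mode needs fixing.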

Do we know of distros that are not so incredibly limiting w.r.t. the package manager?

yes, for example the one of my own distro sabotage linux. i've been using this 20 line C program since forever to build the entire OS inside a chroot as a regular user. the right thing to do is to fail when a critical resource can't be accessed, not doing some bogus user id checks beforehand and refusing to operate if those fail.

edit: i've written a short guide how to chroot as regular user to a minimal ubuntu rootfs: https://github.com/sabotage-linux/sabotage/wiki/Running-a-minimal-ubuntu-rootfs-as-regular-user

cyphar commented Feb 1, 2021

Do we know of distros that are not so incredibly limiting w.r.t. the package manager?

openSUSE's zypper doesn't have this issue (so any RPM-based distro should probably work fine as well).

But yes, the apt issue mentioned by @rofl0r is pretty well-known -- in fact I mention it in my original talk about rootless containers in 2017. The current solution (aside from switching distributions) is to use something like proot which makes those syscalls no-ops and fakes the results from getuid -- in future we will switch to using SECCOMP_RET_USER_NOTIF syscall emulation to avoid breaking ptrace(2) but that's a separate topic. You also need to fake file ownership if you're using a single-mapping rootless container (and we have code for that too in proot).

Maybe it should be added to rootlesskit but I'm not sure if that's the right place to host the fix.

rofl0r commented Feb 1, 2021

The current solution (aside from switching distributions) is to use something like proot which makes those syscalls no-ops and fakes the results from getuid

how's that a better solution than fixing apt by removing one line in /etc/passwd (or even better contact the apt maintainers so they fix their stuff)? in case you didn't notice, proot (ptrace in general) has a quite hefty price. i recently wrote an app using my debuglib and ran my whole desktop under ptrace, and some apps became almost unusably slow. maybe the overhead was amplified due to 30+ processes being traced and serialized, but there will always be an overhead. additionally proot uses the pre-PTRACE_SEIZE ptrace api that can't properly deal with SIGSTOP/SIGCONT, and some programs using glib use those and misbehave when SIGSTOP doesn't really stop the process.

cyphar commented Feb 1, 2021

It's not a better solution, but it is a solution that solves the problem without needing to patch every program that acts suboptimally inside user namespaces (apt is not the only example, it's just one that most people run into first). There's no reason we can't have a solution which solves the problem for most programs and then go and patch the world separately (if you'd like to send patches to the Debian folks to remove their sandbox checks, be my guest).

As for overhead, SECCOMP_RET_USER_NOTIF (or even SECCOMP_RET_TRACE) reduces the overhead significantly. The reason why ptrace(2) has such massive overhead is that PTRACE_SYSCALL stops the process on every syscall even if you don't care about those syscalls -- but if you use seccomp filtering you can reduce it to only pinging the tracer for the specific syscalls that need to be emulated. And for the PTRACE_SEIZE thing (as well as many other issues with ptrace you didn't mention like running gdb or upstart inside the container, signals being broken in many other ways, wait acting weirdly under ptrace), SECCOMP_RET_USER_NOTIF solves basically all of the issues with ptrace(2) when it comes to syscall emulation (unlike ptrace(2), seccomp doesn't screw with the way signals are handled) so switching to that would solve those issues as well.

rofl0r commented Feb 1, 2021

apt is not the only example, it's just one that most people run into first

so far i've seen only two: apt and GNU tar which insists on chown()ing files when run as root (without --no-same-owner, which is impractical to add to every invocation), and returning a failure exit status when that fails. my ubuntu guide linked two comments earlier has a section how that can be easily binary-patched out, but i suppose GNU guys could be convinced that they turn the hard error into a warning and return success anyway.

i don't really see why there's such a reluctance in the container scene to cooperate with the few misbehaving programs' authors instead of trying to find ever new low-level solutions for the problem. (ftr, i did the first distro based on musl libc and had over 100 misbehaving programs patched).

if you'd like to send patches to the Debian folks to remove their sandbox checks, be my guest

i actually hoped i could convince you to do it, since the request coming from a developer of a well-known container solution would look a lot more legitimate than from some random distro dude.

cyphar commented Feb 1, 2021

I actually hoped i could convince you to do it, since the request coming from a developer of a well-known container solution would look a lot more legitimate than from some random distro dude.

I mean, I can do it -- though there are far less combative ways to ask me to write and send patches to another project.

I don't really see why there's such a reluctance in the container scene to cooperate with the few misbehaving program's authors instead of trying to find ever new low-level solutions for the problem.

It's not reluctance to co-operate with other projects (speaking for myself, I contribute to many other projects). There are a couple of reasons why having workaround solutions in place is a good thing:

  • Working with upstream for every issue we run into doesn't make it easier for users who are trying to just get something working in a container -- having a best-effort solution (such as through syscall emulation) solves the problem for most people without blocking everything on us getting code in many different upstreams.
  • It's possible any one upstream will disagree with our view that users should be able to run code inside a rootless container, at which point we would need a workaround solution anyway.
  • People with older images won't get the benefit of any patches we push upstream because some distributions update on timescales rivaling continental drift.
  • Users may need to run proprietary software which they cannot patch, and having workarounds for those projects is also useful.

None of this means "we shouldn't fix upstream projects" (far from it), it's just that having a workaround solution is clearly better than not having one and I don't see why you feel so strongly that having a workaround solution is somehow negative. The alternative is that users have no way of working around a project that doesn't work in rootless containers.

(Not to mention that for some more complicated cases -- such as faking mknod(2) -- tools like SECCOMP_RET_USER_NOTIF are the only way of getting programs to run in containers nicely. So these techniques are useful in general.)

mtmk commented Feb 6, 2022

This is how I got around it (thanks to @rofl0r commenting on apt internals above)

apt-config dump | grep Sandbox::User
cat <<EOF > /etc/apt/apt.conf.d/sandbox-disable
APT::Sandbox::User "root";
EOF

haampie commented Feb 7, 2022

FWIW I've also had success with unshare -r inside of the container in container runtimes that don't make you root user

rofl0r commented Apr 6, 2022

@mtmk good find! i used your approach and refined it a little bit more in my guide
it also deals with the above mentioned APT::Sandbox::Verify and makes dpkg ignore chown errors.

@commonism

Only happens when systemd-sysv is installed in the container.

More complete workaround

cat <<EOF > /etc/apt/apt.conf.d/99-disable-sandbox
APT::Sandbox::User "root";
APT::Sandbox::Verify "0";
APT::Sandbox::Verify::IDs "0";
APT::Sandbox::Verify::Groups "0";
APT::Sandbox::Verify::Regain "0";
EOF
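
Since the rootfs in the first post is just a directory on the host, the same kind of file can also be written there before `runc run`, so the very first `apt-get update` already picks it up. A minimal sketch; `disable_apt_sandbox` is a hypothetical helper, and it writes only the `APT::Sandbox::User` line (the `Verify` lines above could be appended in the same way).

```shell
# Hypothetical helper: drop an apt config file into an unpacked rootfs ($1)
# that tells apt to stay root instead of switching to the _apt user.
disable_apt_sandbox() {
    rootfs=$1
    conf_dir="$rootfs/etc/apt/apt.conf.d"
    if [ ! -d "$conf_dir" ]; then
        echo "no $conf_dir; is this an apt-based rootfs?" >&2
        return 1
    fi
    printf 'APT::Sandbox::User "root";\n' > "$conf_dir/99-disable-sandbox"
}

# Usage on the host, next to config.json:
#   disable_apt_sandbox rootfs && runc run example
```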
