Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCI runtime exec failed: exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown #3551

Closed
amigthea opened this issue Aug 5, 2022 · 3 comments · Fixed by #3559 or kubernetes-sigs/kubespray#9230
Milestone

Comments

@amigthea
Copy link

amigthea commented Aug 5, 2022

As title, this error is generated by a breaking bug found on runc version 1.1.3

@kolyshkin
Copy link
Contributor

I wonder if this is related to #3504?

@kolyshkin
Copy link
Contributor

I wonder if this is related to #3504?

OK, looks like it is.

I've added the following debug patch to the tip of release-1.1 branch:

diff --git a/libcontainer/cgroups/systemd/common.go b/libcontainer/cgroups/systemd/common.go
index 5a68a3cf..b153ec51 100644
--- a/libcontainer/cgroups/systemd/common.go
+++ b/libcontainer/cgroups/systemd/common.go
@@ -295,6 +295,8 @@ func generateDeviceProperties(r *configs.Resources) ([]systemdDbus.Property, err
                // have a corresponding path.
                if _, err := os.Stat(entry.Path); err == nil {
                        deviceAllowList = append(deviceAllowList, entry)
+               } else {
+                       logrus.Warnf("Skipping device %+v: %w", entry, err)
                }
        }
 

and got the following output (from one of the tests -- doesn't matter):

   runc run -d --console-socket /tmp/bats-run-JcLxzP/runc.7sdPPv/tty/sock test_busybox (status=0):
   time="2022-08-10T16:47:33-07:00" level=warning msg="Skipping device {Path:block-* Perms:m}: %!w(*fs.PathError=&{stat block-* 2})"
   time="2022-08-10T16:47:33-07:00" level=warning msg="Skipping device {Path:char-* Perms:m}: %!w(*fs.PathError=&{stat char-* 2})"
   time="2022-08-10T16:47:33-07:00" level=warning msg="Skipping device {Path:char-pts Perms:rwm}: %!w(*fs.PathError=&{stat char-pts 2})"

So, what happens, I guess, is:

  1. With [1.1] cgroups: systemd: skip adding device paths that don't exist #3504 in place, we are not longer adding char-pts rwm rule to the container's DeviceAllow systemd unit property.
  2. The fact that the rule is excluded doesn't matter, since we also apply the rules directly (for cgroup v1, by writing to devices.allow)
  3. systemctl daemon-reload reapplies the DeviceAllow rules, thus removing the char-pts rule.

Fix is coming.

kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 11, 2022
A regression reported for runc v1.1.3 says that after systemctl
daemon-reload runc exec fails:

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 11, 2022
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 11, 2022
A regression reported for runc v1.1.3 says that after systemctl
daemon-reload runc exec fails:

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 11, 2022
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor

Trying to make a test for this in #3555, alas it's not failing.

kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 16, 2022
v2: use daemon-reexec

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 16, 2022
v2: use daemon-reexec
v3: require root to reload systemd

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 16, 2022
A regression reported for runc v1.1.3 says that after systemctl
daemon-reload runc exec fails:

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 16, 2022
A regression reported for runc v1.1.3 says that after systemctl
daemon-reload runc exec fails:

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 17, 2022
v2: use daemon-reexec
v3: require root to reload systemd
v4: use exec -t

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 17, 2022
v2: use daemon-reexec
v3: require root to reload systemd
v4: use exec -t

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 17, 2022
v2: use daemon-reexec
v3: require root to reload systemd
v4: use exec -t

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 17, 2022
A regression reported for runc v1.1.3 says that "runc exec -t" fails
after doing "systemctl daemon-reload":

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 17, 2022
A regression reported for runc v1.1.3 says that "runc exec -t" fails
after doing "systemctl daemon-reload":

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 18, 2022
A regression reported for runc v1.1.3 says that "runc exec -t" fails
after doing "systemctl daemon-reload":

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 18, 2022
A regression reported for runc v1.1.3 says that "runc exec -t" fails
after doing "systemctl daemon-reload":

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 18, 2022
A regression reported for runc v1.1.3 says that "runc exec -t" fails
after doing "systemctl daemon-reload":

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2.

This is a backport of commit 58b1374
to release-1.1 branch.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
stefanberger pushed a commit to stefanberger/runc that referenced this issue Feb 7, 2023
A regression reported for runc v1.1.3 says that "runc exec -t" fails
after doing "systemctl daemon-reload":

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2.

This is a backport of commit 58b1374
to release-1.1 branch.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants