Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failed exec after systemctl daemon-reload #3559

Merged
merged 1 commit into from
Aug 18, 2022

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Aug 16, 2022

This is a forward port of #3554 to the main branch. Original description follows.


A regression reported for runc v1.1.3 says that after systemctl
daemon-reload runc exec fails:

exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2 (for a test run before the fix, see #3555).

Fixes: #3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin kolyshkin@gmail.com

@AkihiroSuda
Copy link
Member

In the commit message could you explain how to test this?
Does this need cgroup v1 or SELinux for testing?

@AkihiroSuda
Copy link
Member

Is this still draft? I think we want to merge this main PR first and then cherry-pick

@thaJeztah
Copy link
Member

Looks like it was @kolyshkin's intention to merge the 1.1 fix first, then forward-port to main;

This is a forward port of #3554 to the main branch.

Have we done this before? I think generally we go from main -> release-branch to prevent the risk of regressions in main if we forget to forward-port, or have we done the other way round before?

@kolyshkin kolyshkin marked this pull request as ready for review August 17, 2022 21:50
@kolyshkin
Copy link
Contributor Author

Got some progress with the test case in #3555. Will add the test case here once it's working. Draft for now.

While originally this is a forward port (because the bug was reported against 1.1 and I initially fixed it in there), let's merge this one first, and then follow up with 1.1.

@kolyshkin kolyshkin marked this pull request as draft August 17, 2022 22:18
@kolyshkin kolyshkin added area/systemd area/ebpf backport/done/1.1 A PR in main branch which was backported to release-1.1 labels Aug 17, 2022
@kolyshkin
Copy link
Contributor Author

Draft pending test case addition.

@kolyshkin
Copy link
Contributor Author

Ready for review. The validate / codespell CI failure is unrelated and is being fixed by PR #3561

@kolyshkin kolyshkin marked this pull request as ready for review August 17, 2022 23:41
@kolyshkin
Copy link
Contributor Author

The test failure before the fix can be seen in PR #3555

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

do we want to ignore the code spell issue, or should we do a rebase? (probably that requires re-cherry-picking for the release branch though)

A regression reported for runc v1.1.3 says that "runc exec -t" fails
after doing "systemctl daemon-reload":

> exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown

Apparently, with commit 7219387 we are no longer adding
"DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns
ENOENT).

The bug can only be seen after "systemctl daemon-reload" because runc
also applies the same rules manually (by writing to devices.allow for
cgroup v1), and apparently reloading systemd leads to re-applying the
rules that systemd has (thus removing the char-pts access).

The fix is to do os.Stat only for "/dev" paths.

Also, emit a warning that the path was skipped. Since the original idea
was to emit less warnings, demote the level to debug.

Note this also fixes the issue of not adding "m" permission for block-*
and char-* devices.

A test case is added, which reliably fails before the fix
on both cgroup v1 and v2.

Fixes: opencontainers#3551
Fixes: 7219387
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ebpf area/systemd backport/done/1.1 A PR in main branch which was backported to release-1.1
Projects
None yet
4 participants