Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

state's status detection (based on PID lookups) is imprecise #871

Closed
wking opened this issue Jun 3, 2016 · 4 comments
Closed

state's status detection (based on PID lookups) is imprecise #871

wking opened this issue Jun 3, 2016 · 4 comments

Comments

@wking
Copy link
Contributor

wking commented Jun 3, 2016

Since a few days ago, the runtime spec has a status property in the state (opencontainers/runtime-spec#462). I was curious about how that worked with detached containers, since there's no host-side monitor around to notice the container die. Testing with 1d61abe (the current tip of #827, although I expect this applies to all recent runC code), it looks like the state call works around the lack of a monitor (who could waitid(2) or some such on the container process) by looking up the container PID in /proc dynamically for each state call. That's racy though, since “has the same PID in /proc” doesn't mean “is the same process”. For example:

$ ./runc spec
$ JSON=$(jq 'del(.process.terminal)' config.json)
$ echo "${JSON}" >config.json
$ mkdir -p rootfs/bin
$ cp $(command -v busybox) rootfs/bin/sh
$ STATE=$(sudo ./runc state abc)
$ echo "${STATE}"
{
  "ociVersion": "0.6.0-dev",
  "id": "abc",
  "pid": 26373,
  "bundlePath": "/…/runc",
  "rootfsPath": "/…/runc/rootfs",
  "status": "created",
  "created": "2016-06-03T03:25:28.641226726Z"
}
$ PID=$(echo "${STATE}" | jq -r .pid)
$ sudo kill -9 "${PID}"
$ sudo ./runc state abc
{
  "ociVersion": "0.6.0-dev",
  "id": "abc",
  "pid": 26373,
  "bundlePath": "/…/runc",
  "rootfsPath": "/…/runc/rootfs",
  "status": "stopped",
  "created": "2016-06-03T03:25:28.641226726Z"
}

In a separate terminal, spawn processes until we match the old container PID:

$ while true; do sh -c 'if test $$ -eq 26373; then echo muahaha; sleep 60; fi'; done
muahaha
^C

And then runC thinks (wrongly) that the container is alive again:

$ sudo ./runc state abc
Password:
{
  "ociVersion": "0.6.0-dev",
  "id": "abc",
  "pid": 26373,
  "bundlePath": "/…/runc",
  "rootfsPath": "/…/runc/rootfs",
  "status": "running",
  "created": "2016-06-03T03:25:28.641226726Z"
}

One solution is to have the parent process (or an ancestor with PR_SET_CHILD_SUBREAPER after the parent dies) waitid(2) on the container process.

Another solution is to store more information about the container process (we probably already do) and compare that additional data when deciding whether the process running with the container-process PID was still the container process. For example, the PID and start-time (field 22 in /proc/[pid]/stat) form a tuple that is almost certainly unique.

@cyphar
Copy link
Member

cyphar commented Jun 3, 2016

One clever way of doing it is to stat the /proc/<pid> directory -- which will give you a (as far as I'm aware) unique identifier for the process. But I'll have to read the relevant kernel code to make sure that would actually work. :P

@wking
Copy link
Contributor Author

wking commented Jun 13, 2016

On Mon, Jun 13, 2016 at 12:36:40PM -0700, Alexander Morozov wrote:

Closed #871 via #886.

#886 means that the imprecise check isn't a concern for ‘start’
invocations. But it didn't change the runType code at all. Since the
result of runType is still being exposed via the state JSON, I think
we want to reopen this issue until a more reliable implementation
lands (e.g. along the lines of 1).

@hqhq
Copy link
Contributor

hqhq commented Jun 29, 2016

@wking Yeah I think it's still a valid issue, I'll reopen it.

@hqhq hqhq reopened this Jun 29, 2016
hqhq added a commit to hqhq/runc that referenced this issue Jun 30, 2016
Fixes: opencontainers#871

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
@hqhq
Copy link
Contributor

hqhq commented Jun 30, 2016

@wking I opened a PR #930 to fix this.

hqhq added a commit to hqhq/runc that referenced this issue Jul 4, 2016
Fixes: opencontainers#871

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
hqhq added a commit to hqhq/runc that referenced this issue Jul 5, 2016
Fixes: opencontainers#871

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
config-linux: "may" -> "MAY" for supplying linux.devices
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants