Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set default cgroup path to /runc #895

Closed
wants to merge 1 commit into from

Conversation

brauner
Copy link
Contributor

@brauner brauner commented Jun 8, 2016

So far when "cgroupsPath" was not specified in the config.json file runC
containers used the prefix associated with the "devices" subsystem and placed
the container in a subhierarchy starting with prefix for all subystems. For
example, if the user starting the container was located in
"/sys/fs/cgroup/devices/user.slice/random-cgroup" then runC would retrieve the
"/user.slice/random-cgroup" string and place the container in a subcgroup
"/sys/fs/cgroup/*/user.slice/random-cgroup/CONTAINERNAME" for all subsystems.
Let's rather use "/runc" as the default cgroup for all subsystems. This seems
cleaner and falls in line with Docker/LXC et al.

Signed-off-by: Christian Brauner cbrauner@suse.com

So far when "cgroupsPath" was not specified in the config.json file runC
containers used the prefix associated with the "devices" subsystem and placed
the container in a subhierarchy starting with prefix for all subystems. For
example, if the user starting the container was located in
"/sys/fs/cgroup/devices/user.slice/random-cgroup" then runC would retrieve the
"/user.slice/random-cgroup" string and place the container in a subcgroup
"/sys/fs/cgroup/*/user.slice/random-cgroup/CONTAINERNAME" for all subsystems.
Let's rather use "/runc" as the default cgroup for all subsystems. This seems
cleaner and falls in line with Docker/LXC et al.

Signed-off-by: Christian Brauner <cbrauner@suse.com>
@cyphar
Copy link
Member

cyphar commented Jun 8, 2016

This is part of #892.

@cyphar cyphar added this to the 1.0.0 milestone Jun 8, 2016
@cyphar
Copy link
Member

cyphar commented Jun 8, 2016

WDYT @mrunalp @crosbymichael @hqhq ?

@mrunalp
Copy link
Contributor

mrunalp commented Jun 8, 2016

👍

@cyphar
Copy link
Member

cyphar commented Jun 10, 2016

I'm currently trying to codify some of the cgroupsPath semantics (especially with regard to non-absolute paths), and we should consider fixing that too. opencontainers/runtime-spec#493

@crosbymichael
Copy link
Member

No, I don't think that is correct. If you want a default cgroup path then you place it in the config. The reason why it nests itself in its parents cgroup path is to support things like being run under systemd. You can omit cgroup settings from the spec and place them in your service file. Then you just add runc as your command in the service file and the cgroups are correct and what you would expect.

If you have a hard coded default like this then when running under other supervisors and such it will cause unexpected things .

@brauner
Copy link
Contributor Author

brauner commented Jun 10, 2016

@crosbymichael, I don't understand what you mean by "it nests itself in its parents cgroup path". The problem I see is that it retrieves a random subcgroup from one cgroup, namely the devices subsystem, and takes that as the root cgroup for all other subsystems. That is if your are in the subcgroup user.slice in the devices subsystem as in

cat /proc/self/cgroup

11:hugetlb:/
10:blkio:/user.slice
9:net_cls,net_prio:/user.slice
8:cpuset:/
7:freezer:/
6:devices:/user.slice
5:pids:/user.slice/user-1000.slice/session-1.scope
4:memory:/user.slice
3:cpu,cpuacct:/user.slice
2:perf_event:/
1:name=systemd:/user.slice/user-1000.slice/session-1.scope

then the current implementation will lead to you being placed in:

cat /proc/self/cgroup

11:hugetlb:/user.slice/test
10:blkio:/user.slice/test
9:net_cls,net_prio:/user.slice/test
8:cpuset:/user.slice/test
7:freezer:/user.slice/test
6:devices:/user.slice/test
5:pids:/user.slice/test
4:memory:/user.slice/test
3:cpu,cpuacct:/user.slice/test
2:perf_event:/user.slice/test
1:name=systemd:/user.slice/test

which does not seem to be nesting in its parents cgroup path (e.g. the devices parent cgroup path is totally different from the pids parent cgroup path). Also, I'd argue that this is a rather arbitrary approach.
What this change is proposing is that if nothing is specified in your config than make the default layout to look like:

cat /proc/self/cgroup

11:hugetlb:/runc/test
10:blkio:/runc/test
9:net_cls,net_prio:/runc/test
8:cpuset:/runc/test
7:freezer:/runc/test
6:devices:/runc/test
5:pids:/runc/test
4:memory:/runc/test
3:cpu,cpuacct:/runc/test
2:perf_event:/runc/test
1:name=systemd:/runc/test

This is more in line with what you e.g. expect from Docker and I currently fail to see how this would be a problem for e.g. supervisord or other daemons that deal with cgroups. As long as init mounts cgroupf at /sys/fs/cgroup all is well. And if the init does mount it somewhere else you're hosed anyway since this seems to be a hard assumption of runC.

@crosbymichael
Copy link
Member

Docker wants its own sub path and it sets it, there is no need to makeup a new default.

11:perf_event:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
10:cpuset:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
9:hugetlb:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
8:blkio:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
7:cpu,cpuacct:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
6:freezer:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
5:pids:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
4:devices:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
3:memory:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
2:net_cls,net_prio:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852
1:name=systemd:/docker/9ad5ba668fd8ce12b64c46f0f4a06a21b7336afc6189d339ee5c475d265c1852

And if I launch a service from a .service file and i don't have any cgroups specified i expect it to be inside the cgroups that systemd setup.

[Unit]
Description=test
After=network.target

[Service]
MemoryLimit=64M
CPUQuota=100%
ExecStart=/usr/local/bin/runc start test
WorkingDirectory=/containers/redis


● test.service - test
   Loaded: loaded (/lib/systemd/system/test.service; static; vendor preset: enabled)
   Active: active (running) since Fri 2016-06-10 14:43:28 PDT; 2s ago
 Main PID: 17401 (runc)
    Tasks: 8
   Memory: 2.5M (limit: 64.0M)
      CPU: 21ms
   CGroup: /system.slice/test.service
           ├─17401 /usr/local/bin/runc start test
           └─test
             └─17410 sleep 30

Jun 10 14:43:28 deathstar systemd[1]: Started test.
michael|/lib/systemd/system > cat /proc/17401/cgroup 
11:perf_event:/
10:cpuset:/
9:hugetlb:/
8:blkio:/system.slice/test.service
7:cpu,cpuacct:/system.slice/test.service
6:freezer:/
5:pids:/system.slice/test.service
4:devices:/system.slice/test.service
3:memory:/system.slice/test.service
2:net_cls,net_prio:/
1:name=systemd:/system.slice/test.service

@philips
Copy link

philips commented Jun 10, 2016

I don't understand what you mean by "it nests itself in its parents cgroup path". The problem I see is that it retrieves a random subcgroup from one cgroup, namely the devices subsystem, and takes that as the root cgroup for all other subsystems.

For context, cgroups won't have the concept of one process in different hierarchies overtime. There will be a unified hierarchy where all of the controllers exist for one group. Instead of how things are often done today where there is a hierarchy per controller.

I agree with @crosbymichael that this isn't a good default.

@cyphar
Copy link
Member

cyphar commented Jun 11, 2016

@philips Currently runtime-spec doesn't support unified hierarchies. It's a bit odd of a restriction, because you can have unified hierarchies with cgroupv1 as well (but runC is broken in that case too: #798). But the thing that @brauner is referring to is that if you do this (with cgroupsPath not set):

[host]% cat /proc/self/cgroup
11:freezer:/
10:devices:/system.slice/display-manager.service
9:memory:/system.slice/display-manager.service
8:hugetlb:/
7:net_cls,net_prio:/system.slice/display-manager.service
6:blkio:/system.slice/display-manager.service
5:pids:/system.slice/display-manager.service
4:cpuset:/
3:cpu,cpuacct:/system.slice/display-manager.service
2:perf_event:/
1:name=systemd:/system.slice/display-manager.service
[container]% cat /proc/self/cgroup
11:freezer:/system.slice/display-manager.service/test
10:devices:/system.slice/display-manager.service/test
9:memory:/system.slice/display-manager.service/test
8:hugetlb:/system.slice/display-manager.service/test
7:net_cls,net_prio:/system.slice/display-manager.service/test
6:blkio:/system.slice/display-manager.service/test
5:pids:/system.slice/display-manager.service/test
4:cpuset:/system.slice/display-manager.service/test
3:cpu,cpuacct:/system.slice/display-manager.service/test
2:perf_event:/system.slice/display-manager.service/test
1:name=systemd:/system.slice/display-manager.service/test

Which is a bug whether or not you believe that we should nest under parent cgroups (which is what @crosbymichael thinks is a good default). Currently we nest under the devices cgroup and then use that path for all other cgroups -- what we can do to fix this is to make an unspecified cgroupsPath be the same as setting "cgroupsPath": "<container name>" -- since runC currently treats relative paths like that.

The important thing to note is that the code won't have to change in a unified hierarchy -- it'll just be a special case where all of the parent paths are the same.

@brauner
Copy link
Contributor Author

brauner commented Jun 16, 2016

@crosbymichael:

Docker wants its own sub path and it sets it, there is no need to makeup a new default.

Sorry, maybe you misunderstood. I was just citing Docker as an example that uses a nice clean default for the cgroups it places itself in, namely /docker/hash-sum. All container runtimes seem to follow the schema /runtime-name/container-name as default cgroup path. runc however chooses an arbitrary prefix based on the devices subsystem.

And if I launch a service from a .service file and i don't have any cgroups specified i expect it to be inside the cgroups that systemd setup.

This behavior will not be altered by this commit.

@philips:

For context, cgroups won't have the concept of one process in different hierarchies overtime. There will be a unified hierarchy where all of the controllers exist for one group. Instead of how things are often done today where there is a hierarchy per controller.

As @cyphar said, this specific change does not affect cgroup v1 vs v2.

@hqhq
Copy link
Contributor

hqhq commented Aug 2, 2016

I agree there is issue with current approach, I think what @crosbymichael and @philips concerned is how we choose the default cgroup, a named runc cgroup under root would be random to some applications.

I would vote to remove the default unified cgroup path, and just create a cgroup named <container-id> in the cgroup of the parent process if user didn't assign cgroupPath, so the code would be:

diff --git a/libcontainer/specconv/spec_linux.go b/libcontainer/specconv/spec_linux.go
index 856371c..731ef2e 100644
--- a/libcontainer/specconv/spec_linux.go
+++ b/libcontainer/specconv/spec_linux.go
@@ -12,7 +12,6 @@ import (
        "syscall"
        "time"

-       "github.com/opencontainers/runc/libcontainer/cgroups"
        "github.com/opencontainers/runc/libcontainer/configs"
        "github.com/opencontainers/runc/libcontainer/seccomp"
        libcontainerUtils "github.com/opencontainers/runc/libcontainer/utils"
@@ -253,10 +252,7 @@ func createLibcontainerMount(cwd string, m specs.Mount) *configs.Mount {
 }

 func createCgroupConfig(name string, useSystemdCgroup bool, spec *specs.Spec) (*configs.Cgroup, error) {
-       var (
-               err          error
-               myCgroupPath string
-       )
+       var myCgroupPath string

        c := &configs.Cgroup{
                Resources: &configs.Resources{},
@@ -286,14 +282,8 @@ func createCgroupConfig(name string, useSystemdCgroup bool, spec *specs.Spec) (*
                        c.Name = parts[2]
                }
        } else {
-               if myCgroupPath == "" {
-                       myCgroupPath, err = cgroups.GetThisCgroupDir("devices")
-                       if err != nil {
-                               return nil, err
-                       }
-                       myCgroupPath = filepath.Join(myCgroupPath, name)
-               }
                c.Path = myCgroupPath
+               c.Name = name
        }

        c.Resources.AllowedDevices = allowedDevices

@opencontainers/runc-maintainers WDYT?

@brauner
Copy link
Contributor Author

brauner commented Aug 2, 2016

@hqhq, that's basically what I did. The only difference is that I'd append runc/container-name/.

@brauner
Copy link
Contributor Author

brauner commented Aug 2, 2016

Also I'd argue it's not random. It's the default approach for nearly all container runtimes. But I'm repeating myself. I'm fine with whatever you do as long as it is not as arbitrary as picking up the default cgroup prefix from the devices controller.

@hqhq
Copy link
Contributor

hqhq commented Aug 3, 2016

@brauner runc already have cgroupPath, any container engines using runc who want default father cgroups can use this option. runc doesn't need another default, instead, a better default would be in the child cgroup of the father process's cgroup, which would be depended by some applications like systemd, and runc is not able to do that yet. I agree your PR fixes issues and is improvement, I just proposed another approach which I think is better and we should do it right in one time.

@crosbymichael
Copy link
Member

@hqhq we already embed in the parent when no cgroupPath is specified. I think this is the best default and if you want more control you just set the cgroupPath. If you want something explicit then just set it, I don't want a hard coded default because it would be harder or require an extra flag to tell runc to embed it.

cat /proc/15161/cgroup 
11:freezer:/user.slice/test
10:perf_event:/user.slice/test
9:memory:/user.slice/test
8:hugetlb:/user.slice/test
7:net_cls,net_prio:/user.slice/test
6:blkio:/user.slice/test
5:pids:/user.slice/test
4:devices:/user.slice/test
3:cpu,cpuacct:/user.slice/test
2:cpuset:/user.slice/test
1:name=systemd:/user.slice/test

@brauner thanks for the PR but lets keep the default to embedding the cgroup in its parent and if you want to set something explicit, just add cgroupPath in the spec.

hqhq added a commit to hqhq/runc that referenced this pull request Aug 30, 2016
Alternative of opencontainers#895 , part of opencontainers#892

The intension of current behavior if to create cgroup in
parent cgroup of current process, but we did this in a
wrong way, we used devices cgroup path of current process
as the default parent path for all subsystems, this is
wrong because we don't always have the same cgroup path
for all subsystems.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
@hqhq hqhq mentioned this pull request Aug 30, 2016
@hqhq
Copy link
Contributor

hqhq commented Aug 30, 2016

@crosbymichael See #1009

hqhq added a commit to hqhq/runc that referenced this pull request Aug 30, 2016
Alternative of opencontainers#895 , part of opencontainers#892

The intension of current behavior if to create cgroup in
parent cgroup of current process, but we did this in a
wrong way, we used devices cgroup path of current process
as the default parent path for all subsystems, this is
wrong because we don't always have the same cgroup path
for all subsystems.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
hqhq added a commit to hqhq/runc that referenced this pull request Aug 30, 2016
Alternative of opencontainers#895 , part of opencontainers#892

The intension of current behavior if to create cgroup in
parent cgroup of current process, but we did this in a
wrong way, we used devices cgroup path of current process
as the default parent path for all subsystems, this is
wrong because we don't always have the same cgroup path
for all subsystems.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants