Support Lifecycle Hooks #20

vishh · 2015-06-25T17:56:13Z

Capturing requirements for Lifecycle hooks discussed during the summit.

Lifecycle hooks was agreed to be a useful feature and support for a few basic hooks is expected to be in place. Specifically, PreStart, PreStop and PostStop hooks will be introduced to start with. Supporting hooks might require separating the lifetime of namespaces from that of the processes in the container.

Hooks will be binaries that will be exec'ed. The Spec and current State of the container will be provided as arguments to the hook (via fds?). Are the exec hooks executed synchronously?

This issue is meant to define the Spec for Hooks and discuss the hooks that are necessary to begin with.

LK4D4 · 2015-06-25T18:06:51Z

I think makes sense to add PreCheckpoint, PostCheckpoint, PreRestore, PostRestore at least for linux extension. ping @boucher @avagin
I think hooks must be synchronous and receive state as json to stdout.

wking · 2015-06-25T18:18:38Z

On Thu, Jun 25, 2015 at 11:06:52AM -0700, Alexander Morozov wrote:

I think makes sense to add PreCheckpoint, PostCheckpoint,
PreRestore, PostRestore at least for linux extension…

Hmm, do we even have checkpoint/restore lifecycle actions 1? Maybe
that should be a separate issue before we talk about hooks for them
;).

boucher · 2015-06-25T18:26:08Z

It would definitely be useful to have pre/post checkpoint/restore.

philips · 2015-06-26T17:07:45Z

@wking @LK4D4 @boucher Lets concentrate on what we know we must do: prestart and poststop before defining hooks for other stuff first.

@vishh What is a prestop hook used for in your mind? We had a debate on pre-stop and just held off on defining one in appc: appc/spec#188 (comment)

mrunalp · 2015-06-26T17:46:31Z

I agree that prestart and poststop are what we call agree on as useful and can start with adding those.

mrunalp · 2015-06-29T18:48:00Z

Should also we define what arguments are passed to these hooks as well? On Linux, we will have to pass the pid of the container process.

LK4D4 · 2015-06-29T18:49:02Z

@mrunalp I thought we will pass whole state to all hooks.

mrunalp · 2015-06-29T18:51:05Z

@LK4D4 We could do that but wouldn't the state be implementation specific?

LK4D4 · 2015-06-29T18:58:36Z

@mrunalp But it still will be a string, right?

mrunalp · 2015-06-29T19:01:09Z

Yes, that is one way to make it work; let the hooks interpret whatever is passed down. If we do decide to pass the whole state then it might be better to send it over a pipe rather than as an argument.

LK4D4 · 2015-06-29T19:02:51Z

Totally agree on pipe(maybe stdin). Dunno how it will work in windows, probably the same.

tnachen · 2015-08-27T06:04:31Z

I think we also should draw out a timeline of the container life cycle i.e

namespace created -> pre-start -> user process launched -> process stopped -> post-stop -> cleanup

One reason is that I imagine there will be more hooks, since from a Mesos perspective we have more stages that requires more hooks for us to integrate, and it gets a bit more confusing as you need to read the description to figure out what each hook meant.

tnachen · 2015-08-27T06:05:10Z

Also can we name pre-start to something like pre-start-user-process and post-stop to post-stop-user-process to be more precise?

wking · 2015-08-27T13:08:05Z

On Wed, Aug 26, 2015 at 11:04:32PM -0700, Timothy Chen wrote:

I think we also should draw out a timeline of the container life cycle i.e

namespace created -> pre-start -> user process launched -> process stopped -> post-stop -> cleanup

+1

wking · 2015-08-27T13:09:15Z

On Wed, Aug 26, 2015 at 11:05:12PM -0700, Timothy Chen wrote:

Also can we name pre-start to something like pre-start-user-process
and post-stop to post-stop-user-process to be more precise?

With an explicit timeline (or graph, since it doesn't have to be
linear) early in the docs, I think the current names should be clear
enough (until we get more pre-start-something-else hooks).

conqerAtapple · 2015-08-27T16:17:48Z

We are essentially talking about an explicit container state machine. Once we have the state machine, it might be easier to give names and to add hooks.

mrunalp · 2015-08-27T17:09:36Z

@tnachen Yes, it makes sense to add such a timeline. I will make a PR for what we have right now in relation to the hooks. I think we probably don't want to rename these till there are concrete proposals for other pre/post hooks.

tnachen · 2015-08-27T17:11:35Z

Sounds good, I'll be creating more proposals as we're investigating how we can integrate Mesos with the runtime spec.

wking · 2015-08-29T07:39:39Z

On Thu, Aug 27, 2015 at 10:09:38AM -0700, Mrunal Patel wrote:

Yes, it makes sense to add such a timeline. I will make a PR for
what we have right now in relation to the hooks.

I didn't see a PR like that, and I wanted the timeline so I could link
from the lifecycle to the hook documentation as part of my
filename-consolidation (#126). So I added a more detailed version of
@tnachen's timeline as part of #126.

liangchenye · 2015-09-07T11:09:50Z

Should we define a priority In the 'pre-start' or 'post-stop'?
There might be multiply hooks, which should run first?

wking · 2015-09-07T17:49:13Z

On Mon, Sep 07, 2015 at 04:09:52AM -0700, 梁辰晔 (Liang Chenye) wrote:

Should we define a priority In the 'pre-start' or 'post-stop'?
There might be multiply hooks, which should run first?

The hook entries are arrays, so I think we should just execute them in
the listed order. More explicit wording in this regard would help
avoid confusion, although the current “remaining hooks” phrasing is a
start 1.

For a similar problem (explicitly linking mount order to the listed
array order), see #142.

liangchenye · 2015-09-08T01:46:52Z

Yes, similar.
I just 'copied' your word (listed order) and added a pull request.
#156

vbatts · 2015-09-09T17:10:26Z

We have basic hooks in now. This issue is largely a "tracker" for future hooks, like needed for CRIU.

wking · 2015-09-25T23:25:21Z

On Fri, Sep 25, 2015 at 04:00:59PM -0700, Mrunal Patel wrote:

@wking There was a discussion on IRC…

Ah, good :). Can you point me to the right day?

$ grep -ir post-start irclogs/2015/freenode/#opencontainers.*

isn't turning up anything…

wking · 2015-09-25T23:29:08Z

On Fri, Sep 25, 2015 at 04:10:32PM -0700, Jojy George Varghese wrote:

To complement the pre-start, we need a pre-stop hook (see diagram
for the symmetry).

The problem with pre-stop is that the stop is not necessarily being
triggered by the host-process managing the container. For example,
the process may just exit on its own. But as long as folks are clear
that these hooks only fire on ‘funC stop ’ (or whatever
triggers a host-initiated stop), I'm ok with them.

mrunalp · 2015-12-05T00:22:25Z

One more use case that we see while working with hooks is having pre/post mount hooks. These could be run before and after the new mount namespace is setup. WDYT?

wking · 2015-12-05T04:12:53Z

On Fri, Dec 04, 2015 at 04:22:27PM -0800, Mrunal Patel wrote:

One more use case that we see while working with hooks is having
pre/post mount hooks. These could be run before and after the new
mount namespace is setup. WDYT?

Is there a distinction between post-mount and pre-start? #231 doesn't
mention mounts at all, but I'd guess they'd fall under 7713efc's step
2. If so, the only difference between post-mount and pre-start would
be step 3, where the state.json gets written. That doesn't seem like
a useful enough distinction to me (but then, the distinction between
pre-start and post-start didn't seem useful enough to me either, #202,
1).

Pre-mount may be more useful, but if it's going to be a thing I think
we need to add enough detail to the lifecycle that we know what
pre-mount is coming after. And we need to figure out what is going to
go into its state JSON (do we require a container PID on Linux by that
point or not?).

Extend [1,2,3] to avoid: hook 1: spawn ---------------> reaped hook 2: spawn ----------------> reaped hook 3: spawn -----> reaped and explicitly require: hook 1: spawn --> reaped hook 2: spawn --> reaped hook 3: spawn --> reaped Folks who do want parallel execution are free to use a parallelizing wrapper: hook 1: spawn ---------------------------> reaped child 1 -----> reaped child 2 ---------> reaped child 3 ---> reaped Although that cuts both ways (with parallel hooks, folks could always use a single hook with a serializing wrapper). Still, I'd guess most current implementations are already taking the serialized approach, so it makes bundle-author life easier if we are explicit about that. [1]: opencontainers#20 (comment) [2]: opencontainers#156 [3]: opencontainers#157 Signed-off-by: W. Trevor King <wking@tremily.us>

rhatdan · 2016-01-17T14:07:06Z

I really need post-mount. I want a hook that is going to mount additional mount points to /sys. Currently if I use prestart, /sys gets mounted over my mount points.

wking · 2016-01-18T04:45:49Z

On Sun, Jan 17, 2016 at 06:07:08AM -0800, Daniel J Walsh wrote:

I really need post-mount. I want a hook that is going to mount
additional mount points to /sys. Currently if I use prestart, /sys
gets mounted over my mount points.

I think “more hooks” is the wrong solution. If you've provided /sys,
I don't think the runtime should be clobbering it (see points two and
three here 1). Of course, those points aren't currently spelled out
in the spec, which is pretty vague about the timing and clobbering
details of runtime-provided devices and mount points.

And I'd have expected pre-start to be run after the mount setup anyway
(it happens later in ccon 2).

But regardless of how this should be handled now, separate create/exec
actions (#299) should get you out of this problem, because /sys-level
mount tweaking should not be happening in the exec phase. That lets
you fix any broken start-phase functionality before calling exec. And
the only mount-tweaking done by exec will probably be mounting a new
/proc if the exec creates a new PID namespace 3. And the only thing
that will clobber (on my system anyway) is 4.

 See “/proc and PID namespaces”

rhatdan · 2016-01-18T12:58:50Z

If the start hooks happened after the mount then that would also fix my problem. I am trying to get additional kernel file systems mounted onto /sys. /sys/fs/cgroup to be exact. I don't want to take over the mounting of /sys or any other kernel file systems except for the cgroup hierarchy.

Currently /sys gets mounted after I mount /sys/fs/cgroup, so that my mounts disappear.

wking · 2016-03-18T00:16:23Z

@vishh suggested that I shift some discussion of hooks and a separate
create / (start or exec) over here [1,2]. See also #299 for more
background in splitting sandbox creation from process execution.
Linux runtimes launching a container process [3,4] will usually do an
IPC dance like 5 so the runtime process can spawn the pre-start
hooks after the container namespaces have been setup but before the
user-specified code is exec'ed (this may be an implemention detail
6, but I don't see how to avoid it 7). With the split proposed in
#299, the PID namespace will be created at start time (step (2) in
8), which means that any actions that need access to the container
PID (e.g. custom cgroup injection or state registration) will still
have to happen at start-time. So while splitting create from start
will make some things possible without hooks (e.g. manipulating the
mount namespace created by ‘create’), we won't be able to drop
pre-start hooks entirely 8.

vishh · 2016-03-18T17:40:08Z

(e.g. custom cgroup injection or state registration)

I don't get how running hooks between pid ns creation and user process startup is a necessity for either of these use cases. Can you elaborate more?

wking · 2016-03-18T17:50:11Z

On Fri, Mar 18, 2016 at 10:40:10AM -0700, Vish Kannan wrote:

(e.g. custom cgroup injection or state registration)

I don't get how running hooks between pid ns creation and user
process startup is a necessity for either of these use cases. Can
you elaborate more?

You can't add the container process to a cgroup until you have a PID,
and you can't get that PID (in the runtime namespace 1) until you
create the PID namespace.

More generally, if the ‘start’ call kicks off the process tree that
gives you the container process, it should be pretty clear that any
manipulation of that process has to happen after the ‘start’ call ;).
And things like cgroup injection and resource limits should be set
before you exec user-configured code, so that puts them after ‘start’
and before start's execvp (or whatever) in some sort of hook.

vishh · 2016-03-18T18:33:49Z

Why does one add a cgroup with a hook when the Spec already provides
ability to define cgroups?

On Fri, Mar 18, 2016 at 10:50 AM, W. Trevor King notifications@github.com
wrote:

On Fri, Mar 18, 2016 at 10:40:10AM -0700, Vish Kannan wrote:

(e.g. custom cgroup injection or state registration)

I don't get how running hooks between pid ns creation and user
process startup is a necessity for either of these use cases. Can
you elaborate more?

You can't add the container process to a cgroup until you have a PID,
and you can't get that PID (in the runtime namespace 1) until you
create the PID namespace.

More generally, if the ‘start’ call kicks off the process tree that
gives you the container process, it should be pretty clear that any
manipulation of that process has to happen after the ‘start’ call ;).
And things like cgroup injection and resource limits should be set
before you exec user-configured code, so that puts them after ‘start’
and before start's execvp (or whatever) in some sort of hook.

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#20 (comment)

wking · 2016-03-18T18:45:09Z

On Fri, Mar 18, 2016 at 11:33:51AM -0700, Vish Kannan wrote:

Why does one add a cgroup with a hook when the Spec already provides
ability to define cgroups?

Maybe you want to use an alternative cgroup manager that's built on
higher level abstractions 1. Or maybe you want to use a new kernel
feature that hasn't landed in your runtime yet 2.

But is motivating the cgroups use case really that important? I find
it hard to imagine folks never needing access to the container PID
before user-specified code is exec'ed. The cgroups and
state-registration (e.g. if you want to maintain a cluster-wide
registry, but your runtime's builtin-registry doesn't handle that) are
just the most obvious cases I can think of, not an attempt at an
exhaustive list.

 Subject: Re: removal of cgroups from the OCI Linux spec
 Date: Thu, 29 Oct 2015 11:28:41 -0700
 Message-ID: <CAA_vbqSCEg+ByohgfR6Pqnzvy+j60=jmMNJi_UfnhB3WPtcjww@mail.gmail.com>

vishh · 2016-03-18T19:11:10Z

I'd prefer not considering such scenarios unless it cannot be solved
through other means.

On Fri, Mar 18, 2016 at 11:45 AM, W. Trevor King notifications@github.com
wrote:

On Fri, Mar 18, 2016 at 11:33:51AM -0700, Vish Kannan wrote:

Why does one add a cgroup with a hook when the Spec already provides
ability to define cgroups?

Maybe you want to use an alternative cgroup manager that's built on
higher level abstractions 1. Or maybe you want to use a new kernel
feature that hasn't landed in your runtime yet [2].

But is motivating the cgroups use case really that important? I find
it hard to imagine folks never needing access to the container PID
before user-specified code is exec'ed. The cgroups and
state-registration (e.g. if you want to maintain a cluster-wide
registry, but your runtime's builtin-registry doesn't handle that) are
just the most obvious cases I can think of, not an attempt at an
exhaustive list.

Subject: Re: removal of cgroups from the OCI Linux spec
Date: Thu, 29 Oct 2015 11:28:41 -0700
Message-ID: <CAA_vbqSCEg+ByohgfR6Pqnzvy+j60=
jmMNJi_UfnhB3WPtcjww@mail.gmail.com>
[2]: #267

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#20 (comment)

wking · 2016-03-18T19:33:23Z

On Fri, Mar 18, 2016 at 12:11:12PM -0700, Vish Kannan wrote:

I'd prefer not considering such scenarios unless it cannot be solved
through other means.

Ok, so we just need to get to the bottom of a particular use-case.
How do you see attachment to an experimental cgroup (e.g. #267)
working before support lands in your chosen runtime? You could
solve that by using v2 cgroups 1 and letting the runtime join one
you'd already setup, but that seems like a major change if you're
already using v1 cgroups. How does the runtime distinguish between v1
and v2 cgroup users anyway? Is the logic for joining cgroups:

If a v2 cgroup at cgroupsPath exists, add the container process to
it. Otherwise add the container process to any v1 cgroups whose
path relative to their cgroup mount point exists.

Or are runtimes supposed to be hard-coded to always use v1 or v2
cgroups? Or are v1 runtimes supposed to limit attachments to cgroups
that are defined under linux.resources?

vishh · 2016-03-18T19:35:16Z

Why not run a custom runtime or move the process after it gets spawned?

On Fri, Mar 18, 2016 at 12:33 PM, W. Trevor King notifications@github.com
wrote:

On Fri, Mar 18, 2016 at 12:11:12PM -0700, Vish Kannan wrote:

I'd prefer not considering such scenarios unless it cannot be solved
through other means.

Ok, so we just need to get to the bottom of a particular use-case.
How do you see attachment to an experimental cgroup (e.g. #267)
working before support lands in your chosen runtime? You could
solve that by using v2 cgroups 1 and letting the runtime join one
you'd already setup, but that seems like a major change if you're
already using v1 cgroups. How does the runtime distinguish between v1
and v2 cgroup users anyway? Is the logic for joining cgroups:

If a v2 cgroup at cgroupsPath exists, add the container process to
it. Otherwise add the container process to any v1 cgroups whose
path relative to their cgroup mount point exists.

Or are runtimes supposed to be hard-coded to always use v1 or v2
cgroups? Or are v1 runtimes supposed to limit attachments to cgroups
that are defined under linux.resources?

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#20 (comment)

wking · 2016-03-18T19:49:08Z

On Fri, Mar 18, 2016 at 12:35:19PM -0700, Vish Kannan wrote:

Why not run a custom runtime…

Because that's a moderately large change when all you want is:

$ echo $PID >/sys/fs/cgroup/a/b/c/tasks

And “you could fork the runtime and add support for that” is an
argument that applies to everything. If we go that route, where do
you draw the line? It it only stuff that would need pre-start hooks
that doesn't get supported in the spec?

… or move the process after it gets spawned?

I am suggesting we move the process after it's spawned (because you
need a PID to assign it). I'm just suggesting we assign it to cgroups
before we execvp (or whatever) the user-specified code. Assigning
resource limits before exec'ing the user-specified code seems like the
only safe approach to me, since you want to avoid an unconstrained
time window at the beginning of the user-specified code's execution.

wking · 2017-04-04T22:44:31Z

I think this is closable. The current spec position is to split create/start (#384) and keep hooks (closing #483), so callers have two ways to handle any hook-like things they need to do. Skimming the tail of this discussion, I don't see anything that isn't currently supported. And if there are unsupported use-cases, I think a more productive approach would be to open per-case issues with “the current hooks or create/start split don't support …”, and we can hash out whether additional hooks are needed. For example, see #644, where pre-/post-mount hooks were proposed, discussed, and rejected.

vbatts · 2017-04-05T00:37:56Z

Agreed

mrunalp self-assigned this Jun 26, 2015

mrunalp mentioned this issue Jun 26, 2015

linux-config: clean-up namespaces docs #24

Closed

crosbymichael added this to the v0.1.0 milestone Jun 29, 2015

mrunalp mentioned this issue Jun 29, 2015

hooks: start with pre-start and post-stop hooks. #34

Merged

philips added the features/lifecycle label Jul 28, 2015

liangchenye mentioned this issue Sep 8, 2015

Add hooks order explanation in runtime.md #156

Closed

vbatts added this to the draft-next milestone Sep 11, 2015

LK4D4 mentioned this issue Sep 28, 2015

Add post-start hooks #202

Merged

crosbymichael modified the milestones: by-1.0.0, 1.0.0 May 25, 2016

wking mentioned this issue Jun 2, 2016

*: Remove hooks #483

Closed

crosbymichael modified the milestones: 1.0.0, 1.1.0 Jun 3, 2016

vbatts closed this as completed Apr 5, 2017

Support Lifecycle Hooks #20

Support Lifecycle Hooks #20

Comments

vishh commented Jun 25, 2015

LK4D4 commented Jun 25, 2015

wking commented Jun 25, 2015

boucher commented Jun 25, 2015

philips commented Jun 26, 2015

mrunalp commented Jun 26, 2015

mrunalp commented Jun 29, 2015

LK4D4 commented Jun 29, 2015

mrunalp commented Jun 29, 2015

LK4D4 commented Jun 29, 2015

mrunalp commented Jun 29, 2015

LK4D4 commented Jun 29, 2015

tnachen commented Aug 27, 2015

tnachen commented Aug 27, 2015

wking commented Aug 27, 2015

wking commented Aug 27, 2015

conqerAtapple commented Aug 27, 2015

mrunalp commented Aug 27, 2015

tnachen commented Aug 27, 2015

wking commented Aug 29, 2015

liangchenye commented Sep 7, 2015

wking commented Sep 7, 2015

liangchenye commented Sep 8, 2015

vbatts commented Sep 9, 2015

wking commented Sep 25, 2015

wking commented Sep 25, 2015

mrunalp commented Dec 5, 2015

wking commented Dec 5, 2015

rhatdan commented Jan 17, 2016

wking commented Jan 18, 2016

rhatdan commented Jan 18, 2016

wking commented Mar 18, 2016

vishh commented Mar 18, 2016

wking commented Mar 18, 2016

vishh commented Mar 18, 2016

wking commented Mar 18, 2016

vishh commented Mar 18, 2016

wking commented Mar 18, 2016

vishh commented Mar 18, 2016

wking commented Mar 18, 2016

wking commented Apr 4, 2017

vbatts commented Apr 5, 2017