Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Lifecycle Hooks #20

Closed
vishh opened this issue Jun 25, 2015 · 48 comments
Closed

Support Lifecycle Hooks #20

vishh opened this issue Jun 25, 2015 · 48 comments
Assignees
Milestone

Comments

@vishh
Copy link
Contributor

vishh commented Jun 25, 2015

Capturing requirements for Lifecycle hooks discussed during the summit.

Lifecycle hooks was agreed to be a useful feature and support for a few basic hooks is expected to be in place. Specifically, PreStart, PreStop and PostStop hooks will be introduced to start with. Supporting hooks might require separating the lifetime of namespaces from that of the processes in the container.

Hooks will be binaries that will be exec'ed. The Spec and current State of the container will be provided as arguments to the hook (via fds?). Are the exec hooks executed synchronously?

This issue is meant to define the Spec for Hooks and discuss the hooks that are necessary to begin with.

@LK4D4
Copy link
Contributor

LK4D4 commented Jun 25, 2015

I think makes sense to add PreCheckpoint, PostCheckpoint, PreRestore, PostRestore at least for linux extension. ping @boucher @avagin
I think hooks must be synchronous and receive state as json to stdout.

@wking
Copy link
Contributor

wking commented Jun 25, 2015

On Thu, Jun 25, 2015 at 11:06:52AM -0700, Alexander Morozov wrote:

I think makes sense to add PreCheckpoint, PostCheckpoint,
PreRestore, PostRestore at least for linux extension…

Hmm, do we even have checkpoint/restore lifecycle actions 1? Maybe
that should be a separate issue before we talk about hooks for them
;).

@boucher
Copy link

boucher commented Jun 25, 2015

It would definitely be useful to have pre/post checkpoint/restore.

@philips
Copy link
Contributor

philips commented Jun 26, 2015

@wking @LK4D4 @boucher Lets concentrate on what we know we must do: prestart and poststop before defining hooks for other stuff first.

@vishh What is a prestop hook used for in your mind? We had a debate on pre-stop and just held off on defining one in appc: appc/spec#188 (comment)

@mrunalp
Copy link
Contributor

mrunalp commented Jun 26, 2015

I agree that prestart and poststop are what we call agree on as useful and can start with adding those.

@mrunalp
Copy link
Contributor

mrunalp commented Jun 29, 2015

Should also we define what arguments are passed to these hooks as well? On Linux, we will have to pass the pid of the container process.

@LK4D4
Copy link
Contributor

LK4D4 commented Jun 29, 2015

@mrunalp I thought we will pass whole state to all hooks.

@mrunalp
Copy link
Contributor

mrunalp commented Jun 29, 2015

@LK4D4 We could do that but wouldn't the state be implementation specific?

@LK4D4
Copy link
Contributor

LK4D4 commented Jun 29, 2015

@mrunalp But it still will be a string, right?

@mrunalp
Copy link
Contributor

mrunalp commented Jun 29, 2015

Yes, that is one way to make it work; let the hooks interpret whatever is passed down. If we do decide to pass the whole state then it might be better to send it over a pipe rather than as an argument.

@LK4D4
Copy link
Contributor

LK4D4 commented Jun 29, 2015

Totally agree on pipe(maybe stdin). Dunno how it will work in windows, probably the same.

@tnachen
Copy link

tnachen commented Aug 27, 2015

I think we also should draw out a timeline of the container life cycle i.e

namespace created -> pre-start -> user process launched -> process stopped -> post-stop -> cleanup

One reason is that I imagine there will be more hooks, since from a Mesos perspective we have more stages that requires more hooks for us to integrate, and it gets a bit more confusing as you need to read the description to figure out what each hook meant.

@tnachen
Copy link

tnachen commented Aug 27, 2015

Also can we name pre-start to something like pre-start-user-process and post-stop to post-stop-user-process to be more precise?

@wking
Copy link
Contributor

wking commented Aug 27, 2015

On Wed, Aug 26, 2015 at 11:04:32PM -0700, Timothy Chen wrote:

I think we also should draw out a timeline of the container life cycle i.e

namespace created -> pre-start -> user process launched -> process stopped -> post-stop -> cleanup

+1

@wking
Copy link
Contributor

wking commented Aug 27, 2015

On Wed, Aug 26, 2015 at 11:05:12PM -0700, Timothy Chen wrote:

Also can we name pre-start to something like pre-start-user-process
and post-stop to post-stop-user-process to be more precise?

With an explicit timeline (or graph, since it doesn't have to be
linear) early in the docs, I think the current names should be clear
enough (until we get more pre-start-something-else hooks).

@conqerAtapple
Copy link

We are essentially talking about an explicit container state machine. Once we have the state machine, it might be easier to give names and to add hooks.

@mrunalp
Copy link
Contributor

mrunalp commented Aug 27, 2015

@tnachen Yes, it makes sense to add such a timeline. I will make a PR for what we have right now in relation to the hooks. I think we probably don't want to rename these till there are concrete proposals for other pre/post hooks.

@tnachen
Copy link

tnachen commented Aug 27, 2015

Sounds good, I'll be creating more proposals as we're investigating how we can integrate Mesos with the runtime spec.

@wking
Copy link
Contributor

wking commented Aug 29, 2015

On Thu, Aug 27, 2015 at 10:09:38AM -0700, Mrunal Patel wrote:

Yes, it makes sense to add such a timeline. I will make a PR for
what we have right now in relation to the hooks.

I didn't see a PR like that, and I wanted the timeline so I could link
from the lifecycle to the hook documentation as part of my
filename-consolidation (#126). So I added a more detailed version of
@tnachen's timeline as part of #126.

@liangchenye
Copy link
Member

Should we define a priority In the 'pre-start' or 'post-stop'?
There might be multiply hooks, which should run first?

@wking
Copy link
Contributor

wking commented Sep 7, 2015

On Mon, Sep 07, 2015 at 04:09:52AM -0700, 梁辰晔 (Liang Chenye) wrote:

Should we define a priority In the 'pre-start' or 'post-stop'?
There might be multiply hooks, which should run first?

The hook entries are arrays, so I think we should just execute them in
the listed order. More explicit wording in this regard would help
avoid confusion, although the current “remaining hooks” phrasing is a
start 1.

For a similar problem (explicitly linking mount order to the listed
array order), see #142.

@liangchenye
Copy link
Member

Yes, similar.
I just 'copied' your word (listed order) and added a pull request.
#156

@vbatts
Copy link
Member

vbatts commented Sep 9, 2015

We have basic hooks in now. This issue is largely a "tracker" for future hooks, like needed for CRIU.

@vbatts vbatts added this to the draft-next milestone Sep 11, 2015
@wking
Copy link
Contributor

wking commented Sep 25, 2015

On Fri, Sep 25, 2015 at 04:00:59PM -0700, Mrunal Patel wrote:

@wking There was a discussion on IRC…

Ah, good :). Can you point me to the right day?

$ grep -ir post-start irclogs/2015/freenode/#opencontainers.*

isn't turning up anything…

@wking
Copy link
Contributor

wking commented Sep 25, 2015

On Fri, Sep 25, 2015 at 04:10:32PM -0700, Jojy George Varghese wrote:

To complement the pre-start, we need a pre-stop hook (see diagram
for the symmetry).

The problem with pre-stop is that the stop is not necessarily being
triggered by the host-process managing the container. For example,
the process may just exit on its own. But as long as folks are clear
that these hooks only fire on ‘funC stop ’ (or whatever
triggers a host-initiated stop), I'm ok with them.

@mrunalp
Copy link
Contributor

mrunalp commented Dec 5, 2015

One more use case that we see while working with hooks is having pre/post mount hooks. These could be run before and after the new mount namespace is setup. WDYT?

@wking
Copy link
Contributor

wking commented Dec 5, 2015

On Fri, Dec 04, 2015 at 04:22:27PM -0800, Mrunal Patel wrote:

One more use case that we see while working with hooks is having
pre/post mount hooks. These could be run before and after the new
mount namespace is setup. WDYT?

Is there a distinction between post-mount and pre-start? #231 doesn't
mention mounts at all, but I'd guess they'd fall under 7713efc's step
2. If so, the only difference between post-mount and pre-start would
be step 3, where the state.json gets written. That doesn't seem like
a useful enough distinction to me (but then, the distinction between
pre-start and post-start didn't seem useful enough to me either, #202,
1).

Pre-mount may be more useful, but if it's going to be a thing I think
we need to add enough detail to the lifecycle that we know what
pre-mount is coming after. And we need to figure out what is going to
go into its state JSON (do we require a container PID on Linux by that
point or not?).

wking added a commit to wking/opencontainer-runtime-spec that referenced this issue Dec 11, 2015
Extend [1,2,3] to avoid:

  hook 1: spawn ---------------> reaped
  hook 2:       spawn ----------------> reaped
  hook 3:             spawn -----> reaped

and explicitly require:

  hook 1: spawn --> reaped
  hook 2:                  spawn --> reaped
  hook 3:                                   spawn --> reaped

Folks who do want parallel execution are free to use a parallelizing
wrapper:

  hook 1: spawn ---------------------------> reaped
                child 1 -----> reaped
                  child 2 ---------> reaped
                    child 3 ---> reaped

Although that cuts both ways (with parallel hooks, folks could always
use a single hook with a serializing wrapper).  Still, I'd guess most
current implementations are already taking the serialized approach, so
it makes bundle-author life easier if we are explicit about that.

[1]: opencontainers#20 (comment)
[2]: opencontainers#156
[3]: opencontainers#157

Signed-off-by: W. Trevor King <wking@tremily.us>
@rhatdan
Copy link
Contributor

rhatdan commented Jan 17, 2016

I really need post-mount. I want a hook that is going to mount additional mount points to /sys. Currently if I use prestart, /sys gets mounted over my mount points.

@wking
Copy link
Contributor

wking commented Jan 18, 2016

On Sun, Jan 17, 2016 at 06:07:08AM -0800, Daniel J Walsh wrote:

I really need post-mount. I want a hook that is going to mount
additional mount points to /sys. Currently if I use prestart, /sys
gets mounted over my mount points.

I think “more hooks” is the wrong solution. If you've provided /sys,
I don't think the runtime should be clobbering it (see points two and
three here 1). Of course, those points aren't currently spelled out
in the spec, which is pretty vague about the timing and clobbering
details of runtime-provided devices and mount points.

And I'd have expected pre-start to be run after the mount setup anyway
(it happens later in ccon 2).

But regardless of how this should be handled now, separate create/exec
actions (#299) should get you out of this problem, because /sys-level
mount tweaking should not be happening in the exec phase. That lets
you fix any broken start-phase functionality before calling exec. And
the only mount-tweaking done by exec will probably be mounting a new
/proc if the exec creates a new PID namespace 3. And the only thing
that will clobber (on my system anyway) is 4.

 See “/proc and PID namespaces”

@rhatdan
Copy link
Contributor

rhatdan commented Jan 18, 2016

If the start hooks happened after the mount then that would also fix my problem. I am trying to get additional kernel file systems mounted onto /sys. /sys/fs/cgroup to be exact. I don't want to take over the mounting of /sys or any other kernel file systems except for the cgroup hierarchy.

Currently /sys gets mounted after I mount /sys/fs/cgroup, so that my mounts disappear.

@wking
Copy link
Contributor

wking commented Mar 18, 2016

@vishh suggested that I shift some discussion of hooks and a separate
create / (start or exec) over here [1,2]. See also #299 for more
background in splitting sandbox creation from process execution.
Linux runtimes launching a container process [3,4] will usually do an
IPC dance like 5 so the runtime process can spawn the pre-start
hooks after the container namespaces have been setup but before the
user-specified code is exec'ed (this may be an implemention detail
6, but I don't see how to avoid it 7). With the split proposed in
#299, the PID namespace will be created at start time (step (2) in
8), which means that any actions that need access to the container
PID (e.g. custom cgroup injection or state registration) will still
have to happen at start-time. So while splitting create from start
will make some things possible without hooks (e.g. manipulating the
mount namespace created by ‘create’), we won't be able to drop
pre-start hooks entirely 8.

@vishh
Copy link
Contributor Author

vishh commented Mar 18, 2016

(e.g. custom cgroup injection or state registration)

I don't get how running hooks between pid ns creation and user process startup is a necessity for either of these use cases. Can you elaborate more?

@wking
Copy link
Contributor

wking commented Mar 18, 2016

On Fri, Mar 18, 2016 at 10:40:10AM -0700, Vish Kannan wrote:

(e.g. custom cgroup injection or state registration)

I don't get how running hooks between pid ns creation and user
process startup is a necessity for either of these use cases. Can
you elaborate more?

You can't add the container process to a cgroup until you have a PID,
and you can't get that PID (in the runtime namespace 1) until you
create the PID namespace.

More generally, if the ‘start’ call kicks off the process tree that
gives you the container process, it should be pretty clear that any
manipulation of that process has to happen after the ‘start’ call ;).
And things like cgroup injection and resource limits should be set
before you exec user-configured code, so that puts them after ‘start’
and before start's execvp (or whatever) in some sort of hook.

@vishh
Copy link
Contributor Author

vishh commented Mar 18, 2016

Why does one add a cgroup with a hook when the Spec already provides
ability to define cgroups?

On Fri, Mar 18, 2016 at 10:50 AM, W. Trevor King notifications@github.com
wrote:

On Fri, Mar 18, 2016 at 10:40:10AM -0700, Vish Kannan wrote:

(e.g. custom cgroup injection or state registration)

I don't get how running hooks between pid ns creation and user
process startup is a necessity for either of these use cases. Can
you elaborate more?

You can't add the container process to a cgroup until you have a PID,
and you can't get that PID (in the runtime namespace 1) until you
create the PID namespace.

More generally, if the ‘start’ call kicks off the process tree that
gives you the container process, it should be pretty clear that any
manipulation of that process has to happen after the ‘start’ call ;).
And things like cgroup injection and resource limits should be set
before you exec user-configured code, so that puts them after ‘start’
and before start's execvp (or whatever) in some sort of hook.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#20 (comment)

@wking
Copy link
Contributor

wking commented Mar 18, 2016

On Fri, Mar 18, 2016 at 11:33:51AM -0700, Vish Kannan wrote:

Why does one add a cgroup with a hook when the Spec already provides
ability to define cgroups?

Maybe you want to use an alternative cgroup manager that's built on
higher level abstractions 1. Or maybe you want to use a new kernel
feature that hasn't landed in your runtime yet 2.

But is motivating the cgroups use case really that important? I find
it hard to imagine folks never needing access to the container PID
before user-specified code is exec'ed. The cgroups and
state-registration (e.g. if you want to maintain a cluster-wide
registry, but your runtime's builtin-registry doesn't handle that) are
just the most obvious cases I can think of, not an attempt at an
exhaustive list.

 Subject: Re: removal of cgroups from the OCI Linux spec
 Date: Thu, 29 Oct 2015 11:28:41 -0700
 Message-ID: <CAA_vbqSCEg+ByohgfR6Pqnzvy+j60=jmMNJi_UfnhB3WPtcjww@mail.gmail.com>

@vishh
Copy link
Contributor Author

vishh commented Mar 18, 2016

I'd prefer not considering such scenarios unless it cannot be solved
through other means.

On Fri, Mar 18, 2016 at 11:45 AM, W. Trevor King notifications@github.com
wrote:

On Fri, Mar 18, 2016 at 11:33:51AM -0700, Vish Kannan wrote:

Why does one add a cgroup with a hook when the Spec already provides
ability to define cgroups?

Maybe you want to use an alternative cgroup manager that's built on
higher level abstractions 1. Or maybe you want to use a new kernel
feature that hasn't landed in your runtime yet [2].

But is motivating the cgroups use case really that important? I find
it hard to imagine folks never needing access to the container PID
before user-specified code is exec'ed. The cgroups and
state-registration (e.g. if you want to maintain a cluster-wide
registry, but your runtime's builtin-registry doesn't handle that) are
just the most obvious cases I can think of, not an attempt at an
exhaustive list.

Subject: Re: removal of cgroups from the OCI Linux spec
Date: Thu, 29 Oct 2015 11:28:41 -0700
Message-ID: <CAA_vbqSCEg+ByohgfR6Pqnzvy+j60=
jmMNJi_UfnhB3WPtcjww@mail.gmail.com>
[2]: #267


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#20 (comment)

@wking
Copy link
Contributor

wking commented Mar 18, 2016

On Fri, Mar 18, 2016 at 12:11:12PM -0700, Vish Kannan wrote:

I'd prefer not considering such scenarios unless it cannot be solved
through other means.

Ok, so we just need to get to the bottom of a particular use-case.
How do you see attachment to an experimental cgroup (e.g. #267)
working before support lands in your chosen runtime? You could
solve that by using v2 cgroups 1 and letting the runtime join one
you'd already setup, but that seems like a major change if you're
already using v1 cgroups. How does the runtime distinguish between v1
and v2 cgroup users anyway? Is the logic for joining cgroups:

If a v2 cgroup at cgroupsPath exists, add the container process to
it. Otherwise add the container process to any v1 cgroups whose
path relative to their cgroup mount point exists.

Or are runtimes supposed to be hard-coded to always use v1 or v2
cgroups? Or are v1 runtimes supposed to limit attachments to cgroups
that are defined under linux.resources?

@vishh
Copy link
Contributor Author

vishh commented Mar 18, 2016

Why not run a custom runtime or move the process after it gets spawned?

On Fri, Mar 18, 2016 at 12:33 PM, W. Trevor King notifications@github.com
wrote:

On Fri, Mar 18, 2016 at 12:11:12PM -0700, Vish Kannan wrote:

I'd prefer not considering such scenarios unless it cannot be solved
through other means.

Ok, so we just need to get to the bottom of a particular use-case.
How do you see attachment to an experimental cgroup (e.g. #267)
working before support lands in your chosen runtime? You could
solve that by using v2 cgroups 1 and letting the runtime join one
you'd already setup, but that seems like a major change if you're
already using v1 cgroups. How does the runtime distinguish between v1
and v2 cgroup users anyway? Is the logic for joining cgroups:

If a v2 cgroup at cgroupsPath exists, add the container process to
it. Otherwise add the container process to any v1 cgroups whose
path relative to their cgroup mount point exists.

Or are runtimes supposed to be hard-coded to always use v1 or v2
cgroups? Or are v1 runtimes supposed to limit attachments to cgroups
that are defined under linux.resources?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#20 (comment)

@wking
Copy link
Contributor

wking commented Mar 18, 2016

On Fri, Mar 18, 2016 at 12:35:19PM -0700, Vish Kannan wrote:

Why not run a custom runtime…

Because that's a moderately large change when all you want is:

$ echo $PID >/sys/fs/cgroup/a/b/c/tasks

And “you could fork the runtime and add support for that” is an
argument that applies to everything. If we go that route, where do
you draw the line? It it only stuff that would need pre-start hooks
that doesn't get supported in the spec?

… or move the process after it gets spawned?

I am suggesting we move the process after it's spawned (because you
need a PID to assign it). I'm just suggesting we assign it to cgroups
before we execvp (or whatever) the user-specified code. Assigning
resource limits before exec'ing the user-specified code seems like the
only safe approach to me, since you want to avoid an unconstrained
time window at the beginning of the user-specified code's execution.

@crosbymichael crosbymichael modified the milestones: by-1.0.0, 1.0.0 May 25, 2016
@wking wking mentioned this issue Jun 2, 2016
@crosbymichael crosbymichael modified the milestones: 1.0.0, 1.1.0 Jun 3, 2016
@wking
Copy link
Contributor

wking commented Apr 4, 2017

I think this is closable. The current spec position is to split create/start (#384) and keep hooks (closing #483), so callers have two ways to handle any hook-like things they need to do. Skimming the tail of this discussion, I don't see anything that isn't currently supported. And if there are unsupported use-cases, I think a more productive approach would be to open per-case issues with “the current hooks or create/start split don't support …”, and we can hash out whether additional hooks are needed. For example, see #644, where pre-/post-mount hooks were proposed, discussed, and rejected.

@vbatts
Copy link
Member

vbatts commented Apr 5, 2017

Agreed

@vbatts vbatts closed this as completed Apr 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests