This repository has been archived by the owner on Sep 18, 2020. It is now read-only.

update-operator: Require before and after reboot annotations #121

Merged
merged 7 commits into coreos:master on Aug 7, 2017

Conversation

sdemos
Contributor

@sdemos sdemos commented Jul 24, 2017

This PR adds pre- and post-reboot actions.

With these changes, you will be able to configure actions which get run before a reboot is initiated and after one is completed, effectively gating the reboot on either side. The actions taken are entirely arbitrary; software that implements them only needs to be able to manipulate Kubernetes annotations and read labels.

The changes to the update-operator ended up being a little more of an overhaul than I originally intended. With the addition of the new behavior, the update-operator has, broadly speaking, four state changes for which it is responsible:

  1. Find nodes where the post-reboot actions have completed successfully and mark everything as ready to go
  2. Find nodes which have just rebooted and initiate post-reboot actions on them
  3. Find nodes where the pre-reboot actions have completed successfully and tell them to reboot
  4. Find nodes which want to reboot and initiate pre-reboot actions on them

I broke the process function out into four functions, one for each of those state changes, to make it easier to understand, validate, and modify in the future.
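
For illustration, a minimal sketch of that breakdown, assuming the before-reboot/after-reboot method names settled on later in this review (the Kontroller stub and error handling here are simplified placeholders, not the actual implementation):

package main

import "log"

// Kontroller is a stand-in stub for the update-operator's controller; the
// four methods are placeholders named after the state changes listed above.
type Kontroller struct{}

func (k *Kontroller) checkAfterReboot() error  { return nil } // 1. after-reboot checks done: mark ready
func (k *Kontroller) markAfterReboot() error   { return nil } // 2. just rebooted: start after-reboot checks
func (k *Kontroller) checkBeforeReboot() error { return nil } // 3. before-reboot checks done: allow the reboot
func (k *Kontroller) markBeforeReboot() error  { return nil } // 4. wants to reboot: start before-reboot checks

// process runs the four state changes in sequence; an error aborts the pass
// and the operator retries on its next tick.
func (k *Kontroller) process() error {
	steps := []func() error{
		k.checkAfterReboot,
		k.markAfterReboot,
		k.checkBeforeReboot,
		k.markBeforeReboot,
	}
	for _, step := range steps {
		if err := step(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := (&Kontroller{}).process(); err != nil {
		log.Fatal(err)
	}
}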

The update-agent has been modified not to mark a node schedulable until the post-reboot actions are complete. Currently, the node continues to be marked as schedulable until it starts draining, which is after the pre-reboot actions are run. It is possible to modify it to mark the node unschedulable while running pre-reboot actions, but it would require more work and I don't think it's necessary. If someone can think of a good reason to mark it unschedulable, please disagree with me.

I will include documentation around the usage in a separate PR, as well as examples.

@sdemos sdemos requested review from lucab, euank and dghubble July 24, 2017 23:39
err = k8sutil.UpdateNodeRetry(k.nc, n.Name, func(node *v1api.Node) {
if !wantsRebootSelector.Matches(fields.Set(node.Annotations)) {
glog.Warningf("Node %v no longer wanted to reboot while we were trying to mark it so: %v", node.Name, node.Annotations)
return

Should this return be a continue?

Contributor Author

This line is part of an anonymous function that starts on line 317. It's a little weird, but this node annotation update needs to make sure the node still wants to reboot before it gets marked to do so.

lucab
lucab previously requested changes Jul 25, 2017
const (
// ActionsSucceeded is returned when all actions have set their configured
// annotation to true
ActionsSucceeded ActionsResult = iota
Contributor

I would suggest using something which is not the success status code as the default value for this custom type, otherwise any dumb programming bug may end up with false-positive results:

var a ActionsResult
fmt.Printf("%t", a == ActionsSucceeded)

What about swapping the order of the two values or introducing an ActionUnknownStatus as the default value?

Contributor Author

I'm going to add an ActionsUnknown value. That way the zero value is not something which is used to represent an actual state, and instead represents the absence of that state. Right now it does the same thing, since everywhere that uses this value just checks result == ActionsSucceeded.
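
A sketch of what that might look like (package name and comment wording are illustrative only):

package action // hypothetical package name for this sketch

// ActionsResult reports whether a node's configured annotations are all set.
type ActionsResult int

const (
	// ActionsUnknown is the zero value; it represents the absence of a
	// determined state rather than success.
	ActionsUnknown ActionsResult = iota
	// ActionsSucceeded means every configured annotation exists and is "true".
	ActionsSucceeded
	// ActionsPending means some annotation is missing or not set to "true".
	ActionsPending
)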

// true - the action has completed successfully
// anything else - the action has not yet completed
// if all of the annotations exist and are true, return ActionsSucceeded
// if any action dne or is not set to true, return ActionsPending
Contributor

Please spell out "dne".

@@ -103,6 +97,12 @@ func (k *Klocksmith) process(stop <-chan struct{}) error {
return err
}

// we are schedulable now.
glog.Info("Marking node as schedulable")
if err := k8sutil.Unschedulable(k.nc, k.node, false); err != nil {
Contributor

Related to #122, do (and should) we have some way to ensure we are only marking schedulable nodes that we previously marked as unschedulable (i.e. we are not overriding some unrelated user marking)?

Contributor Author

That is an interesting question. It seems like we just indiscriminately mark nodes schedulable and not schedulable for our own uses right now, and that is something that is worth thinking about. I do think it's out of scope for this particular change though.

Derp, sorry. I actually did forget my glasses at home today.

Contributor

@sdemos ack, let's move this to its own ticket/PR. Perhaps there is already one, I didn't check (my bad). Can you take care of that?

Contributor Author

Other than #122 I haven't seen any other reference to this specific issue, although we have some other issues with being good neighbors (#50, #61).

@lucab
Contributor

lucab commented Jul 25, 2017

I've left some initial comments inline but I'll do a deeper pass at a later point. Regarding your two WIP points: "yes please" to both, and we use cobra extensively in all other projects.

@squeed

squeed commented Jul 25, 2017

Here is a specific use-case for this, for your own inspiration:

When CL is upgraded on a node, the torcx portion of the agent will need to make sure the appropriate docker addon is fetched. This has the following requirements:

  • Executed after or during the CL update, before the reboot
  • A fetch failure should prevent the reboot and (ideally) prevent the new version from being selected, should a reboot accidentally happen
  • After reboot, a GC process should be run.

It would be helpful if the new CL version were an argument, but it's not absolutely necessary.

@sdemos
Contributor Author

sdemos commented Jul 26, 2017

Found out that flagutil in coreos/pkg already has an extension for slice-valued flag arguments. I'm just using that instead of moving to a new library for now.

@sdemos sdemos force-pushed the kube-update-hooks branch 2 times, most recently from ce759ea to 84226c7 Compare July 28, 2017 18:50
Member

@dghubble dghubble left a comment

Just an initial pass; I want us to improve the nomenclature here.

@@ -15,6 +15,8 @@ import (
)

var (
preRebootActions flagutil.StringSliceFlag
Member

these are annotations, not "actions" (let's avoid this term in the real implementation, it doesn't mean much)

@@ -25,12 +27,16 @@ var (
)

func main() {
flag.Var(&preRebootActions, "require-pre-reboot", "list of comma-separated kubernetes annotations required to be set to 'true' before the operator will authorize a reboot")
Member

Drop mention of authorization.

before-reboot-annotations - "List of comma separated Kubernetes node annotations that must be set to true before a reboot is allowed."

@@ -25,12 +27,16 @@ var (
)

func main() {
flag.Var(&preRebootActions, "require-pre-reboot", "list of comma-separated kubernetes annotations required to be set to 'true' before the operator will authorize a reboot")
flag.Var(&postRebootActions, "require-post-reboot", "list of comma-separated kubernetes annotations required to be set to 'true' before the operator will consider a reboot successful")
Member

after-reboot-annotations - "List of comma separated Kubernetes node annotations that must be set to true before a node is marked schedulable and the operator lock is released."
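
A sketch of how the renamed flags might be registered, reusing the flagutil.StringSliceFlag type this PR already uses (variable names are illustrative):

package main

import (
	"flag"

	"github.com/coreos/pkg/flagutil"
)

var (
	beforeRebootAnnotations flagutil.StringSliceFlag
	afterRebootAnnotations  flagutil.StringSliceFlag
)

func main() {
	// Flag names and help text follow the suggestions in this review.
	flag.Var(&beforeRebootAnnotations, "before-reboot-annotations",
		"List of comma separated Kubernetes node annotations that must be set to true before a reboot is allowed")
	flag.Var(&afterRebootAnnotations, "after-reboot-annotations",
		"List of comma separated Kubernetes node annotations that must be set to true before a node is marked schedulable and the operator lock is released")
	flag.Parse()
}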

@@ -0,0 +1,29 @@
apiVersion: extensions/v1beta1
Member

@dghubble dghubble Jul 31, 2017

Examples in a separate PR which has a tutorial and docs?

}

// New creates a new action with the given annotation
func New(annotation string) *Action {
Member

I thought we were using Action as a placeholder phrase in design docs, not as the actual implementation name. It's rather vague, especially because CLUO is not actuating the actions; it's simply waiting for annotation requirements to be fulfilled.

@@ -51,6 +51,14 @@ const (
// It is an opaque string, but might be semver.
AnnotationNewVersion = Prefix + "new-version"

// Key set by the update-operator when pre- and post-reboot actions should run
LabelRebootAction = Prefix + "reboot-action"
Member

Again, I didn't think we were actually exposing "action" in the label names. Why not use something the user would expect like:

nodeSelector:
    container-linux-update.v1.coreos.com/before-reboot: "true"

or

nodeSelector:
    container-linux-update.v1.coreos.com/after-reboot: "true"

return fmt.Errorf("Failed listing nodes %v", err)
}

preRebootReq, err := labels.NewRequirement(constants.LabelRebootAction, selection.In, []string{constants.RebootActionPre})
Member

I added a little helper to construct Requirements at compilation time

// take some number of the rebootable nodes.
// remove any old action annotations and add the reboot-action=pre label.
// this starts the pre-reboot actions.
err = k.markPreRebootActionable()
Member

I can see how method names like these developed, but looking at it with fresh eyes, it seems nonsensical. If we went with the nomenclature I mentioned, this becomes markBeforeReboot() which adds a before-reboot=true annotation.

}

// Verify no nodes are still in the process of rebooting to avoid rebooting N > maxRebootingNodes
postRebootReq, err := labels.NewRequirement(constants.LabelRebootAction, selection.In, []string{constants.RebootActionPost})
Member

If all parts of the requirement are constants or known upfront, check out MustRequirement and a var declaration.

// find nodes which just rebooted
justRebootedNodes := k8sutil.FilterNodesByAnnotation(nodelist.Items, justRebootedSelector)
// also filter out any nodes that are already labeled with reboot-action=post
notPostRebootReq, err := labels.NewRequirement(constants.LabelRebootAction, selection.NotIn, []string{constants.RebootActionPost})
Member

same

// anything else - the action has not yet completed
// if all of the annotations exist and are true, return ActionsSucceeded
// if any action does not exist or is not set to true, return ActionsPending
func ActionsCompleted(node v1api.Node, actions []*Action) ActionsResult {
Member

@dghubble dghubble Jul 31, 2017

In offline conversation we talked about removing this package, removing actions, and just having something kinda like HasAnnotations(node, annotations []Annotations) bool or maybe just with strings

@sdemos sdemos force-pushed the kube-update-hooks branch 2 times, most recently from c44afc9 to 54f155b Compare August 1, 2017 17:48
@sdemos sdemos changed the title [WIP] update-operator: added pre- and post-reboot actions update-operator: added pre- and post-reboot actions Aug 1, 2017
@sdemos
Contributor Author

sdemos commented Aug 1, 2017

Removed WIP. I'm going to include docs changes in another PR once we are comfortable with these changes, so that I can write them based on the final form. Please review @dghubble @lucab @euank @squeed

v1api "k8s.io/client-go/pkg/api/v1"
)

// NewRequirementOrDie wraps a call to NewRequirement and panics if the Requirement
// cannot be created. It is intended for use in variable initializations only.
func NewRequirementOrDie(key string, op selection.Operator, vals []string) *labels.Requirement {
Member

In the standard library, they name these functions MustThing like MustTemplate, MustParse, etc. They also wrap the return (Object, error), not the constructor as constructor funcs can change over time.

I prefer you keep the previous MustRequirement function which had this in mind.
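
For reference, a sketch of that pattern, where the helper wraps the (value, error) return rather than re-exposing the constructor's parameters; the import paths and the example label key are assumptions based on the rest of this PR:

package k8sutil

import (
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
)

// MustRequirement panics if err is non-nil. It is meant to wrap calls to
// labels.NewRequirement in package-level variable initializations, in the
// style of template.Must in the standard library.
func MustRequirement(req *labels.Requirement, err error) *labels.Requirement {
	if err != nil {
		panic(err)
	}
	return req
}

// Example: a requirement whose parts are all known up front.
var beforeRebootReq = MustRequirement(labels.NewRequirement(
	"container-linux-update.v1.coreos.com/before-reboot", selection.In, []string{"true"}))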

Contributor Author

There is a function in the Kubernetes fields package called ParseSelectorOrDie that I was emulating with this call, since we use that one to construct selectors as well.

As for the explicit arguments vs the return, I felt like it was cleaner to just have one function call vs two, and the type system will trivially catch any issues with modifications to the underlying constructor that we would have to care about in this function.

Member

sure

return nil
}

func hasAnnotations(node v1api.Node, annotations []string) bool {
Member

add method comment about this being an AND or maybe call it hasAllAnnotations
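
A sketch of the suggested rename, assuming the helper requires each listed annotation to be present and set to "true" (the import path matches the one used elsewhere in this PR):

package k8sutil

import v1api "k8s.io/client-go/pkg/api/v1"

// hasAllAnnotations returns true only if the node has every one of the given
// annotations present and set to "true"; a single missing or non-"true"
// annotation makes it return false (an AND across the whole list).
func hasAllAnnotations(node v1api.Node, annotations []string) bool {
	for _, a := range annotations {
		if node.Annotations[a] != "true" {
			return false
		}
	}
	return true
}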

// annotations are set. if all annotations are set to true then remove the
// before-reboot=true label and set reboot-ok=true, telling the agent it's
// time to reboot.
err = k.checkBeforeReboot()
Member

Move the comment down to be the formal method comment. In the process, let's more clearly define the error - the current handling seems unexpected

Member

Discussed. It's clear that these 4 state transition functions can run in different orders, that the current ordering is coincidental, and that errors are divided between serious errors worth waiting for the process loop and those that are more benign.

I'd favor these methods returning any error that occurs though. It's easier to explain and to write tests if the errors are returned rather than being log lines. Waiting for the process loop shouldn't be too common since the places we can error now already have retries baked in, and if we switch to Watches in the future, the fixed period goes away anyway. If we instead plan for them to be 4 independent loops, it'd be ok to spawn 4 goroutines for them, so that nobody starts depending on their ordering.

Contributor Author

I like the idea of putting them in goroutines to enforce the fact that the ordering doesn't matter, but I feel like that is asking for weird race condition bugs. The ordering of the state checking and modification doesn't matter, but it would take a lot more work to make us confident that the functions are safe to run concurrently.

I agree with the error returning though. I'll change that right now.

@sdemos
Contributor Author

sdemos commented Aug 2, 2017

A few changes pushed just now: all functions called by the process function now return an error if any of their subcalls fail, there are doc comments for each of the four main state-change functions, and I consolidated the Kubernetes node label and annotation updates into a single call for each state change instead of two update calls in a row, to prevent possible strange half-states if something fails in the middle of being updated (it's also a little faster and, in my opinion, cleaner looking).

delete(node.Labels, constants.LabelBeforeReboot)

// make sure that the node still wants to reboot
if !wantsRebootSelector.Matches(fields.Set(node.Annotations)) {
Member

Maybe we just want to handle this during ordinary reconciliation. If we ever see a node that has reboot-ok=true but no longer wants a reboot, we "unpick" it - seems like this might happen independently of this block of code.

Member

We'd also want to clean up before-reboot annotations that may have been applied. Putting before-reboot and after-reboot labels on a node kinda starts the window of time where custom pods add annotations - seems like we should promise to run a cleanup method at the end of that window (i.e. when removing the before-reboot and after-reboot labels).

Contributor Author

The advantage of checking if it no longer wants to reboot here is that it checks at the last possible second, so there isn't a possibility of race conditions. We can also check it in the main loop, but we should still continue to check it here I think.

Contributor Author

As far as cleaning up before-reboot and after-reboot annotations, it's currently doing it right before it applies the respective label to the node, to ensure an entirely clean environment, and that there aren't any old annotations lying around from the last run. It seems to me like if we only clean them up when we remove the label after completion, there is a possibility that a stray annotation might make it into the mix. Maybe I'm being paranoid.

Member

My main worry is that if this inconsistency can occur here, it can occur at other times too, so we should patch it up in general reconciliation.

Member

"isn't a possibility of race conditions"

I mean, it's still best effort here. Checking at the last second still means the live state can change before you put the label on. We're reducing the likelihood.

Contributor Author

Yeah, I suppose that's true. We can add a function that makes sure everything is still in a valid state, i.e. we shouldn't have the before-reboot label if reboot-needed=false. That would bring checkBeforeReboot more in line with checkAfterReboot, which would be nice.
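
A sketch of the kind of consistency check being described, with illustrative literal keys (the before-reboot label key follows the earlier naming suggestion; the reboot-needed annotation key is an assumption):

package main

import (
	"fmt"

	v1api "k8s.io/client-go/pkg/api/v1"
)

const (
	// Keys for this sketch; the real constants live in the operator's
	// constants package. The reboot-needed key is assumed for illustration.
	labelBeforeReboot      = "container-linux-update.v1.coreos.com/before-reboot"
	annotationRebootNeeded = "container-linux-update.v1.coreos.com/reboot-needed"
)

// isInconsistent reports whether a node still carries the before-reboot label
// even though it no longer says it needs a reboot, the invalid state
// described above.
func isInconsistent(node v1api.Node) bool {
	_, labeled := node.Labels[labelBeforeReboot]
	return labeled && node.Annotations[annotationRebootNeeded] != "true"
}

func main() {
	node := v1api.Node{}
	node.Labels = map[string]string{labelBeforeReboot: "true"}
	fmt.Println(isInconsistent(node)) // true: labeled but reboot-needed is unset
}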

// annotations are set. if all annotations are set to true then remove the
// before-reboot=true label and set reboot-ok=true, telling the agent it's
// time to reboot.
glog.V(4).Info("Checking for existence of configured before-reboot annotations")
Member

nit: we check they're true, not that they exist

chosenNodes := make([]*v1api.Node, 0, remainingRebootableCount)
for i := 0; i < remainingRebootableCount && i < len(rebootableNodes); i++ {
chosenNodes = append(chosenNodes, &rebootableNodes[i])
}

// set before=reboot=true for the chosen nodes
Member

nit: = -> - (before=reboot should be before-reboot)

// If there is an error getting the list of nodes or updating any of them, an
// error is immediately returned.
func (k *Kontroller) markBeforeReboot() error {
nodelist, err := k.nc.List(v1meta.ListOptions{})
Member

It'd be better if these accepted the node list instead of each function doing its own fetch each time. Seems required right now since you're re-fetching 4 times to get the changes at the 4 stages

Contributor Author

I have some ideas around that, but I think that for now we should just keep the list call in this function and maybe change it later.

for _, n := range chosenNodes {
err = k.mark(n.Name, constants.LabelBeforeReboot, k.beforeRebootAnnotations)
if err != nil {
return fmt.Errorf("Failed to mark node for before reboot checks: %v", err)
Member

Something more like the others: "Failed to annotate node %q with %s", node.Name, "before-reboot=true"

Contributor Author

This is just to give context about whether it's doing the before-reboot or after-reboot checks. That information is provided in the error returned by mark. I'll make sure the error in mark includes the label name though.

Member

Can we just be consistent in the logs and say "annotating node" or "labeling node"? I'm mainly objecting to "mark" since it doesn't mean anything to a Kubernetes user.

if !wantsRebootSelector.Matches(fields.Set(node.Annotations)) {
glog.Warningf("Node %v no longer wanted to reboot while we were trying to mark it so: %v", node.Name, node.Annotations)
return
func (k *Kontroller) mark(n string, label string, annotations []string) error {
Member

Can this take the node v1api.Node instead of n string please, or change n to nodeName?

return
}

glog.V(6).Infof("Found nodes: %+v", nodelist.Items)
// find nodes which just rebooted but haven't run after-reboot checks.
// remove any old annotations and add the after-reboot=true label.
Member

"any old annotations" -> before-reboot annotations

Replace throughout

glog.V(6).Infof("Found nodes: %+v", nodelist.Items)
// find nodes which just rebooted but haven't run after-reboot checks.
// remove any old annotations and add the after-reboot=true label.
glog.V(4).Info("Marking rebooted nodes with after-reboot label")
Member

nit: Can we consistently say "Labeling ..." or "Annotating ..." in log messages?

}

// for all the nodes which just rebooted, remove any old annotations and add the after-reboot=true label
Member

By "any old annotations", do you mean after-reboot annotations from a prior post-reboot window?

Contributor Author

yes

Member

What about the idea that we call one cleanupAnnotations method any place in the code where we remove the before-reboot or after-reboot label? That way the user's nodes don't have annotations lying around once the node is happily in steady state.

Member

And sure, you could still call cleanup here to be defensive and not assume they were cleaned up on the prior run.

glog.V(4).Infof("Deleting label %q for %q", constants.LabelAfterReboot, n.Name)
glog.V(4).Infof("Setting annotation %q to false for %q", constants.AnnotationOkToReboot, n.Name)
err = k8sutil.UpdateNodeRetry(k.nc, n.Name, func(node *v1api.Node) {
delete(node.Labels, constants.LabelAfterReboot)
Member

Similar to above, where do we wanna clean up the after-reboot annotations? I'm missing that part.

return fmt.Errorf("Failed to mark node for before reboot checks: %v", err)
}
glog.Infof("Waiting for configured before-reboot annotations on node %q", n.Name)
glog.V(4).Infof("Waiting for the following annotations: %v", k.beforeRebootAnnotations)
Member

This log line doesn't make sense when people aren't using this feature:

Waiting for configured before-reboot annotations on node 10.132.109.209

beforeRebootNodes := k8sutil.FilterNodesByRequirement(nodelist.Items, beforeRebootReq)
rebootingNodes = append(rebootingNodes, beforeRebootNodes...)
afterRebootNodes := k8sutil.FilterNodesByRequirement(nodelist.Items, afterRebootReq)
rebootingNodes = append(rebootingNodes, afterRebootNodes...)
Member

Changes to the logic here seem to cause a node to get stuck being considered to be rebooting. Investigating.

Member

I0803 22:04:43.194547       1 operator.go:515] Found 1 rebooted nodes, setting label "container-linux-update.v1.coreos.com/after-reboot" to "true"
I0803 22:04:43.301646       1 operator.go:524] Waiting for configured after-reboot annotations on node "10.132.109.209"
I0803 22:04:43.696275       1 operator.go:457] Found node "10.132.109.209" still rebooting, waiting
I0803 22:04:43.696338       1 operator.go:459] Found 1 (of max 1) rebooting nodes; waiting for completion

Actually, I think this is just the oddity of the logging.

Member

@dghubble dghubble Aug 3, 2017

I don't think this is a new issue, but we should indicate the lack of rebooting nodes if we're going to info log the count of rebooting nodes. It gives the appearance of being stuck at 1.

else {
    glog.Infof("Found no rebooting nodes")
}

or perhaps log to indicate the next step has begun instead

glog.Infof("Finding nodes indicating a reboot is needed")

Contributor Author

I'm just going to remove the if statement entirely so that it always logs how many rebooted nodes it found. That's the same thing that markBeforeReboot does too.

if err != nil {
return fmt.Errorf("Failed to mark node for before reboot checks: %v", err)
}
glog.Infof("Waiting for configured before-reboot annotations on node %q", n.Name)
Member

Let's just merge these two log lines and list the annotations in the normal case. Also, you can drop "configured", it's assumed.

delete(node.Labels, constants.LabelAfterReboot)
// cleanup the after-reboot annotations
for _, annotation := range k.afterRebootAnnotations {
delete(node.Annotations, annotation)
Member

Verbose log these deletions

delete(node.Labels, constants.LabelBeforeReboot)
// cleanup the before-reboot annotations
for _, annotation := range k.beforeRebootAnnotations {
delete(node.Annotations, annotation)
Member

Verbose log these deletions
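
A sketch of the requested verbose logging; in the PR this loop lives inside the UpdateNodeRetry closure shown above, so the standalone function here is only for illustration:

package main

import (
	"github.com/golang/glog"
	v1api "k8s.io/client-go/pkg/api/v1"
)

// deleteAnnotations removes each listed annotation from the node, logging
// every deletion at verbosity 4.
func deleteAnnotations(node *v1api.Node, annotations []string) {
	for _, annotation := range annotations {
		glog.V(4).Infof("Deleting annotation %q from node %q", annotation, node.Name)
		delete(node.Annotations, annotation)
	}
}

func main() {
	node := &v1api.Node{}
	node.Name = "example-node"
	node.Annotations = map[string]string{"example.com/before-reboot-check": "true"}
	deleteAnnotations(node, []string{"example.com/before-reboot-check"})
	glog.Flush()
}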

Member

@dghubble dghubble left a comment

I've tested this out manually annotating pods and tested the default behaviors without this feature. Logic looks good; I think it's just usability nits now.

@dghubble dghubble changed the title update-operator: added pre- and post-reboot actions update-operator: Require before and after reboot annotations Aug 3, 2017
@sdemos sdemos force-pushed the kube-update-hooks branch 2 times, most recently from 350e022 to 3015570 Compare August 5, 2017 00:24
This lets us more easily verify each individual step, and the ordering of the
steps as a whole unit. It also makes it easier to add steps to the process loop.
Also renamed it from MustRequirement and changed the signature to
encapsulate the additional call to NewRequirement.
This commit adds pre- and post-reboot actions.

With these changes, you will be able to configure actions which get run before a
reboot is initiated and after one is completed, effectively gating it on either
side. The actions taken are entirely arbitrary, and software that implements
them only needs to be able to manipulate kubernetes annotations and read labels.

The changes ended up being a little more of an overhaul than I originally
intended, but I'm happy with the outcome.
@sdemos
Contributor Author

sdemos commented Aug 5, 2017

fixed usability nits and rebased onto master. @dghubble

@dghubble dghubble self-requested a review August 5, 2017 00:28
@sdemos sdemos merged commit 6af9aae into coreos:master Aug 7, 2017
@sdemos sdemos deleted the kube-update-hooks branch August 7, 2017 20:44
@dghubble dghubble mentioned this pull request Aug 8, 2017