Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO-821: Introduce MachineConfigNode types #1596

Merged
merged 1 commit into from
Nov 1, 2023

Conversation

cdoern
Copy link
Contributor

@cdoern cdoern commented Sep 21, 2023

For Openshift 4.15, the MCO is working on a new state reporting mechanism: The MachineConfigStateController. The main type behind this architecture is the MachineConfigNode. A MachineConfigNode tracks an upgrade progression that relates behaviors between a node, its machineconfigpool, and the MCD processes within the MCO.

Each of these will have its own CR in the MCO which necessitates a CRD that I have also included in this PR.

where doing oc describe machineconfignode/<name> will of course display more in depth information of the type metav1.Condition. These conditions are composed of a state, phase, reason, and time. The state refers to what part of the upgrade we are in (UpdatePreparing, Completing etc), Phase is a in depth description of what part of a State we are in, and the Reason is a quick one or two word statement describing what "Stopping Point" we are at: ReconcilingConfigs, StoppingCfgDrift. The following accepted Condition types are:

	// MachineConfigNodeUpdatePrepared describes a machine that is preparing in the daemon to trigger an update
	MachineConfigNodeUpdatePrepared StateProgress = "UpdatePrepared"
	// MachineConfigNodeUpdateExecuted describes a machine that has executed the body of the upgrade
	MachineConfigNodeUpdateExecuted StateProgress = "UpdateExecuted"
	// MachineConfigNodeUpdatePostActionComplete describes a machine that has executed its post update action
	MachineConfigNodeUpdatePostActionComplete StateProgress = "UpdatePostActionComplete"
	// MachineConfigNodeUpdateComplete describes a machine that has completed the core parts of an upgrade.
	MachineConfigNodeUpdateComplete StateProgress = "UpdateComplete"
	// MachineConfigNodeUpdated describes a machine that has a matching desired and current config after executing an update
	MachineConfigNodeUpdated StateProgress = "Updated"
	// MachineConfigNodeUpdateResumed describes a machine that has resumed normal processes
	MachineConfigNodeResumed StateProgress = "Resumed"
	// MachineConfigNodeUpdateCompatible the part of the preparing phase where the mco decides whether it can update
	MachineConfigNodeUpdateCompatible StateProgress = "UpdateCompatible"
	// MachineConfigNodeUpdateDrained describes the part of the inprogress phase where the node drains
	MachineConfigNodeUpdateDrained StateProgress = "Drained"
	// MachineConfigNodeUpdateFilesAndOS describes the part of the inprogress phase where the nodes file and OS config change
	MachineConfigNodeUpdateFilesAndOS StateProgress = "AppliedFilesAndOS"
	// MachineConfigNodeUpdateCordoned describes the part of the completing phase where the node cordons
	MachineConfigNodeUpdateCordoned StateProgress = "Cordoned"
	// MachineConfigNodeUpdateRebooted describes the part of the post action phase where the node reboots itself
	MachineConfigNodeUpdateRebooted StateProgress = "RebootedNode"
	// MachineConfigNodeUpdateReloaded describes the part of the post action phase where the node reloads its CRIO service
	MachineConfigNodeUpdateReloaded StateProgress = "ReloadedCRIO"

The standard oc get machineconfignode/... will display only Updated UpdatePreparing UpdateInProgress UpdatePostAction UpdateCompleting and Resuming. The rest of these granular states will be shown via -o wide or oc describe

An example of oc get machineconfignodes would look like:

mcnodes

The other information accessible via oc describe will be:

  1. the last instance of each Condition type
  2. the generation of this MachineConfigNode
  3. the pool that this node is tied to
  4. the desired and current machineconfigs for the node

this is an example of the above oc describe command that is not fully operational yet.

mcnodedescribe

As well as this information located at the bottom of oc describe

extra_info

This feature is linked to https://issues.redhat.com/browse/MCO-452 and more detail can be found there.

These types are basically an attempt to aggregate and display the state of a cluster's machines in a user friendly or at least digestible way. Each Condition is also associated closely with a StateProgress. All of these StateProgresses are phases that I have broken down key MCO operations into.

The structure and types of this PR are being used in: openshift/machine-config-operator#3970 which is working quite well.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 21, 2023

Hello @cdoern! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 21, 2023
@cdoern cdoern force-pushed the machine-state branch 11 times, most recently from 000e721 to d5de927 Compare September 25, 2023 20:25

// MachineState describes the health of the Machines on the system
//
// Compatibility level 1: Stable within a major release for a minimum of 12 months or 3 minor releases (whichever is longer).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

born into compat-level 1 is very confident. Are we already that confident that we are getting these types right, and won't need tough follow-up changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what the protocol is here. I was under the impression that anything user facing should be level 1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to use a feature gate for this, which I think we should try, then you want to make this a v1alpha1 API first, at compatibility level 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning on featuregating, do you think I need to introduce that in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do the feature gate as a separate PR, that's a pretty easy thing to add so can probably get that in and merged before the API

@petr-muller
Copy link
Member

/cc

@JoelSpeed
Copy link
Contributor

/lgtm
/hold

I would like some acknowledgement from others on the EP that they are happy for the API to merge even though the EP still has some details to iron out, have posted a comment over on the EP openshift/enhancements#1490 (comment)

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 31, 2023
@cdoern
Copy link
Contributor Author

cdoern commented Oct 31, 2023

/test e2e-upgrade

@cdoern cdoern force-pushed the machine-state branch 2 times, most recently from 728764d to ea94212 Compare October 31, 2023 16:15
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2023
@JoelSpeed
Copy link
Contributor

/approve cancel

I think the description of the spec and status desired versions are the wrong way around, can we check

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2023
Signed-off-by: Charlie Doern <cdoern@redhat.com>
@JoelSpeed
Copy link
Contributor

/lgtm

Copy link
Contributor

openshift-ci bot commented Oct 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, JoelSpeed, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. labels Oct 31, 2023
@JoelSpeed
Copy link
Contributor

/test all

@JoelSpeed
Copy link
Contributor

/hold cancel

We have enough consensus to get this in, and iterate with what's left of 4.15

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
@cdoern
Copy link
Contributor Author

cdoern commented Oct 31, 2023

/test integration

@cdoern
Copy link
Contributor Author

cdoern commented Nov 1, 2023

/test e2e-aws-serial-techpreview

Copy link
Contributor

openshift-ci bot commented Nov 1, 2023

@cdoern: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit 3680e21 into openshift:master Nov 1, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants