MCO-811: MachineConfigNode introduction for MCO State Reporting #1490

cdoern · 2023-10-06T18:13:35Z

This is an enhancement discussing the MachineConfigNode introduction for increased MCO state reporting.

See implementation here:
openshift/machine-config-operator#4012
openshift/api#1596
openshift/api#1621

openshift-ci-robot · 2023-10-06T18:13:38Z

@cdoern: This pull request references MCO-811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

Introduce an enhancement discussing the MachineConfigState introduction as well as increased State reporting in the MachineConfigutationStatus operator object.

See implementation here:
openshift/machine-config-operator#3942
openshift/api#1596

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2023-10-06T18:13:39Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

enhancements/machine-config/mco-state-reporting.md

openshift-ci-robot · 2023-10-10T15:24:36Z

@cdoern: This pull request references MCO-811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

This is an enhancement discussing the MachineConfigState introduction as well as increased State reporting in the MachineConfigutationStatus operator object.

See implementation here:
openshift/machine-config-operator#3942
openshift/api#1596

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2023-10-12T18:32:32Z

@cdoern: This pull request references MCO-811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

This is an enhancement discussing the MachineConfigState introduction as well as increased State reporting in the MachineConfigutationStatus operator object.

See implementation here:
openshift/machine-config-operator#3970
openshift/api#1596

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2023-10-12T19:28:56Z

@cdoern: This pull request references MCO-811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

This is an enhancement discussing the MachineConfigState introduction as well as increased State reporting in the MachineConfigutationStatus operator object.

See implementation here:
openshift/machine-config-operator#3970
openshift/api#1596
openshift/api#1621

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

enhancements/machine-config/mco-state-reporting.md

openshift-ci-robot · 2023-10-17T15:17:38Z

@cdoern: This pull request references MCO-811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

This is an enhancement discussing the MachineConfigNode introduction as well as increased State reporting in the MachineConfigutationStatus operator object.

See implementation here:
openshift/machine-config-operator#3970
openshift/api#1596
openshift/api#1621

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

enhancements/machine-config/mco-state-reporting.md

cdoern · 2023-10-24T14:45:29Z

A few comments on the status so far. I will be moving the MachineConfiguration operator/v1 reporting to another enhancement since that is quite separate from machineconfignode.

@JoelSpeed I think we have differing views on how the False Conditions should look. I was thinking of keeping the conditions as is and just setting the status to false, that is how the MCP seems to do it.

I can consider changing the message to something like Completed Drain. Desired Drainer was: but I would like to keep the Reason since otherwise I would need to create 2x the amount of StateProgress's to track the negative states as well.

openshift-ci-robot · 2023-10-24T14:50:10Z

@cdoern: This pull request references MCO-811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

This is an enhancement discussing the MachineConfigNode introduction for increases MCO state reporting.

See implementation here:
openshift/machine-config-operator#3970
openshift/api#1596
openshift/api#1621

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2023-10-24T14:57:50Z

@cdoern: This pull request references MCO-811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

This is an enhancement discussing the MachineConfigNode introduction for increased MCO state reporting.

See implementation here:
openshift/machine-config-operator#3970
openshift/api#1596
openshift/api#1621

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

enhancements/machine-config/machine-config-node.md

sinnykumari · 2023-10-25T10:04:19Z

enhancements/machine-config/machine-config-node.md

+
+
+```console
+$ oc get machineconfignodes


I think when an upgrade gets triggered, all the nodes need to be updated to latest config. Hence, reporting UPDATED as false for all nodes would be more aligned. Once a node has been updated, they get set to true

Alternatively, maybe instead of updating, what we really want is idle? Then the user can still gauge updated status via MCP and use this to see what nodes are in action

I'm open to Idle, that seems like it could be helpful. What would trigger Idle to go into false?

I had expected, kind of as @sinnykumari mentions

Name. Updated. Updating my-node-1. False. True my-node-2. False. False

Then the conditions when you expand it would say on my-node-2, not yet queued for update or something to explain why they aren't updating.

Was this resolved?

I can make a card to track an idle state, though that will cause unnecessary API calls that might have a "blip" effect where Updated only is true for a fraction of a second. Might instead want to switch updated to idle

enhancements/machine-config/machine-config-node.md

JoelSpeed

@cdoern Please open the diff view on the PR and make sure to respond to/resolve any open conversations, I can see several that have not had any response, or where the feedback has not made its way to the document itself.

As far as I understand it, the EP is up to date now with the state of the TP feature in 4.15. Before we merge, I'd like any outstanding points or feedback that you intend to fix before GA to be listed somewhere, I've suggested a couple in the EP, that could be a good place for it.

I think once we are tracking a list of TODOs and the existing feedback has been actioned or added to the TODOs, then we can merge this and start looking at the improvements towards GA

enhancements/machine-config/machine-config-node.md

JoelSpeed · 2024-01-19T15:29:47Z

enhancements/machine-config/machine-config-node.md

+Implementation wise, there has been discussion of who should own these objects: the MCD or a new Controller called the MachineStateController. While either approach would work, there are positive and negative impacts to each.
+
+
+## MachineConfigDaemon Approach


Which approach did we end up with? Which controller is owning/updating the MCN?

the Daemon is. I'll add this in.

If this is the approach we are going with, it should probably be in the proposal section rather than alternatives, and the original proposal should be written down as an alternative

enhancements/machine-config/machine-config-node.md

JoelSpeed

Once we have cards for the follow up, and link those in the TP -> GA, I think we can merge this. When the work to action the TP -> GA goes through, it would be good to get the EP updated as well

enhancements/machine-config/machine-config-node.md

JoelSpeed · 2024-02-01T17:31:19Z

enhancements/machine-config/machine-config-node.md

+Implementation wise, there has been discussion of who should own these objects: the MCD or a new Controller called the MachineStateController. While either approach would work, there are positive and negative impacts to each.
+
+
+## MachineConfigDaemon Approach


If this is the approach we are going with, it should probably be in the proposal section rather than alternatives, and the original proposal should be written down as an alternative

enhancements/machine-config/machine-config-node.md

JoelSpeed · 2024-02-01T17:34:06Z

enhancements/machine-config/machine-config-node.md

+The states are mostly in the past tense. This is because processes like `Drained` are defined by completion primarily not the progress. So a user will have `UpdateExecuted` == Unknown and `Drained` == Unkown until the Drain actually completes.
+However, the unknown phase will be accompanied by a message for how the drain is currently going or if the drain has gone wrong.
+
+The condition transitions back to False once the update process is completed. Once Updated == true is the case, all previous states get set to false and their reasons/messages show the fact that this was the message from the previous update cycle.


Yeah I think it would be good to make sure at the beginning of an update, everything is cleared down, and then marked complete as you go through each phase.
Might need to consider what happens when some updates don't pass through all phases though

cdoern · 2024-02-01T21:21:03Z

@JoelSpeed cards here:

https://issues.redhat.com/browse/MCO-1022
https://issues.redhat.com/browse/MCO-1023
https://issues.redhat.com/browse/MCO-1024
https://issues.redhat.com/browse/MCO-1025

dhellmann · 2024-02-13T16:31:14Z

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

cdoern · 2024-02-14T17:17:40Z

/override markdownlint

cdoern · 2024-02-14T17:17:55Z

/override ci/prow/markdownlint

openshift-ci · 2024-02-14T17:33:10Z

@cdoern: cdoern unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers.

In response to this:

/override ci/prow/markdownlint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2024-02-14T17:33:11Z

@cdoern: cdoern unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers.

In response to this:

/override markdownlint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dhellmann · 2024-02-14T18:06:32Z

/override ci/prow/markdownlint

@cdoern when it comes time to override again after it's approved, your team lead or staff engineer can override the job for you.

openshift-ci · 2024-02-14T18:07:54Z

@dhellmann: Overrode contexts on behalf of dhellmann: ci/prow/markdownlint

In response to this:

/override ci/prow/markdownlint

@cdoern when it comes time to override again after it's approved, your team lead or staff engineer can override the job for you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

JoelSpeed

I'm happy with the content of the proposal. It reflects my understanding of the current implementation, and while there is some outstanding feedback, particularly about the UX of the feature, we now have cards in Jira to track those.

There's just one dissonance at the moment, the Alternatives section talks about two ways to implement the mechanics of the proposal, I would have expected one of these to be in the proposal section so it's clear which was chosen and implemented.

If you want to fix that and then poke @sinnykumari adding the final acks to make this merge, I think it's time

cdoern · 2024-02-21T21:05:44Z

also @sinnykumari you need to override lint using /override ci/prow/markdownlint

sinnykumari

Nice work Charlie and everyone for the review. This has turned out great!
Before adding approval label, have one clarifying question.
Today, cluster admin looks at MCP status to see overall state. With MCN in place, there is chance that admin would be confused that why both differs. For situation like power outage, manual node cordoning, SRIOV operator performing some actions. Since MCN is more tied to MCO operation compared to MCP showing overall health of nodes in the cluster, we may want to call this out explicitly in the enhancement perhaps in Summary or any better place?

Introduce an enhancement discussing the MachineConfigNode introduction Signed-off-by: Charlie Doern <cdoern@redhat.com>

cdoern · 2024-02-22T15:39:18Z

@sinnykumari I just added a statement to that effect.

Though, I will push back a bit since in an outage, finding out why your nodes failed to update is equally as useful as checking the MCP status. Sure, there are helpful internally looking at MCO procedures but at the end of the day they are a way to track node update status and since the MCO owns the update code, it just so happens that a lot of these actions are tied to the MCO.

sinnykumari · 2024-02-22T15:52:17Z

Thanks for updating with needed changes.
/approve
/override ci/prow/markdownlint

sinnykumari · 2024-02-22T15:54:05Z

Since there are no outstanding comment, adding lgtm as well for the merge.
/lgtm

openshift-ci · 2024-02-22T15:56:44Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sinnykumari, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-02-22T16:11:04Z

@sinnykumari: Overrode contexts on behalf of sinnykumari: ci/prow/markdownlint

In response to this:

Thanks for updating with needed changes.
/approve
/override ci/prow/markdownlint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2024-02-22T16:11:50Z

@cdoern: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 6, 2023

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2023

cdoern force-pushed the machineconfigstate branch 2 times, most recently from 4a1237e to df1e756 Compare October 9, 2023 20:27

sinnykumari reviewed Oct 10, 2023

View reviewed changes

enhancements/machine-config/mco-state-reporting.md Outdated Show resolved Hide resolved

sinnykumari reviewed Oct 10, 2023

View reviewed changes

cdoern force-pushed the machineconfigstate branch from df1e756 to 1facaff Compare October 12, 2023 17:31

cdoern marked this pull request as ready for review October 13, 2023 00:12

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 13, 2023

cdoern changed the title ~~MCO-811: [WIP] MCO State Reporting~~ MCO-811: MCO State Reporting Oct 13, 2023

openshift-ci bot requested review from jan--f and knobunc October 13, 2023 00:12

MarSik reviewed Oct 16, 2023

View reviewed changes

enhancements/machine-config/mco-state-reporting.md Outdated Show resolved Hide resolved

cdoern force-pushed the machineconfigstate branch from 1facaff to 0ef72be Compare October 17, 2023 15:17

JoelSpeed reviewed Oct 20, 2023

View reviewed changes

cdoern force-pushed the machineconfigstate branch from 0ef72be to 8ab3b20 Compare October 24, 2023 14:49

cdoern changed the title ~~MCO-811: MCO State Reporting~~ MCO-811: MachineConfigNode introduction for MCO State Reporting Oct 24, 2023

sinnykumari reviewed Oct 25, 2023

View reviewed changes

enhancements/machine-config/machine-config-node.md Outdated Show resolved Hide resolved

sinnykumari reviewed Oct 25, 2023

View reviewed changes

LalatenduMohanty reviewed Oct 25, 2023

View reviewed changes

enhancements/machine-config/machine-config-node.md Show resolved Hide resolved

LalatenduMohanty reviewed Oct 25, 2023

View reviewed changes

enhancements/machine-config/machine-config-node.md Outdated Show resolved Hide resolved

cdoern force-pushed the machineconfigstate branch 2 times, most recently from f61c40f to f933511 Compare January 12, 2024 17:16

JoelSpeed reviewed Jan 19, 2024

View reviewed changes

cdoern force-pushed the machineconfigstate branch from f933511 to 8e162e6 Compare January 24, 2024 00:08

JoelSpeed reviewed Feb 1, 2024

View reviewed changes

cdoern force-pushed the machineconfigstate branch from 8e162e6 to b7d3e29 Compare February 1, 2024 21:22

JoelSpeed reviewed Feb 15, 2024

View reviewed changes

cdoern force-pushed the machineconfigstate branch from b7d3e29 to cbc7f9a Compare February 21, 2024 20:19

sinnykumari reviewed Feb 22, 2024

View reviewed changes

MCO State Reporting

7f02b07

Introduce an enhancement discussing the MachineConfigNode introduction Signed-off-by: Charlie Doern <cdoern@redhat.com>

cdoern force-pushed the machineconfigstate branch from cbc7f9a to 7f02b07 Compare February 22, 2024 15:39

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2024

openshift-ci bot assigned sinnykumari Feb 22, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2024

openshift-merge-bot bot merged commit 4d28317 into openshift:master Feb 22, 2024
2 checks passed

		Implementation wise, there has been discussion of who should own these objects: the MCD or a new Controller called the MachineStateController. While either approach would work, there are positive and negative impacts to each.


		## MachineConfigDaemon Approach

MCO-811: MachineConfigNode introduction for MCO State Reporting #1490

MCO-811: MachineConfigNode introduction for MCO State Reporting #1490

Conversation

cdoern commented Oct 6, 2023 • edited Loading

openshift-ci-robot commented Oct 6, 2023 • edited by openshift-ci bot Loading

openshift-ci bot commented Oct 6, 2023

openshift-ci-robot commented Oct 10, 2023 • edited by openshift-ci bot Loading

openshift-ci-robot commented Oct 12, 2023 • edited by openshift-ci bot Loading

openshift-ci-robot commented Oct 12, 2023 • edited by openshift-ci bot Loading

openshift-ci-robot commented Oct 17, 2023 • edited by openshift-ci bot Loading

cdoern commented Oct 24, 2023

openshift-ci-robot commented Oct 24, 2023 • edited by openshift-ci bot Loading

openshift-ci-robot commented Oct 24, 2023 • edited by openshift-ci bot Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JoelSpeed Oct 26, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JoelSpeed left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JoelSpeed left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cdoern commented Feb 1, 2024

dhellmann commented Feb 13, 2024

cdoern commented Feb 14, 2024

cdoern commented Feb 14, 2024

openshift-ci bot commented Feb 14, 2024

openshift-ci bot commented Feb 14, 2024

dhellmann commented Feb 14, 2024

openshift-ci bot commented Feb 14, 2024

JoelSpeed left a comment

Choose a reason for hiding this comment

cdoern commented Feb 21, 2024

sinnykumari left a comment

Choose a reason for hiding this comment

cdoern commented Feb 22, 2024

sinnykumari commented Feb 22, 2024

sinnykumari commented Feb 22, 2024

openshift-ci bot commented Feb 22, 2024

openshift-ci bot commented Feb 22, 2024

openshift-ci bot commented Feb 22, 2024

cdoern commented Oct 6, 2023 •

edited

Loading

openshift-ci-robot commented Oct 6, 2023 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Oct 10, 2023 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Oct 12, 2023 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Oct 12, 2023 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Oct 17, 2023 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Oct 24, 2023 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Oct 24, 2023 •

edited by openshift-ci bot

Loading

JoelSpeed Oct 26, 2023 •

edited

Loading