[DNM] PoC for updating boot images #3980

Closed
wants to merge 4 commits into from

@djoshy djoshy commented Oct 16, 2023

This is purely for PoC purposes and should not merge.

This is a rough PoC for updating boot images that run on GCP clusters. See full enhancement here for more background.

What does it do?

Adds a new subcontroller (machine_set_controller) to the MCC that:

  • Listens for machineset changes or coreos-bootimages configmap changes
  • Determines the architecture and infra of the cluster to index the correct boot image
  • Updates machineset(s) with newer boot images if there is a delta and at least 1 master node has completed the current upgrade
  • Patches the machineset via machineclient
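
The decision described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `bootImageKey` type, the `needsPatch` function, and the image names are hypothetical, and the real controller reads the target image from the coreos-bootimages configmap rather than an in-memory map.

```go
package main

import "fmt"

// bootImageKey identifies a boot image by platform and architecture.
// Hypothetical type for illustration; not taken from the PR.
type bootImageKey struct {
	Platform string
	Arch     string
}

// needsPatch reports whether a machineset's boot image should be replaced.
// It returns the target image and true only when the platform is GCP,
// a target exists for the key, the target differs from the current image,
// and at least one master node has completed the current upgrade.
func needsPatch(images map[bootImageKey]string, key bootImageKey, current string, mastersUpgraded int) (string, bool) {
	if key.Platform != "gcp" {
		// The PoC only handles GCP.
		return "", false
	}
	target, ok := images[key]
	if !ok || target == current || mastersUpgraded < 1 {
		return "", false
	}
	return target, true
}

func main() {
	// Placeholder image names, assumed for the example.
	images := map[bootImageKey]string{
		{Platform: "gcp", Arch: "x86_64"}: "projects/rhcos-cloud/global/images/new-image",
	}
	key := bootImageKey{Platform: "gcp", Arch: "x86_64"}
	if target, ok := needsPatch(images, key, "projects/rhcos-cloud/global/images/old-image", 1); ok {
		fmt.Println("patching to", target)
	}
}
```

Keeping the check as a pure function makes the "no patching required" path on the second reconcile easy to reason about: once the current image equals the target, the function returns false.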

What does it not do?

  • work on infra that is not GCP
  • update the ignition stub to spec 3

How do I test this?

  • Launch a cluster on 4.14
  • Upgrade to this PR. This should update the golden configmap, triggering a machineset reconcile loop.
  • Observe logs on the MCC with this command:
oc logs -n openshift-machine-config-operator -f "$(oc get pod -o name -l='k8s-app=machine-config-controller' -n openshift-machine-config-operator)" | grep "machine_set_controller"

You should see all the machinesets being patched, once a master node has completed the upgrade:

I1003 19:19:27.115622       1 machine_set_controller.go:470] Reconciling machineset djoshy10-8kt7j-worker-a on GCP, with arch x86_64
I1003 19:19:27.116922       1 machine_set_controller.go:496] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202309142014-0-gcp-x86-64
I1003 19:19:27.117007       1 machine_set_controller.go:497] Current image: projects/rhcos-cloud/global/images/rhcos-415-92-202309142014-0-gcp
I1003 19:19:27.117095       1 machine_set_controller.go:324] Patching machineset djoshy10-8kt7j-worker-a
I1003 19:19:27.137769       1 machine_set_controller.go:209] MachineSet djoshy10-8kt7j-worker-a updated, reconciling
I1003 19:19:27.137850       1 machine_set_controller.go:470] Reconciling machineset djoshy10-8kt7j-worker-a on GCP, with arch x86_64
I1003 19:19:27.139052       1 machine_set_controller.go:342] No patching required for machineset djoshy10-8kt7j-worker-a
  • Scale up a new node on any of the machinesets
oc scale --replicas=2 machineset <machineset> -n openshift-machine-api
  • Once the node is up and in the worker pool, observe the MCD logs of the newly scaled-up node. At daemon startup, the aleph version (that is, the first image the machine booted from) logged here should be 4.15 and not 4.14.
I1013 18:27:24.170799    2324 simple_featuregate_reader.go:171] Starting feature-gate-detector
I1013 18:27:24.174987    2324 rpm-ostree.go:308] Running captured: rpm-ostree status
I1013 18:27:24.296615    2324 daemon.go:1480] State: idle
Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3e9ff6381fc05d27a79dbcacc7a4270843d1e2e054d21a76885eb57ea3330e1
                   Digest: sha256:f3e9ff6381fc05d27a79dbcacc7a4270843d1e2e054d21a76885eb57ea3330e1
                  Version: 415.92.202310070752-0 (2023-10-13T18:26:10Z)

  233c758390b0f21bf7170c427641782c38f67ead3d2a0d5a3a77e4a7d62e9a02
                  Version: 415.92.202309142014-0 (2023-09-14T20:17:52Z)
I1013 18:27:24.297770    2324 coreos.go:54] CoreOS aleph version: mtime=2023-09-14 20:21:26.864 +0000 UTC build=415.92.202309142014-0 imgid=rhcos-415.92.202309142014-0-qemu.x86_64.qcow2
I1013 18:27:24.297955    2324 coreos.go:71] Ignition provisioning: time=2023-10-13T18:24:59Z
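
If you want to script the aleph check instead of reading the log by eye, something like the sketch below may help. The helper names `alephBuild` and `bootedFrom` are hypothetical, and the code assumes the `build=` field keeps the format shown in the log above, with a leading `415.` corresponding to 4.15.

```go
package main

import (
	"fmt"
	"strings"
)

// alephBuild extracts the value of the "build=" field from an MCD
// "CoreOS aleph version" log line. It returns false if the field is absent.
func alephBuild(line string) (string, bool) {
	for _, field := range strings.Fields(line) {
		if strings.HasPrefix(field, "build=") {
			return strings.TrimPrefix(field, "build="), true
		}
	}
	return "", false
}

// bootedFrom reports whether the build string starts with the given stream
// prefix, for example "415." for 4.15 (an assumption about the versioning).
func bootedFrom(build, streamPrefix string) bool {
	return strings.HasPrefix(build, streamPrefix)
}

func main() {
	line := "I1013 18:27:24.297770    2324 coreos.go:54] CoreOS aleph version: " +
		"mtime=2023-09-14 20:21:26.864 +0000 UTC build=415.92.202309142014-0 " +
		"imgid=rhcos-415.92.202309142014-0-qemu.x86_64.qcow2"
	if build, ok := alephBuild(line); ok {
		fmt.Println(build, bootedFrom(build, "415."))
	}
}
```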

Please post any feedback and questions to the enhancement linked above. Again, this is a rough PoC and should not merge in its current state.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 16, 2023

openshift-ci bot commented Oct 16, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Oct 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 20, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2023
This vendors in:
- informers and listers for the machine api objects
- stream-metadata-go for coreos stream objects

djoshy commented Jan 9, 2024

Just wanted to mention that the MVP for this, #4083, has merged, if anyone finds themselves here (:

@djoshy djoshy deleted the boot-images-poc branch February 12, 2024 20:19