Split RHCOS into layers #1637
Skipping CI for Draft Pull Request.
067ece5 to f79684b
reviewers:
- "@patrickdillon, for installer impact"
- "@rphillips, for node impact"
- "@joepvd, for ART impact"
- "@sinnykumari, for MCO impact"
- "@LorbusChris, for OKD impact"
- "@zaneb, for agent installer impact"
- "@sdodson, for overall architecture"
- "@cgwalters, for overall architecture"
approvers:
- "@mrunalp"
Apparently, the bot won't automatically tag the folks listed here, so manually doing it: @patrickdillon @rphillips @joepvd @sinnykumari @LorbusChris @zaneb @sdodson @cgwalters.
Awesome work on this!
openshift/kubernetes has a specific workflow where jobs will build a new kubelet to use during the job run. This helps with rebase work and validating new kubernetes versions coming into OpenShift. We should preserve this workflow when migrating to RHCOS layering. /cc @soltysh
I don't expect any issues there. That workflow should keep working as is.
f79684b to a6a7438
/cc @cybertron @andfasano
I believe this was the pre-req work done in openshift/kubernetes#1805, which ensured we won't have problems in o/k.
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries for example, which bootstrapping obviously relies on.

Instead, now we change things up so that early on when booting the bootstrap node, we pull down the node image, unencapsulate it (this just means convert it back to an OSTree commit), then mount over its `/usr`, and import new `/etc` content. This is done by isolating to a different systemd target to only bring up the minimum number of services to do the pivot and then carry on with bootstrapping.

This does not incur additional reboots and should be compatible with AI/ABI/SNO. But it is, of course, a huge conceptual shift in how bootstrapping works. With this, we would now always be sure that we're using the same binaries as the target version as part of bootstrapping, which should alleviate some issues such as AI late-binding (see e.g. https://issues.redhat.com/browse/MGMT-16705). The big exception, of course, being the kernel. Relatedly, currently `/usr/lib/modules` is also shadowed by the mount, but we could re-mount it if needed.

To be conservative, the new logic only kicks in when using bootimages which do not have `oc`. This will allow us to ratchet this in more easily.

Down the line, we should be able to replace some of this with `bootc apply-live` once that's available (and also works in a live environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.
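The conservative gating in the description above (the new logic only kicks in on bootimages that do not have `oc`) could be sketched roughly as follows. This is an illustration in Python, not the installer's actual implementation; the function name `needs_node_image_overlay` and its `search_path` parameter are hypothetical.

```python
import shutil
from typing import Optional


def needs_node_image_overlay(search_path: Optional[str] = None) -> bool:
    """Return True when no `oc` binary can be found.

    A missing `oc` is taken to mean the bootimage no longer ships
    OpenShift-versioned components, so the node image must be overlaid
    before bootstrapping continues.

    `search_path` is a hypothetical testing hook; None means "use $PATH".
    """
    return shutil.which("oc", path=search_path) is None
```

Keying off the presence of `oc` means older bootimages that still carry the binaries keep the existing bootstrap flow unchanged.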
OK, so let's resume the bootstrapping issue. Restating some of the things from above and from researching further:
What I'm playing with now is basically to have a special This is in effect like a more aggressive WIP for this in openshift/installer#8742.
@jlebon That sounds like it might work. Where will the Kubelet be coming from? An OpenShift built image?
Won't doing
From the node image (i.e. for OCP, the
No. The system boots into
Via a generator overriding
A clarification on this: if we can split out those units somehow so they're accessible in the release payload and not baked into the installer, would whatever generates the ISOs be able to pull from the payload so we don't have to duplicate them across codebases? That'd make the more vanilla (non-AI/ABI) install flows more awkward though with a level of indirection.
We try to avoid having a hard dependency on the release payload in ABI, because the ISO may be generated in a different environment from where the cluster is going to run. I think it could work to group all of the agent services under e.g.
Yeah, that's along the lines of what I was thinking as well. I like the I might reach out with questions for getting a test environment up to iterate on this.
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries for example, which bootstrapping obviously relies on.

Instead, now we change things up so that early on when booting the bootstrap node, we pull down the node image, unencapsulate it (this just means convert it back to an OSTree commit), then mount over its `/usr`, and import new `/etc` content. This is done by isolating to a different systemd target to only bring up the minimum number of services to do the pivot and then carry on with bootstrapping.

This does not incur additional reboots and should be compatible with AI/ABI/SNO. But it is, of course, a huge conceptual shift in how bootstrapping works. With this, we would now always be sure that we're using the same binaries as the target version as part of bootstrapping, which should alleviate some issues such as AI late-binding (see e.g. https://issues.redhat.com/browse/MGMT-16705). The big exception, of course, being the kernel. Relatedly, note we do persist `/usr/lib/modules` from the booted system so that loading kernel modules still works.

To be conservative, the new logic only kicks in when using bootimages which do not have `oc`. This will allow us to ratchet this in more easily.

Down the line, we should be able to replace some of this with `bootc apply-live` once that's available (and also works in a live environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be oc, kubelet, or crio binaries for example, which bootstrapping obviously relies on. To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version. Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`. See also: openshift/installer#8742
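One way to express "run `node-image-overlay.service` before `kubelet.service` and `bootkube.service`" is with systemd ordering directives. The sketch below renders drop-in files that use the standard `Requires=` and `After=` directives. Whether assisted-installer actually uses drop-ins is an assumption; the drop-in directory and the file name `10-node-image-overlay.conf` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable

OVERLAY_UNIT = "node-image-overlay.service"


def render_ordering_dropins(dependents: Iterable[str]) -> Dict[str, str]:
    """Map a drop-in path to its contents for every dependent unit.

    Each drop-in makes the dependent unit require the overlay unit and
    start only after it has finished starting. Paths are hypothetical.
    """
    dropin = (
        "[Unit]\n"
        f"Requires={OVERLAY_UNIT}\n"
        f"After={OVERLAY_UNIT}\n"
    )
    return {
        f"/etc/systemd/system/{unit}.d/10-node-image-overlay.conf": dropin
        for unit in dependents
    }


if __name__ == "__main__":
    units = ["kubelet.service", "bootkube.service"]
    for path, body in render_ordering_dropins(units).items():
        print(path)
        print(body)
```

`After=` alone only orders the units; pairing it with `Requires=` also ensures the dependent units are not started if the overlay unit cannot be started.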
a6a7438 to 4e54497
Updated and officially ready for review!
4e54497 to 086ae67
@omertuc did you see this one?
container image, just like most other OpenShift components are built. This should
allow simplifying CI and ART tooling related to RHCOS, much of which is bespoke.

For example, ART would no longer have to sync RHCOS images to CI; CI could
I believe this would still occur. CI builds / rebuilds images after a change merges. Images in repos that don't have frequent merges (e.g. openshift/images) go stale over time -- which is why ART pushes updates it builds for these images into the CI integration imagestreams.
For ART to stop doing this, we'd need to know (a) what triggers the RHCOS build in CI (b) what keeps it fresh (c) how it could stay current with RHEL & OCP RPM updates.
The statements which follow remain accurate even if ART continues to build the image.
Thanks, that's a detail I didn't know but it makes sense. So overall images in the CI imagestreams may come from either CI or ART nightlies depending on how fresh the repo is?
I'm trying to verify my understanding of this by comparing the image digests in e.g. registry.ci.openshift.org/ocp/release:4.17.0-0.ci-2024-09-24-105044 vs registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-09-23-211417; I would've expected some of them to be the same there. The CI payloads are manifest listed but not the nightly ART one, but even comparing the amd64 digests in the manifest lists against the nightly ones, I'm not getting any component images that match other than the RHCOS ones.
Hmm, and based on this other comment, are you suggesting that CI builds not promote at all and that we keep relying on just ART, or on both? I'd like the CoreOS images to be at parity with the other component images. E.g. merges to openshift/os would ideally trigger a CI build and promotion.
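The digest comparison described in the comment above can be sketched as a small helper. This assumes the component-to-digest mappings for the two payloads have already been extracted (for instance from `oc adm release info` output); the extraction is not shown, and the digests and the `cli` entry below are made-up placeholders, not values from the real payloads.

```python
from typing import Dict


def shared_components(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Return {component: digest} for components with the same digest in both payloads."""
    return {name: digest for name, digest in a.items() if b.get(name) == digest}


# Placeholder data for illustration only.
ci_payload = {"rhel-coreos": "sha256:aaa", "cli": "sha256:bbb"}
nightly_payload = {"rhel-coreos": "sha256:aaa", "cli": "sha256:ccc"}

print(shared_components(ci_payload, nightly_payload))
# {'rhel-coreos': 'sha256:aaa'}
```

For manifest-listed payloads, the per-architecture digest (e.g. amd64) would have to be resolved first, since the manifest list digest itself will not match a single-arch image.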
which will build the OpenShift node image by layering the OpenShift components on top: RPM
packages, config files, etc...

After some testing, the Prow job will be configured to push to the `rhel-coreos` tag in the CI
I think this is a perfect option to enable effective pre-merge CI for openshift/os, but as discussed above, I'm not sure it should stop ART promotion of the rhcos content to the CI imagestream.
Consider an older z-stream like 4.12. Very few merges to openshift/os in release-4.12 would mean, in effect, no updates to the rhcos content in the 4.12 CI imagestream.
An update merging in openshift/oc would not update the client RPM in the CI coreos image until (a) ART rebuilt the client RPM and (b) a subsequent PR merged in openshift/os causing that new RPM to be installed.
Agreed. Will update this section to make it match what we want after the conversation above about this is resolved.
Brew and those builds then being added to the ART plashets. To avoid having
to source the (OCP-versioned) ART plashets during the CoreOS builds, we
could instead create new repos for these packages that are clearly defined
to be RHEL-versioned. The CoreOS build could then add that repo alongside |
I think ART would need to dnf upgrade the kernel in the node layer from their plashet. If the coreos layer is up-to-date, it is a no-op. Otherwise, we lose some of the benefit of the layer decoupling.
Hmm, when would it be not up to date? The ART plashet sources it from the same place the RHCOS image would source it from. It's certainly possible to replace the kernel in the node image build, but IMO it's much more appropriate to respin the RHCOS base image for it.
This enhancement describes improvements to the way RHEL CoreOS (RHCOS) is built so that it will better align with image mode for RHEL, all while also providing benefits on the OpenShift side. Currently, RHCOS is built as a single layer that includes both RHEL and OCP content. This enhancement proposes splitting it into three layers. Going from bottom to top:

1. the (RHEL-versioned) bootc layer (i.e. the base rhel-bootc image shared with image mode for RHEL)
2. the (RHEL-versioned) CoreOS layer (i.e. coreos-installer, ignition, afterburn, scripts, etc...)
3. the (OCP-versioned) node layer (i.e. kubelet, cri-o, etc...)

The terms "bootc layer", "CoreOS layer", and "node layer" will be used throughout this enhancement to refer to these.

The details of this enhancement focus on doing the first split: creating the node layer as distinct from the CoreOS layer (which will not yet be rebased on top of a bootc layer). The two changes involved which most affect OCP are:

1. bootimages will no longer contain OCP components (e.g. kubelet, cri-o, etc...)
2. the `rhel-coreos` payload image will be built in Prow/Konflux (as any other)

Tracked at: https://issues.redhat.com/browse/OCPSTRAT-1190
086ae67 to 02e69db
@jlebon: all tests passed!