From 10efcd2e07f60fb964d989fe5d2dd7b73e851514 Mon Sep 17 00:00:00 2001 From: Fabio Bertinatto Date: Thu, 15 Jun 2023 14:43:36 -0300 Subject: [PATCH] Add a continuous Kubernetes rebase proposal --- dev-guide/kubernetes-continuous-rebase.md | 153 ++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 dev-guide/kubernetes-continuous-rebase.md diff --git a/dev-guide/kubernetes-continuous-rebase.md b/dev-guide/kubernetes-continuous-rebase.md new file mode 100644 index 0000000000..a902031e4a --- /dev/null +++ b/dev-guide/kubernetes-continuous-rebase.md @@ -0,0 +1,153 @@ +--- +title: kubernetes-rebase +authors: + - "@fbertina" +reviewers: + - "@soltysh" +approvers: + - "@soltysh" +creation-date: 2023-06-15 +last-updated: 2023-06-15 +--- + +# Kubernetes Continuous Rebase + +## Goal + +The main goal of this proposal is to proactively identify and address +any potential issues that may arise during the upcoming rebase +process. + +The desired outcome is to be able to land the rebase PR significantly +earlier in the process, potentially aligning with the release of the +upstream tag. + +## Proposal + +Currently, the rebase work is typically spread out over a period of 1 +or 2 months. However, it can potentially be distributed throughout the +development cycle of Kubernetes. To achieve this, we could have an OCP +branch with a continually updated Kubernetes codebase, allowing most +of the work to be completed even before the rebase process begins. + +The main approach involves applying our downstream patches against the +upstream master branch on a daily basis. It is expected that some +patches may fail to be applied multiple times during the development +cycle. However, as soon as such failures occur, we will receive +notifications, and the necessary fixes will be applied to the +downstream patches or the upstream code. + +Implementing this approach brings several benefits: + +1. The rebase process becomes less time-sensitive. +2. We receive early signals if an upstream change breaks OCP, enabling + us to address the issue promptly either in the upstream code or on + our side. +3. The rebase PR should be ready to be landed as soon as the upstream + code becomes generally available (GA). + +To implement this proposal, the following steps are required: + +### Watcher + +For each OCP release, we will designate a watcher to participate in +the process. Ideally, it should be the same person who will execute +the final rebase. + +A watcher is responsible for ensuring that the remaining steps +outlined below are executed without errors. + +Although some manual work is required, it should not occupy their +entire daily working time. + +### A -next branch (optional) + +For each of the dependencies listed below, a new branch called +`ocp-next` is created with their Kubernetes dependencies updated: + +* openshift/api +* openshift/client-go +* openshift/library-go +* openshift/apiserver-library-go + +Initially, this can be done manually on a weekly basis. In the future, +certain parts of this process can potentially be automated, requiring +manual intervention only when the automation fails. + +This process should already uncover some future issues, requiring +fixes on unit tests or Makefiles for instance. + +### CI Job + +The goal of the CI job is to detect if our downstream patches create +any code conflicts when applied to the upstream code. In addition to +that, it will uncover potential issues with dependencies and +generated code. + +In short, the new CI job will: + +1. Take a series of downstream patches and apply them against the + upstream code. +2. Pin the dependencies mentioned above to the HEAD of their + respective `ocp-next` branches. +3. Update the auto-generated code and docs (i.e., `make update`). +4. Make sure the codebase is in a sane state by executing automated + verification and testing with `make` (i.e., `test`, `verify`, + `build`, etc.). +5. Commit and push the local changes to an `ocp-next` branch in a + remote repository. +6. Update or create the Pull Request. + +If the job fails to execute any of the steps above, the watcher is +responsible for fixing whatever is preventing the job from +succeeding. Examples of fixes include: + +1. Making a code change to the downstream patch to address a code + conflict. +2. Creating an upstream PR to correct any breaking change. +3. Creating a new downstream patch to rectify an incorrect assumption + in our operators. + +A prototype of this workflow is available +[here](https://github.com/bertinatto/ocp-next/blob/master/next.go). + +### Open Questions + +This proposal assumes that all downstream patches are located in a +specific directory, such as the `patches` directory in [this +prototype](https://github.com/bertinatto/ocp-next/tree/master/patches). + +However, it is unclear how we can ensure that this directory remains +up-to-date with the latest patches imported into our +openshift/kubernetes fork. + +Here are a few potential options to address this issue: + +1. Establish the patches directory as the source of truth for all + downstream patches. This would require teams to ensure that their + patches are imported into this directory whenever they introduce a + new carry patch. It may be beneficial to implement some automation + to streamline this process. +2. Automate the process of listing and applying patches from the git + log, as described + [here](https://github.com/openshift/kubernetes/blob/master/REBASE.openshift.md#creating-a-spreadsheet-of-carry-commits-from-the-previous-release). + In case the automation fails to cherry-pick a specific patch, it + can then search for the patch in the patches directory. This is the + approach taken by the tooling currently under development + [here](https://github.com/soltysh/rebase). + +## Conclusion + +The proposed approach involves establishing an OCP branch with an +updated Kubernetes codebase, daily application of downstream patches, +and the setup of a CI job to detect code conflicts and and failures in +generated code. + +The implementation of this proposal aims to improve the rebase process +and proactively address potential issues the are currently only +detected when the rebase process starts. + +The ultimate goal is to land the rebase PR considerably early, +potentially aligning with the release of the upstream GA tag. This +will allow us to expose updated features and fixes from upstream to +our OCP teams considerably earlier than we do today.