diff --git a/docs/proposals/004-orchestration-refresh-certs.md b/docs/proposals/004-orchestration-refresh-certs.md
new file mode 100644
index 00000000..4ccb36f5
--- /dev/null
+++ b/docs/proposals/004-orchestration-refresh-certs.md
@@ -0,0 +1,266 @@
+
+
+# Proposal information
+
+
+- **Index**: 004
+
+
+- **Status**: ACCEPTED
+
+
+
+- **Name**: Cluster Orchestration - Certificate Refresh
+
+
+- **Owner**: Mateo Florido [@mateoflorido](https://github.com/mateoflorido)
+
+# Proposal Details
+
+## Summary
+
+
+This proposal aims to introduce a mechanism to refresh certificates for all
+nodes in a Canonical Kubernetes CAPI cluster in a single operation, removing the
+need to annotate each machine individually. This feature will allow
+administrators to trigger a cluster-wide certificate refresh through
+annotations on higher-level resources like `Cluster`, `CK8sControlPlane`, or
+`MachineDeployment`.
+
+## Rationale
+
+
+We currently have the ability to refresh the certificates for individual
+`Machine` resources in the cluster. However, this process can be time-consuming
+as it requires annotating each `Machine` resource individually and waiting for
+the certificates to refresh. In this proposal, we aim to introduce the
+capability to refresh certificates for all nodes in the cluster at once. This
+new feature will improve the user experience and speed up the process,
+especially in large clusters.
+
+## User facing changes
+
+
+Administrators can annotate the `Cluster`, `CK8sControlPlane` or
+`MachineDeployment` objects to trigger the certificate refresh for machines in the
+cluster, control plane nodes, or worker nodes, respectively.
+
+```bash
+kubectl annotate cluster <cluster-name> v1beta2.k8sd.io/refresh-certificates={expires-in}
+kubectl annotate ck8scontrolplane <control-plane-name> v1beta2.k8sd.io/refresh-certificates={expires-in}
+kubectl annotate machinedeployment <machine-deployment-name> v1beta2.k8sd.io/refresh-certificates={expires-in}
+```
+
+`expires-in` specifies how long the certificates will be valid.
+It can be expressed in years, months, or days, in addition to the time units
+supported by Go's `time.ParseDuration` function.
+
+## Alternative solutions
+
+
+As mentioned in [Proposal 003], the Kubeadm Control Plane Provider can
+refresh the certificates for the nodes in the cluster. However, this approach
+requires performing a rolling update of the machines owned by the cluster.
+
+## Out of scope
+
+
+This proposal does not include the functionality to refresh certificates via
+a rolling update of nodes or to automatically trigger the process when
+certificates are close to expiring. Additionally, it does not cover
+the renewal of external certificates provided by the user or of CA certificates.
+
+# Implementation Details
+
+## API Changes
+
+
+None
+
+## Bootstrap Provider Changes
+
+
+### Cluster Controller
+
+We will add a new controller, `ClusterCertificatesReconciler`, to the bootstrap
+provider. This controller will watch `Cluster` objects and trigger a
+certificate refresh for all nodes in the cluster when the
+`v1beta2.k8sd.io/refresh-certificates` annotation is applied.
+
+The status of the certificate refresh process will be shared via the `Cluster`
+object by emitting events. The events that the controller can emit are:
+- `RefreshCertsInProgress`: The certificate refresh process has started.
+- `RefreshCertsDone`: The certificate refresh process has finished successfully.
+- `RefreshCertsFailed`: The certificate refresh process has failed.
+
+The controller should perform the following steps:
+1. Retrieve the `CK8sControlPlane` object owned by the `Cluster` object.
+2. Emit the `RefreshCertsInProgress` event for the `Cluster` object.
+3. Trigger a certificate refresh for the control plane nodes by annotating the
+   `CK8sControlPlane` object with the `v1beta2.k8sd.io/refresh-certificates`
+   annotation.
+4. Wait for the certificates to be refreshed on the control plane nodes.
+   The controller should check the
+   `v1beta2.k8sd.io/refresh-certificates-status` annotation to determine when
+   the certificates have been refreshed.
+5. If the refresh is successful, the controller proceeds to the
+   `MachineDeployment` objects.
+6. For each `MachineDeployment` object, trigger a certificate refresh for the
+   worker nodes by annotating the `MachineDeployment` object with the
+   `v1beta2.k8sd.io/refresh-certificates` annotation.
+7. Wait for the certificates to be refreshed on the worker nodes, checking the
+   `v1beta2.k8sd.io/refresh-certificates-status` annotation.
+8. If the refresh is successful, the controller emits the `RefreshCertsDone`
+   event for the `Cluster` object.
+
+### MachineDeployment Controller
+
+We also need to add a new controller, `MachineDeployCertificatesReconciler`, to
+the bootstrap provider. This controller will watch `MachineDeployment`
+objects and trigger a certificate refresh for all the worker nodes owned by a
+`MachineDeployment` when the `v1beta2.k8sd.io/refresh-certificates` annotation
+is present on it.
+
+The controller should perform the following steps:
+1. List all machines owned by the `MachineDeployment` object and filter out the
+   control plane machines.
+2. Emit the `RefreshCertsInProgress` event for the `MachineDeployment` object.
+3. For each machine, trigger the certificate refresh by annotating the machine
+   with the `v1beta2.k8sd.io/refresh-certificates` annotation.
+4. Wait for the certificates to be refreshed on that machine. The controller
+   should check the `v1beta2.k8sd.io/refresh-certificates-status` annotation
+   to determine when the certificates have been refreshed.
+5. If the refresh is successful, the controller moves to the next machine.
+
+The status of the certificate refresh process will be shared via the
+`MachineDeployment` object, using the same events as the `Cluster` controller.
+
+## ControlPlane Provider Changes
+
+
+A controller `ControlPlaneCertificatesReconciler` will be added to the control plane
+provider.
+This controller will watch `CK8sControlPlane` objects and
+trigger the certificate refresh for all the control plane nodes in the
+cluster when the `v1beta2.k8sd.io/refresh-certificates` annotation is present.
+
+The controller should perform the following steps:
+1. List all the control plane machines owned by the `CK8sControlPlane` object.
+2. Emit the `RefreshCertsInProgress` event for the `CK8sControlPlane` object.
+3. For each control plane machine, trigger the certificate refresh by annotating
+   the machine with the `v1beta2.k8sd.io/refresh-certificates` annotation.
+4. Wait for the certificates to be refreshed on that machine. The controller
+   should check the `v1beta2.k8sd.io/refresh-certificates-status`
+   annotation to determine when the certificates have been refreshed.
+5. If the refresh is successful, the controller moves to the next machine.
+   If the refresh fails, the controller emits the `RefreshCertsFailed` event
+   for the `CK8sControlPlane` object and stops the process.
+6. Once all the control plane machines have been refreshed, the controller emits
+   the `RefreshCertsDone` event for the `CK8sControlPlane` object.
+
+As mentioned in the Bootstrap Provider Changes, the status of the certificate
+refresh process will be shared via the `CK8sControlPlane` object, using the
+same events as the `Cluster` controller.
+
+## Configuration Changes
+
+
+None
+
+## Documentation Changes
+
+This proposal will require a new section in the Canonical Kubernetes
+documentation containing:
+- A how-to page on how to refresh the certificates for a cluster.
+- An explanation page on how the refresh orchestration process works.
+
+## Testing
+
+
+- Unit tests for the controllers in the bootstrap and control plane providers.
+- Integration tests that cover the certificate refresh process in
+  the cluster using the `v1beta2.k8sd.io/refresh-certificates` annotation on
+  the `Cluster`, `CK8sControlPlane` and `MachineDeployment` objects.
+
+## Considerations for backwards compatibility
+
+
+None
+
+## Implementation notes and guidelines
+
+For this implementation, we can take as a reference the implementation of the
+`CertificateController` in our repository. This controller is responsible for
+refreshing the certificates for the Machines in the cluster. We are going to
+leverage the logic in there to offload the certificate refresh process to this
+controller.
+
+
+
+[Proposal 003]: 003-refresh-certs.md
+
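Review note: the per-machine loop that the three controllers in this proposal share (emit `RefreshCertsInProgress`, refresh one machine at a time, stop on the first failure, emit `RefreshCertsDone` at the end) could be sketched roughly as below. This is a minimal illustration, not the planned implementation. `RefreshFunc`, `EmitFunc`, `RefreshError`, and `RefreshMachines` are hypothetical names, and the stop-on-failure behaviour is spelled out in the proposal only for the control plane controller; the sketch assumes the other controllers behave the same way.

```go
package main

import "fmt"

// Event reasons named in the proposal.
const (
	EventInProgress = "RefreshCertsInProgress"
	EventDone       = "RefreshCertsDone"
	EventFailed     = "RefreshCertsFailed"
)

// RefreshError is a hypothetical error type that records which machine
// stopped the orchestration.
type RefreshError struct {
	Machine string
	Reason  string
}

func (e *RefreshError) Error() string {
	return fmt.Sprintf("refresh certificates on machine %q: %s", e.Machine, e.Reason)
}

// RefreshFunc is a hypothetical hook. In a real controller it would annotate
// one machine with v1beta2.k8sd.io/refresh-certificates and block until the
// v1beta2.k8sd.io/refresh-certificates-status annotation reports a result.
type RefreshFunc func(machine, expiresIn string) error

// EmitFunc is a hypothetical hook that records an event on the owner object
// (Cluster, CK8sControlPlane, or MachineDeployment).
type EmitFunc func(reason, message string)

// RefreshMachines refreshes the machines one at a time, in order, and stops
// at the first failure so that later machines are left untouched.
func RefreshMachines(machines []string, expiresIn string, refresh RefreshFunc, emit EmitFunc) error {
	emit(EventInProgress, fmt.Sprintf("refreshing certificates on %d machine(s)", len(machines)))
	for _, m := range machines {
		if err := refresh(m, expiresIn); err != nil {
			emit(EventFailed, fmt.Sprintf("machine %s: %v", m, err))
			return &RefreshError{Machine: m, Reason: err.Error()}
		}
	}
	emit(EventDone, "certificates refreshed on all machines")
	return nil
}
```

Passing the annotate-and-wait step and the event recorder in as functions keeps the ordering logic testable without a Kubernetes API server, which may help with the unit tests listed under Testing.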