Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADR] Server token rotation #8215

Merged
merged 3 commits into from
Sep 25, 2023
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 94 additions & 0 deletions docs/adrs/server-token-rotation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
# Support Rotating Server Tokens

Date: 2023-08-13

## Status

Accepted

## Context


### Current tokens

See the existing [Support `kubeadm`-style Bootstrap Token Secrets](agent-join-token.md) ADR
for more background on current token support.

Important to this discussion is the fact that the `--token` value is used as the passphrase input
used to generate the encryption key for cluster bootstrap data. For this reason, all server nodes
must use the same token value once the cluster has been started, and the token value cannot be changed.

### Security Considerations

This is a paraphase of @macedogm words concerning the security implications of the current token implementation:

1. Proactiveness: users want to rotate the token periodically as security best practices recommend,
which minimizes the risk of malicious access to the cluster in case the token is leaked. This is
effective when the rotation happens before a malicious user has the chance to actually use the token.

2. Reactiveness: users want to rotate the token after it's detected that the token leaked. Here we
should assume the worst case scenario. Similar when the root credentials of a server is leaked and
the server is compromised, we can only be sure of its security and trustworthiness after performing
a clean reinstall (given all the existing stealth rootkits and backdoors that can cause the server to
be reinfected multiple times). It is assumed that paranoid users and governmental agencies would perform
this action. In this event, token rotation is not enough, only with the clean reinstall one can be 100%
sure of the cluster's security state.

### Existing Work

In past K3s versions, we did not require cluster to be started with a token. When we mandated support
for tokens, we migrated empty string tokens to a randomly generated token. This migration can be
reused to support the rotation from an old token to a new token.


### New Token Rotation

The new token rotation feature will allow the user to rotate the token value used to encrypt cluster bootstrap data.

A new subcommand `k3s token rotate`, will be added to the `k3s` binary. This subcommand can either:
1: take a new supplied token value or 2: Generate a 16 character token and then replace the existing token.
The /var/lib/rancher/k3s/server/token file and passwd file will be updated with the new token value.
Admins can then use the new token value to rejoin existing server nodes or join new server nodes to the cluster.

Similar to the `k3s certificate rotate` and the `k3s secret-encrypt rotate-keys` subcommands,
the `k3s token rotate` subcommand will be wrapper for an API request to the server to perform the decryption
with the old token, and then reencryption of the bootstrap data with the new token. After reenecryption, the
bootstrap data will be updated with the modified token and passwrd files, allowing propgation of the files to
dereknola marked this conversation as resolved.
Show resolved Hide resolved
existing servers upon restart.

### Token Rotation Workflow

HA configuration:

1a) On server 1 run:
```
k3s token rotate -t <OLD_TOKEN>
```

OR
1b) On server 1 run:

```
k3s token rotate -t <OLD_TOKEN> --new-token <NEW_TOKEN>
```

2) If 1a) Retrieve the new random token value from the /var/lib/rancher/k3s/server/token file on server 1
```
vi /var/lib/rancher/k3s/server/token
```

3) Stop and restart the k3s server process on servers 2 and 3 with the new token value:

```
systemctl stop k3s
# edit /etc/rancher/k3s/config.yaml and update the token value
systemctl start k3s
```

## Decision

We will procced forward with the above implementation.
dereknola marked this conversation as resolved.
Show resolved Hide resolved

## Consequences

Documentation the is explicit around the cluster token is compromised. Tt's strongly recommend to do a clean cluster reinstall, since this is the only way to be sure of the cluster's security state - eliminating the possibility that backdoors could have been planted by a malicious user".
dereknola marked this conversation as resolved.
Show resolved Hide resolved