[v16] Machine ID: Documentation for Bot Instances (#45885)
* Machine ID: Documentation for Bot Instances

This adds some basic documentation discussing what Bot Instances are
and how to inspect them.

* Fix docs lints

* Update cspell.json

* Fix docs lints; undo editor-induced whitespace changes in json

* Remove old references to "bot instances"
timothyb89 authored Sep 5, 2024
1 parent 6eed164 commit 5f1581c
Showing 4 changed files with 92 additions and 33 deletions.
1 change: 1 addition & 0 deletions docs/cspell.json
@@ -786,6 +786,7 @@
"readall",
"readyz",
"realmd",
"reauthenticates",
"reauthentication",
"recents",
"reco",
21 changes: 21 additions & 0 deletions docs/pages/enroll-resources/machine-id/introduction.mdx
@@ -49,6 +49,27 @@ access to and what sort of proof (known as the **join method**) is needed to use
this join token. This proof is typically an identity issued to the machine by
the platform it runs on (e.g. AWS IAM).

Multiple join tokens may be created for a single bot to allow joining with
different join methods.

### Bot Instances

Each time a new `tbot` client joins from scratch, it creates a new server-side
Bot Instance. Bot Instances keep track of individual `tbot` installations over
time, even as they renew their certificates or rejoin. These server-side
resources also record the most recent authentication attempts, as well as
bot heartbeats.

Many Bot Instances can exist concurrently for a given Bot, regardless of their
join method.

Bot Instances can be inspected with:
- `tctl get bot_instance` to list all instances
- `tctl get bot_instance/$botName` to list all instances associated with a
particular Bot
- `tctl get bot_instance/$botName/$id` to show a single bot instance by its bot
name and ID

### tbot

Machine ID is used through an agent called `tbot`. `tbot` authenticates with the
7 changes: 4 additions & 3 deletions docs/pages/enroll-resources/machine-id/troubleshooting.mdx
@@ -42,8 +42,9 @@ be automatically [locked](../../admin-guides/access-controls/guides/locking.mdx)

Renewable certificates are exclusively stored in the bot's internal data
directory, by default `/var/lib/teleport/bot`. It's possible to trigger this by
accident if multiple bots are started using the same internal data directory, or if
this internal data is otherwise being shared between multiple bot instances.
accident if multiple bots are started using the same internal data directory, or
if this internal data is otherwise being shared between multiple `tbot`
processes.

Additionally, if a bot fails to save its freshly renewed certificates (for
example, due to a filesystem error) and crashes, it will attempt a renewal
@@ -55,7 +56,7 @@ Before unlocking the bot, try to determine if either of the two scenarios
described above apply. If the certificates were stolen, there may be
underlying security concerns that need to be addressed.

Otherwise, first ensure only one bot instance is using the internal data
Otherwise, first ensure only one `tbot` process is using the internal data
directory. Multiple bots can be run on a single system, but separate data
directories must be configured for each.

96 changes: 66 additions & 30 deletions docs/pages/reference/architecture/machine-id-architecture.mdx
@@ -20,12 +20,15 @@ they comprise three linked resources. These are:
- Bot user: this will be the user that the Machine ID agent authenticates as.
- Bot role: the bot user is assigned the bot role, and the bot role contains
various permissions that the bot will need to function. For example, the
ability to watch the certificate authorities and the ability to
[impersonate roles](#role-impersonation).
- Token: for [onboarding](#joining-and-authentication), a token must exist that
allows the Machine ID agent to initially authenticate as the bot user. If an
existing token is not specified, then a single-use token will be created by
the Auth Server.
- Bot instance: a single instance of a bot. As multiple `tbot` clients can join
with a single Bot user or a single token, Bot Instances keep a running record
of unique bot joins.

The creation of these resources is managed by `tctl bots add`.

@@ -39,15 +42,15 @@ Role Impersonation is an RBAC feature of Teleport that is used heavily by
Machine ID.

Role Impersonation allows a user to generate credentials with a set of requested
roles. The user does not have to hold these roles, but must have been granted
permission to impersonate them. The impersonated credentials still include
the username of the user that generated them, so actions can be attributed to
the user.

These credentials can then be used to complete any action that is allowed by the
role's configured permissions.

In the case of Machine ID, the bot user is assigned a bot role, which includes
permissions to impersonate the roles that the user has configured.
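
The impersonation rule described above can be sketched in a few lines of Python. This is only an illustrative model, not Teleport's implementation: the `User`, `Credentials`, and `impersonate` names and the `can_impersonate` field are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Roles this user has been granted permission to impersonate
    # (hypothetical field, not Teleport's actual schema).
    can_impersonate: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class Credentials:
    username: str          # the requesting user stays attributable
    roles: frozenset[str]  # the impersonated roles


def impersonate(user: User, requested: set[str]) -> Credentials:
    """Issue credentials for `requested` roles if the user may impersonate them all."""
    denied = requested - user.can_impersonate
    if denied:
        raise PermissionError(f"{user.name} may not impersonate: {sorted(denied)}")
    return Credentials(username=user.name, roles=frozenset(requested))
```

Note that the user does not need to hold the requested roles, only the permission to impersonate them, and that the issued credentials still carry the requesting user's name.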

## `tbot`
@@ -71,15 +74,15 @@ it is executed. This consists of:
should be applied to those credentials (for example, what roles should be
impersonated).

For more detail about the configuration options, see
[the reference.](../machine-id/machine-id.mdx)

On initial load, `tbot` uses the configured join method to obtain a set of
credentials for the bot user from the Teleport Auth Service. It can then use
these credentials to communicate with the Teleport Auth Service as the bot.

Then on a configured regular period `tbot` begins its renewal process. It begins
by refreshing the bot's own credentials, by renewing them or fetching a fresh
set of credentials depending on the configured onboarding method.

For each output provided in the `tbot` configuration, the `tbot` program uses
@@ -101,16 +104,16 @@ Teleport Auth Service.

Machine ID leverages the existing token resource within Teleport, with the
token containing an additional `botName` field that identifies the bot user
associated with the token.

Machine ID currently supports two methods of joining that have some key
differences.

### Ephemeral token

- The name of the token is used as an opaque secret needed to join the Teleport
cluster. This means it must be stored and communicated securely.
- Once used, the token resource self-destructs. This means it can only be used
to join a single bot to a Teleport cluster.

As these tokens can only be used once, the certificates that are issued when
@@ -119,19 +122,19 @@ to be used to request new short-lived certificate.

In order to mitigate the risk of bot user credentials being stolen, and then
continually renewed by a malicious actor, renewable bot user certificates
include a **generation counter**.
validate the bot instance's **generation counter**.

The generation counter is stored against the user in the database and within the
certificate. This counter is incremented each time the user renews their
certificate. When a bot attempts to renew, the Auth Server ensures that the
value within the certificate and in the database match. If they do not match,
then the bot user is automatically locked. This means that if certificates are
stolen, and attempted to be renewed whilst the bot is still running, the next
renewal will render them useless.
The generation counter is stored as part of the [Bot Instance](#bot-instances)
in the database and within the certificate. This counter is incremented each
time this bot instance renews its certificate. When a bot attempts to renew, the
Auth Service ensures that the value within the certificate and in the database
match. If they do not match, then the bot user is automatically locked. This
means that if certificates are stolen and a renewal is attempted while the bot
is still running, the next renewal will render them useless.
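
The generation counter check can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not the Auth Service's code: the `AuthService` and `Certificate` classes are hypothetical, and a single `bot_locked` flag stands in for the lock on the bot user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    instance_id: str
    generation: int  # copy of the counter at issue time


class AuthService:
    """Toy stand-in for the Auth Service; tracks a single bot user."""

    def __init__(self) -> None:
        self.generations: dict[str, int] = {}  # Bot Instance ID -> counter
        self.bot_locked = False

    def join(self, instance_id: str) -> Certificate:
        self.generations[instance_id] = 1
        return Certificate(instance_id, 1)

    def renew(self, cert: Certificate) -> Certificate:
        if self.bot_locked:
            raise PermissionError("bot user is locked")
        if cert.generation != self.generations[cert.instance_id]:
            # A stale counter means the certificate was already renewed elsewhere.
            self.bot_locked = True
            raise PermissionError("generation mismatch, bot user locked")
        self.generations[cert.instance_id] += 1
        return Certificate(cert.instance_id, self.generations[cert.instance_id])
```

In this model, if a thief and the legitimate bot both hold the certificate with generation 1, whichever of them renews second presents a stale counter, and the lock then blocks every later renewal.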

### Dynamic join tokens (e.g. AWS IAM)

- These tokens rely on an external authority that allows the bot to prove it is
allowed to join the cluster. The name of the token identifies the Token
resource in Teleport that contains the configuration.
- The token can be used to join as many bots as you want, and does not self
@@ -142,14 +145,47 @@ renewal will render them useless.
Where possible, you should prefer to use a dynamic join token over an ephemeral
token as this eliminates the need to handle a secret.

### Bot Instances

A Bot Instance identifies a single lineage of bot identities, even through
certificate renewals and rejoins. When the `tbot` client first authenticates to
a cluster, a Bot Instance is generated and its UUID is embedded in the returned
client identity.

When that bot later renews or reauthenticates, it authenticates to the Teleport
Auth Service using its previous client certificate, and the Bot Instance ID is
extracted from that identity. A record of the authentication event is stored on
the Teleport Auth Service, along with an identity generation counter. The
generation counter is tracked for all join types (ephemeral and dynamic),
but is currently only enforced for `token`-type joins.

Bot Instances also track a variety of other information about `tbot` instances,
including regular heartbeats which include basic information about the `tbot`
host, like its architecture and OS version.

As tracking Bot Instances requires bots to prove their identity during each
authentication attempt, this does require bots to maintain state if they wish
to keep a single Bot Instance ID over time. It isn't expected or feasible to
keep state for many Machine ID use cases: for example, CI/CD workflows generally
should rejoin from scratch each time. This is expected behavior, and bots with
use cases like this will generate more unique Bot Instances than long-lived
clients.

Bot Instances have a relatively short lifespan and are set to expire once the
most recent identity issued for that instance expires. If the `tbot` client
associated with a particular Bot Instance renews or rejoins, the expiration of
the bot instance is reset. This is designed to allow users to list Bot Instances
for an accurate view of the number of active `tbot` clients interacting with
their Teleport cluster.
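
The lifecycle described in this section can be sketched as a small Python model. It is an assumption-laden illustration: `InstanceRegistry`, `BotInstanceRecord`, and the single fixed `identity_ttl` are hypothetical, and treating an expired or unknown instance ID as a fresh join is a simplification chosen for the sketch rather than documented behavior.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BotInstanceRecord:
    instance_id: str
    expires: datetime


class InstanceRegistry:
    def __init__(self, identity_ttl: timedelta) -> None:
        self.identity_ttl = identity_ttl
        self.records: dict[str, BotInstanceRecord] = {}

    def authenticate(self, now: datetime, previous_id: Optional[str] = None) -> str:
        """Issue an identity and return the Bot Instance ID embedded in it."""
        record = self.records.get(previous_id) if previous_id else None
        if record is None or record.expires <= now:
            # Joining from scratch (or with no live record) starts a new lineage.
            record = BotInstanceRecord(str(uuid.uuid4()), now)
            self.records[record.instance_id] = record
        # Each issued identity pushes the instance's expiry forward.
        record.expires = now + self.identity_ttl
        return record.instance_id

    def active(self, now: datetime) -> list[str]:
        return [r.instance_id for r in self.records.values() if r.expires > now]
```

A long-lived client that passes its previous ID keeps one record alive, while a CI/CD job that rejoins from scratch on every run creates a new record each time and lets the old ones expire.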

## File permissions

There are two types of folder in use by `tbot`:

- The bot's own files: these store credentials belonging to the `tbot` process
itself. As these credentials are potentially renewable, and will allow the
impersonation of any roles you have assigned to the bot user, they should be
treated as exceptionally sensitive. The bot's own files are stored by default at
`/var/lib/teleport/bot/`.
- Output destinations: when a directory destination is configured, the bot
outputs the role impersonated credentials as files in the specified directory.
@@ -162,19 +198,19 @@ specifically for running `tbot` and to ensure that only this user has access
to this directory.

In the case of directory destinations, the process the bot runs as requires read
and write permissions, and processes that will need the credentials output by
the bot require read permissions. We recommend that you create a Linux user
specific to the process that needs to access these files. When using
`tbot init`, specify this Linux user as the "reader" to grant it access to the
destination.

In addition to basic POSIX filesystem permissions, `tbot init` also sets up
Linux ACLs if the system supports it. This allows more granular control by
granting individual users access.

Finally, on systems that support it, `tbot` will by default attempt to prevent
the resolution of symbolic links when reading and writing files. This prevents a
class of attacks sometimes known as
[symlink attacks](https://capec.mitre.org/data/definitions/132.html). This
behaviour can be disabled using the `insecure` symlink option when configuring
your destination.
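
As a rough illustration of refusing to follow symbolic links, the Python sketch below opens a file with the POSIX `O_NOFOLLOW` flag. This is not necessarily the mechanism `tbot` uses, and `O_NOFOLLOW` only guards the final path component; the `read_no_follow` helper is hypothetical.

```python
import os


def read_no_follow(path: str) -> bytes:
    """Read a file, refusing to open it if `path` itself is a symlink."""
    # With O_NOFOLLOW, open() fails with an OSError when the final
    # path component is a symbolic link instead of silently following it.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as f:
        return f.read()
```

A regular file is read as usual, while a symlink planted at the same path makes the call raise an `OSError` rather than exposing or overwriting the link's target.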
