Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to fleetctl debug connection and TLS certs documentation #20166

Merged
merged 6 commits into from
Jul 9, 2024

Conversation

lucasmrod
Copy link
Member

@lucasmrod lucasmrod commented Jul 2, 2024

#6085

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
  • Added/updated tests
  • Manual QA for all new/changed functionality

mostlikelee
mostlikelee previously approved these changes Jul 8, 2024
Copy link
Contributor

@mostlikelee mostlikelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Docs! One non-blocking question.

@@ -57,9 +57,15 @@ func ValidateConnectionContext(ctx context.Context, pool *x509.CertPool, targetU
}

cert := state.PeerCertificates[0]
intermediates := x509.NewCertPool()
for _, intermediate := range state.PeerCertificates[1:] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we run into an out of range err here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking!

It seems that at this point we have len(state.PeerCertificates) > 0, so with a slice of 1 this is a-ok (won't loop)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, that will produce an empty slice if len()==1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm deleting this file because it contains outdated docs.

@sharon-fdm
Copy link
Collaborator

@lucasmrod, Can you see any scenario where we removed (from the server) any ability that could prevent old agents from communicating?

sharon-fdm
sharon-fdm previously approved these changes Jul 8, 2024
Copy link
Collaborator

@sharon-fdm sharon-fdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM.
Before merging please answer this question:
@lucasmrod, Can you see any scenario where we removed (from the server) any ability that could prevent old agents from communicating?

@lucasmrod
Copy link
Member Author

Can you see any scenario where we removed (from the server) any ability that could prevent old agents from communicating?

Do you mean changes in the past or in the changes on this PR?

What could happen with very old agent is that their pinned certs (e.g. /opt/orbit/certs.pem) may need updating if users start to see communication issues with the Fleet server (TLS certs verification errors in orbit/osquery logs).

@sharon-fdm
Copy link
Collaborator

@lucasmrod I mean changes in this PR that could prevent old agents from communicating

@sharon-fdm
Copy link
Collaborator

If so, we need to communicate this very clear to customer support!

@sharon-fdm
Copy link
Collaborator

sharon-fdm commented Jul 8, 2024

What could happen with very old agent is that their pinned certs (e.g. /opt/orbit/certs.pem) may need updating if users start to see communication issues with the Fleet server (TLS certs verification errors in orbit/osquery logs).

This has to be understood by customer support. And maybe allow feature-flagging it.

@lucasmrod

@lucasmrod
Copy link
Member Author

I mean changes in this PR that could prevent old agents from communicating

Ah no server changes on this PR.

Changes in this PR only impact client side:

  • fleetctl debug connection command (used for troubleshooting).
  • Some verification code changed used by orbit at start up time (and if such verification code fails we just log an error), so it should be safe to merge.

@sharon-fdm
Copy link
Collaborator

If it only affects new agents then NP.

@lucasmrod
Copy link
Member Author

This has to be understood by customer support. And maybe allow feature-flagging it.

Maybe fleetd users in the far future will find issues with outdated certs. But that's something unrelated to changes in this PR.

@sharon-fdm
Copy link
Collaborator

Understood.
LGTM then. Feel free to merge.

Copy link
Contributor

@spokanemac spokanemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs and article LGTM.

@sharon-fdm sharon-fdm added the :ai Request AI PR review label Jul 9, 2024
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

  • Added new documentation file articles/certificates-in-fleetd.md for configuring TLS CA root certificates.
  • Enhanced cmd/fleetctl/debug.go to handle various TLS certificate scenarios and improve output messages.
  • Updated cmd/fleetctl/debug_test.go with comments for test case context.
  • Introduced a new section in docs/Using Fleet/enroll-hosts.md on debugging TLS certificates.
  • Modified pkg/certificate/certificate.go to support intermediate certificates in TLS verification.

9 file(s) reviewed, 21 comment(s)
Edit PR Review Bot Settings

@@ -0,0 +1,62 @@
# Certificates in fleetd

There are three components in fleetd connecting to the Fleet server using TLS: `orbit`, `Fleet Desktop` and `osqueryd`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Clarified the components in fleetd that connect to the Fleet server using TLS.

Comment on lines +10 to +11
- By default, `orbit` and `Fleet Desktop` will use the system's CA root store to connect to Fleet.
- `osqueryd` doesn't support using the system's CA root store, it requires passing in a certificate file with the root CA store (via the `--tls_server_certs` flag). The `fleetctl` executable contains an embedded `certs.pem` file generated from https://curl.se/docs/caextract.html [0]. When generating a fleetd package with `fleetctl package` such embedded `certs.pem` file is added to the package [1]. Fleetd configures `osqueryd` to use the `certs.pem` file as CA root store by setting the `--tls_server_certs` argument to such path.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Described the default behavior for 'orbit', 'Fleet Desktop', and 'osqueryd' regarding CA root store usage.


## Using `--fleet-certificate` in `fleetctl package`

When using `--fleet-certificate` in `fleetctl package`, such certificate file is used as a CA root store by `orbit`, `Fleet Desktop` and `osqueryd` (the system's CA store is not used when generating the fleetd package this way).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Explained the effect of using the '--fleet-certificate' flag in 'fleetctl package'.

Comment on lines +19 to +26
TLS clients require the CA root and all intermediate certificates that signed the leaf server certificate to be verified.
This means that if the bundled certificate in fleetd [1] doesn't have intermediate certificates that signed the leaf certificate, then the Fleet server will have to be configured to serve the "fullchain".
Here's a list of some scenarios assuming your Fleet server certificate has an intermediate signing certificate:
- ✅ Using fullchain in the Fleet server and root CA only client side.
- ✅ Using fullchain in the Fleet server and root+intermediate bundle client side.
- ✅ Using the leaf certificate in the Fleet server and root+intermediate bundle client side.
- ✅ Using the leaf certificate + intermediate bundle in the Fleet server and root CA only client side.
- ❌ Using the leaf certificate in the Fleet server and root CA only client side. In this scenario the client side (fleetd) doesn't know of the intermediate certificate and thus cannot verify it.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Outlined scenarios for verifying certificates with intermediate signing certificates.

Comment on lines +32 to +43
When there are certificate issues you will see the following kind of errors in server logs:
```
2024/07/05 15:03:52 http: TLS handshake error from <remote_ip>:<remote_port>: remote error: tls: bad certificate
2024/07/05 15:03:53 http: TLS handshake error from <remote_ip>:<remote_port>: local error: tls: bad record MAC
```
and the following kind of errors on the client side (fleetd):
```
2024-07-05T15:04:52-03:00 DBG get config error="POST /api/fleet/orbit/config: Post \"https://fleet.example.com/api/fleet/orbit/config\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
```
```
W0705 15:16:44.739495 1251102656 init.cpp:760] Error reading config: Request error: certificate verify failed
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Provided example error messages for diagnosing certificate issues.

//
//go:embed certs.pem
var osqueryCerts []byte
var OsqueryCerts []byte
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Changed variable name from osqueryCerts to OsqueryCerts to follow Go naming conventions for exported variables.


func writeOsqueryCertPEM(opt Options, orbitRoot string) error {
path := filepath.Join(orbitRoot, "certs.pem")
if err := secure.MkdirAll(filepath.Dir(path), constant.DefaultDirMode); err != nil {
return fmt.Errorf("mkdir: %w", err)
}

if err := os.WriteFile(path, osqueryCerts, 0o644); err != nil {
if err := os.WriteFile(path, OsqueryCerts, 0o644); err != nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Updated to use OsqueryCerts instead of osqueryCerts when writing the file.

Comment on lines +60 to +63
intermediates := x509.NewCertPool()
for _, intermediate := range state.PeerCertificates[1:] {
intermediates.AddCert(intermediate)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Added support for intermediate certificates by creating a new cert pool and adding all intermediate certificates to it.

Comment on lines +66 to +68
DNSName: parsed.Hostname(),
Roots: pool,
Intermediates: intermediates,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Included the intermediates cert pool in the verification options to ensure a complete chain of trust.

@@ -172,7 +172,6 @@ func newBaseClient(
// Ignoring "G402: TLS InsecureSkipVerify set true", needed for development/testing.
tlsConfig.InsecureSkipVerify = true //nolint:gosec
default:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ info: Removed comment about system certs not working on Windows.

@@ -238,7 +238,7 @@ jobs:

- name: Uninstall Orbit
run: |
./orbit/tools/cleanup/cleanup_macos.sh
sudo ./orbit/tools/cleanup/cleanup_macos.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted this in slack, over-communicating here that there's an alternative fix https://github.com/fleetdm/fleet/pull/20226/files

I wouldn't change it so you don't lose all the reviews

@lucasmrod lucasmrod merged commit 2875a9d into main Jul 9, 2024
30 checks passed
@lucasmrod lucasmrod deleted the 6085-add-docs-and-fix-fleetctl-debug-connection branch July 9, 2024 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ai Request AI PR review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants