Bug? http: TLS handshake error from ... local error: tls: bad record MAC #6085

Closed
benatsb opened this issue Jun 3, 2022 · 17 comments
Labels
bug Something isn't working as documented ~csa Issue was created by or deemed important by the Customer Solutions Architect. #g-endpoint-ops Endpoint ops product group :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. ~released bug This bug was found in a stable release.

@benatsb

benatsb commented Jun 3, 2022

Fleet version: 4.15.0

Operating system: Windows 11

Web browser: Edge and Chrome (latest)


🧑‍💻 Expected behavior

Build a windows agent, deploy the windows agent, connect to the Fleet server with no errors.

💥 Actual behavior

Windows 11 device will connect to the Fleet server, but only after I build the agent using the "insecure" flag. The server logs show the following:

level=info ts=2022-06-03T18:56:16.880740187Z component=http path=/api/latest/fleet/device/e7c7dac7-df3e-41dc-91ff-83d6317d2b40 internal="authentication error: invalid device authentication token" err=": Authentication required"
2022/06/03 18:57:11 http: TLS handshake error from local_ip:46872: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46878: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46882: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46884: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46886: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46890: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46892: local error: tls: bad record MAC
2022/06/03 18:59:21 http: TLS handshake error from local_ip:46894: local error: tls: bad record MAC
2022/06/03 18:59:21 http: TLS handshake error from local_ip:46896: local error: tls: bad record MAC
2022/06/03 18:59:22 http: TLS handshake error from local_ip:46898: local error: tls: bad record MAC
2022/06/03 18:59:22 http: TLS handshake error from local_ip:46900: local error: tls: bad record MAC
2022/06/03 18:59:26 http: TLS handshake error from local_ip:46902: local error: tls: bad record MAC
2022/06/03 18:59:26 http: TLS handshake error from local_ip:46904: local error: tls: bad record MAC
2022/06/03 18:59:36 http: TLS handshake error from local_ip:46906: local error: tls: bad record MAC
2022/06/03 18:59:36 http: TLS handshake error from local_ip:46910: local error: tls: bad record MAC
2022/06/03 19:01:50 http: TLS handshake error from local_ip:46930: remote error: tls: bad certificate
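
For a log like the one above, a quick tally by error side and reason can show whether the failures are dominated by `bad record MAC` or `bad certificate`. The following is a minimal sketch; the regular expression is an assumption based only on the line format shown in this report, and `tally_handshake_errors` is a hypothetical helper name, not part of Fleet.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt in this report:
#   "<timestamp> http: TLS handshake error from <peer>: <side> error: tls: <reason>"
HANDSHAKE_RE = re.compile(
    r"http: TLS handshake error from (?P<peer>\S+): "
    r"(?P<side>local|remote) error: tls: (?P<reason>.+)$"
)


def tally_handshake_errors(lines):
    """Count TLS handshake errors, keyed by (side, reason)."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line.strip())
        if match:
            counts[(match.group("side"), match.group("reason"))] += 1
    return counts
```

Feeding it the server log line by line (for example from `open(path)`) returns a `Counter` that can be inspected with `most_common()`.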

More info

The Fleet server runs on a fresh Ubuntu Server 22.04 machine. I used certbot with the "certonly" subcommand there to generate a Let's Encrypt certificate for the server, then copied the certificates to the Fleet installation directory at /etc/fleetdm/.

Set permissions for the .key to 600.

Running server for testing with /etc/fleetdm/fleet serve --config /etc/fleetdm/fleet.yml

fleet.yml

mysql:
  address: 127.0.0.1:3306
  database: fleet
  username: fleetadmin
  password: 'password'
redis:
  address: 127.0.0.1:6379
server:
  address: 0.0.0.0:443
  tls_compatibility: modern
  cert: /etc/fleetdm/server.cert
  key: /etc/fleetdm/server.key
  keepalive: true
logging:
  json: true
vulnerabilities:
  current_instance_checks: yes
  databases_path: /etc/fleetdm/vulns
  periodicity: 1h
  #https://nvd.nist.gov/vuln/data-feeds
  #cve_database_url:
logging:
    error_retention_period: 168h
osquery:
    detail_update_interval: 30m
    status_log_plugin: filesystem
#filesystem:
#    status_log_file: /var/log/osquery/status.log
#    result_log_file: /var/log/osquery/result.log
#    enable_log_rotation: true

Built the installer for Windows using the 4.15.0 fleetctl on the same Windows Machine with no osquery or orbit installed. Docker is installed though.

.\fleetctl.exe package --type=msi --fleet-desktop --fleet-url=https://fleettest --enroll-secret=SECRET --insecure

I tried without the "--insecure" flag, but the agent never connected. After a reboot and installing the package built with the flag, it connects, but the TLS errors still occur server side.


QA notes

To QA this you will need the certificates being added here: #20390

Using fullchain in Fleet server and root CA only client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/fullchain.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem https://localhost:8080

Using fullchain in Fleet server and root+intermediate bundle client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/fullchain.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem https://localhost:8080

Using leaf cert in Fleet server and root+intermediate bundle client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/leaf.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem https://localhost:8080

Using leaf cert + intermediate bundle in Fleet server and root CA only client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/leaf-and-intermediate.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem https://localhost:8080

Using leaf cert in Fleet server and root CA only client side (should fail)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/leaf.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem https://localhost:8080
@benatsb benatsb added :reproduce Involves documenting reproduction steps in the issue bug Something isn't working as documented labels Jun 3, 2022
@noahtalerman
Member

Hey @benatsb sorry you're experiencing this issue.

I'm bringing this issue to the Fleet team so that they can follow up with questions and potential next steps to resolve it.

@noahtalerman
Member

noahtalerman commented Jun 7, 2022

Hey @benatsb the following "Why aren't my osquery agents connecting to Fleet?" section of the docs includes a "Common problems" section: https://fleetdm.com/docs/deploying/faq#why-arent-my-osquery-agents-connecting-to-fleet

bad record MAC: When generating your certificate for your Fleet server, ensure you set the hostname to the FQDN or the IP of the server. This error is common when setting up Fleet servers and accepting defaults when generating certificates using openssl.

I pulled the above from the docs because it looks like you're seeing bad record MAC entries in your logs.

Please let me know if these instructions don't help in successfully resolving your issue.

@xpkoala
Contributor

xpkoala commented Aug 12, 2022

@benatsb I'm going to close this issue for now. If you are still encountering issues please feel free to re-open this ticket with any new information about the problem. Thank you!

@xpkoala xpkoala closed this as completed Aug 12, 2022
@xastherion

xastherion commented Oct 25, 2023

Hi, I am running into the same problems described in this thread.

SERVER
centos stream 9
fleet version 4.38.1

CLIENTS
macOS 13 Ventura + 12 Monterey

Certificate from Let's Encrypt, renewed with Dehydrated

Browsers: Firefox 115 ESR + Chrome 117

My client repeatedly logs this:

W1025 15:16:03.459451 1334582912 tls_enroll.cpp:101] Failed enrollment request to https://my-fleet-server.com:8080/api/v1/osquery/enroll (Request error: certificate verify failed) retrying...

and my server logs this:

Oct 25 15:18:38 my-fleet-server fleet[1062]: 2023/10/25 15:18:38 http: TLS handshake error from 129.13.171.194:50805: local error: tls: bad record MAC

Despite these logs, my Fleet client runs and shows up in the Fleet server UI, but only with its hostname and serial number, nothing more. The client appears online for a short time, then goes offline and never succeeds again.


Last fetched almost 54 years ago (that is a lot of time!)

If I run the client "add host" command with --insecure, everything works. But the errors in the server logs are still present.

@N0rthg4t3

N0rthg4t3 commented May 3, 2024

I encountered this issue with Windows clients while setting up a test environment based on Ubuntu 22.04 LTS and Fleet 4.49.2, following (or rather adapting) the installation guide for CentOS. One thing that made my deployment special was that I used a TLS certificate issued by an internal certification authority that belongs to a PKI dedicated to testing. Although I maintained proper full-chain certificates and keys on the server side, the server log showed these errors, which point to client-side TLS validation failures, right after client installation and continuing indefinitely. Meanwhile, the clients were registered but displayed as "offline". So I took a closer look at the installed Orbit client and found that the root directory of the client files contains a collection of Base64-encoded root CA certificates in a file called "certs.pem", which carries the comment header "Bundle of CA Root Certificates" from Mozilla.

With that in mind, I tried inserting my own CA certificate into this file and restarted the Orbit client. The error immediately stopped appearing in the logs, and the client was displayed as "online" in the web UI. Data could be fetched, and so far I see no functional restrictions with the free version. I therefore think the Orbit client does not pick up custom CAs that are installed system-wide. So far I can only speculate about the cause; on my Windows devices, the CA certificate is installed in the machine-wide Windows certificate store.

One could speculate that this might also happen while utilizing self signed certificates.

@noahtalerman I have some followup questions:

  1. Is this expected behaviour? Is there any workaround or fix?
  2. Is there a configuration option on the client side that would be more appropriate than editing certs.pem?
  3. If not, can clients built by fleetctl be configured to automatically include custom CAs? So far I have not found a parameter for this.
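
The experiment described above amounts to appending a CA certificate to the PEM bundle unless it is already there. A minimal sketch of that step, assuming the bundle is a plain concatenation of `BEGIN CERTIFICATE` blocks; the function names are hypothetical, and the code compares only the decoded bytes without parsing X.509:

```python
import base64
import re

# One PEM certificate block; group 1 is the Base64 body.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.+?)\s*-----END CERTIFICATE-----",
    re.DOTALL,
)


def _decode(body):
    """Decode the Base64 body of a PEM block, ignoring line breaks."""
    return base64.b64decode("".join(body.split()))


def pem_to_der_list(pem_text):
    """Return the decoded bytes of every certificate block in pem_text."""
    return [_decode(body) for body in PEM_BLOCK.findall(pem_text)]


def append_ca_if_missing(bundle_text, ca_pem_text):
    """Return bundle_text with any certificates from ca_pem_text that it lacks."""
    existing = set(pem_to_der_list(bundle_text))
    result = bundle_text
    for match in PEM_BLOCK.finditer(ca_pem_text):
        der = _decode(match.group(1))
        if der in existing:
            continue  # already present, keep the bundle unchanged
        if result and not result.endswith("\n"):
            result += "\n"
        result += match.group(0) + "\n"
        existing.add(der)
    return result
```

Reading the real `certs.pem`, writing the result back, and restarting Orbit are left out here. A manual edit like this may also be lost when the client is reinstalled or updated, which is an assumption worth verifying before relying on it.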

@nonpunctual nonpunctual added the ~csa Issue was created by or deemed important by the Customer Solutions Architect. label May 3, 2024
@N0rthg4t3

I was able to reproduce it, this time with a TLS certificate that should be publicly trusted through verifiable intermediate CAs. However, the error persisted until the root CA and all intermediate CAs were added either to the certificate on the server (effectively full-chaining it) or to the client's certs.pem file.

@noahtalerman noahtalerman reopened this May 7, 2024
@noahtalerman
Member

Thanks @N0rthg4t3!

Heads up @xpkoala, re-opening this issue now that we have a lead on repro.

@xpkoala
Contributor

xpkoala commented May 7, 2024

Thanks! It's on my radar.

@sharon-fdm sharon-fdm added #g-endpoint-ops Endpoint ops product group :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. labels May 8, 2024
@lukeheath lukeheath added this to the 4.51.0-tentative milestone May 13, 2024
@sharon-fdm
Collaborator

Estimating at 5 as it may be hard to reproduce.

@DasFaultier

@N0rthg4t3 Can you give details on full-chaining the server certificate? Does order matter? I'm currently using a certificate file that contains the actual server certificate, then the intermediates below it and the root cert at the very bottom. However, I'm still seeing the errors. Do I maybe need to call fleet prepare with a specific argument in order for Fleet to accept it?

@N0rthg4t3

@DasFaultier When full-chaining certificates, order does matter, specifically where the server certificate goes relative to the intermediate and root CA certificates. Depending on the system, I have been fine with the order Server Certificate > Intermediate 1 > Intermediate 2 > [...] > Root CA certificate. At least that is how I understand RFC 5280 (the X.509 profile), section 3.2 (https://datatracker.ietf.org/doc/html/rfc5280#section-3.2).
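
The ordering rule can be stated as a simple check: each certificate's issuer should be the subject of the certificate that follows it. TLS 1.2 (RFC 5246, section 7.4.2) words it the same way: the sender's certificate comes first, and each following certificate directly certifies the one before it. The sketch below assumes the subject and issuer names have already been extracted from the file (for example with an X.509 tool); `chain_is_ordered` is a hypothetical helper and does not verify signatures.

```python
def chain_is_ordered(certs):
    """Check leaf-first ordering of a certificate chain.

    certs is a list of (subject, issuer) tuples in file order.
    Returns True when every certificate is issued by the next one.
    """
    for (_, issuer), (next_subject, _) in zip(certs, certs[1:]):
        if issuer != next_subject:
            return False
    return True
```

For a file laid out as described above, the input would look like `[("server", "Intermediate 1"), ("Intermediate 1", "Intermediate 2"), ("Intermediate 2", "Root CA"), ("Root CA", "Root CA")]`.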

@sharon-fdm
Collaborator

@xpkoala I unassigned you so we do not miss this when we have capacity.
Still need to reproduce.

@lukeheath lukeheath removed the bug Something isn't working as documented label May 24, 2024
@lukeheath lukeheath modified the milestones: 4.51.0, 4.52.0-tentative Jun 7, 2024
@lukeheath lukeheath modified the milestones: 4.53.0, 4.54.0-tentative Jun 24, 2024
@xpkoala xpkoala added the bug Something isn't working as documented label Jun 24, 2024
@lukeheath lukeheath added bug Something isn't working as documented and removed bug Something isn't working as documented labels Jun 28, 2024
@lucasmrod lucasmrod self-assigned this Jul 1, 2024
@lucasmrod
Member

lucasmrod commented Jul 1, 2024

Hi folks!

I was able to reproduce it, this time with a TLS certificate that should be publicly trusted through verifiable intermediate CAs. However, the error persisted until the root CA and all intermediate CAs were added either to the certificate on the server (effectively full-chaining it) or to the client's certs.pem file.

I performed the following tests with fake certificates and can confirm the above.

Tests

Dummy test certificates:

  • CA root (ca.cert.pem)
  • intermediate
  • leaf server certificate
  • root+intermediate bundle (ca-chain.cert.pem)

They were generated using the following guide.

Using fullchain in Fleet server and root CA only client side

  • curl connect to Fleet with --cacert set to the ca.cert.pem ✅
  • built fleetd using fleetctl package --fleet-certificate=ca.cert.pem ✅

Using fullchain in Fleet server and root+intermediate bundle client side

  • curl connect to Fleet with --cacert set to the ca-chain.cert.pem ✅
  • built fleetd using fleetctl package --fleet-certificate=ca-chain.cert.pem ✅

Using leaf cert in Fleet server and root+intermediate bundle client side

  • curl connect to Fleet with --cacert set to the ca-chain.cert.pem ✅
  • built fleetd using fleetctl package --fleet-certificate=ca-chain.cert.pem ✅

Using leaf cert + intermediate bundle in Fleet server and root CA only client side

  • curl connect to Fleet with --cacert set to the ca.cert.pem ✅
  • built fleetd using fleetctl package --fleet-certificate=ca.cert.pem ✅

Using leaf cert in Fleet server and root CA only client side

  • curl connect to Fleet with --cacert set to the ca.cert.pem ❌
  • built fleetd using fleetctl package --fleet-certificate=ca.cert.pem ❌

The errors were of the following form server side:

2024/07/05 15:03:52 http: TLS handshake error from 127.0.0.1:55182: remote error: tls: bad certificate
2024/07/05 15:03:53 http: TLS handshake error from 127.0.0.1:55183: local error: tls: bad record MAC

and client side:

586 2024-07-05T15:04:52-03:00 DBG get config error="POST /api/fleet/orbit/config: Post \"https://fleet.example.com/api/fleet/orbit/config\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
[...]
W0705 15:16:44.739495 1251102656 init.cpp:760] Error reading config: Request error: certificate verify failed

This is expected to fail because fleetd/osquery doesn't know of the intermediate certificate so it requires the server to send it.
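
The outcome of the five scenarios can be reproduced with a toy model of path building. This is a simplification for illustration only: it tracks certificate names rather than signatures, treats every certificate in the client bundle as trusted, and is not how fleetd or osquery implement verification. The names `leaf`, `intermediate`, and `root` stand in for the dummy test certificates.

```python
# Who issued whom in the dummy hierarchy (the root is self-signed).
ISSUER_OF = {"leaf": "intermediate", "intermediate": "root", "root": "root"}


def can_build_path(leaf, sent_by_server, client_bundle, issuer_of=ISSUER_OF):
    """Return True if a path from leaf to a trusted certificate exists.

    sent_by_server: extra certificates the server sends besides the leaf.
    client_bundle:  certificates the client was packaged with (trusted).
    """
    available = set(sent_by_server) | set(client_bundle)
    current, visited = leaf, set()
    while current not in visited:
        visited.add(current)
        issuer = issuer_of.get(current)
        if issuer in client_bundle:
            return True  # reached a certificate the client trusts
        if issuer not in available:
            return False  # the issuer certificate is unknown to the client
        current = issuer
    return False  # looped on a self-signed cert that is not trusted


# (certificates sent besides the leaf, client bundle) per scenario above.
SCENARIOS = {
    "fullchain / root only": ({"intermediate", "root"}, {"root"}),
    "fullchain / root+intermediate": ({"intermediate", "root"}, {"root", "intermediate"}),
    "leaf / root+intermediate": (set(), {"root", "intermediate"}),
    "leaf+intermediate / root only": ({"intermediate"}, {"root"}),
    "leaf / root only": (set(), {"root"}),
}

RESULTS = {
    name: can_build_path("leaf", sent, bundle)
    for name, (sent, bundle) in SCENARIOS.items()
}
```

In this model only the last scenario fails, because the client never sees the intermediate from either side, which matches the results reported above.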

Next steps

  1. Document that root CA + intermediates must be present in the bundled certificate in fleetd. A default bundle is embedded in fleetctl (when built) and may not contain intermediate certificates present in your server certificate.
  2. Discuss with the product team whether we can do a TLS connection check against the provided --fleet-url using the certificate (default or provided) during fleetctl package execution. This would help everyone catch issues during package generation instead of during deployment. We have an existing command, fleetctl debug connection, for checking connections to a Fleet URL, but users may not be aware of it (e.g. fleetctl debug connection --fleet-certificate /opt/orbit/certs.pem https://fleet.example.com). /cc @noahtalerman @rachaelshaw.

For (2) I've created #20142.
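
A check of this kind can also be approximated outside fleetctl. The sketch below uses Python's standard `ssl` module to attempt a handshake while trusting only the CAs in a given PEM file; `check_tls` is a hypothetical helper, and this is not the implementation behind `fleetctl debug connection`.

```python
import socket
import ssl


def check_tls(host, port, cafile=None, timeout=5.0):
    """Attempt a verified TLS handshake and report the result.

    When cafile is given, only the certificates in that PEM file are
    trusted; otherwise the system default trust store is used.
    Returns (True, negotiated_protocol) or (False, error_message).
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, str(exc)
```

For example, `check_tls("fleet.example.com", 443, cafile="/opt/orbit/certs.pem")` would exercise the same bundle path and host used in the command above. A missing intermediate should surface as a certificate verification error in the returned message.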

@lucasmrod lucasmrod removed the :reproduce Involves documenting reproduction steps in the issue label Jul 1, 2024
@lucasmrod
Member

I forgot to thank @N0rthg4t3 for your feedback here! (it helped me reproduce the issue)

@lukeheath lukeheath added ~released bug This bug was found in a stable release. bug Something isn't working as documented and removed :more info please bug Something isn't working as documented labels Jul 7, 2024
lucasmrod added a commit that referenced this issue Jul 9, 2024
#6085

- [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information.
- [X] Added/updated tests
- [x] Manual QA for all new/changed functionality
@lucasmrod
Member

@xpkoala @PezHub I've added QA notes to the description.

@xpkoala
Contributor

xpkoala commented Jul 15, 2024

The above scenarios were run with the certs provided and I received the expected success / fail states outlined in the steps.

lucasmrod added a commit that referenced this issue Jul 16, 2024
@fleet-release
Contributor

In a secure cloud city,
TLS handshake finds harmony,
Fleet's code, more trustworthy.
