Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Fleet Server to be run without TLS. #6020

Merged
merged 7 commits into from
Oct 4, 2022

Conversation

naemono
Copy link
Contributor

@naemono naemono commented Sep 20, 2022

closes #6000

This potential change allows Fleet Server, running as Elastic Agent, to be run without TLS for such scenarios as running within a service mesh which could provide mTLS between applications/services, similar to how we allow Elasticsearch, and Kibana to run without TLS.

This additionally solves a panic when currently attempting to disable TLS on Fleet server:

[elastic-operator-0] {"log.level":"info","@timestamp":"2022-09-14T20:10:55.031Z","log.logger":"manager.eck-operator","message":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","service.version":"2.4.0+96282ca9","service.type":"eck","ecs.version":"1.4.0","controller":"agent-controller","object":{"name":"eck-fleet-eck-fleet-server","namespace":"istio-enabled"},"namespace":"istio-enabled","name":"eck-fleet-eck-fleet-server","reconcileID":"c0330ea8-bda6-48d5-8552-d25327c3959e"}
[elastic-operator-0] panic: runtime error: invalid memory address or nil pointer dereference [recovered]
[elastic-operator-0] 	panic: runtime error: invalid memory address or nil pointer dereference
[elastic-operator-0] [signal SIGSEGV: segmentation violation code=0x1 addr=0x120 pc=0x1769409]
[elastic-operator-0]
[elastic-operator-0] goroutine 2739 [running]:
[elastic-operator-0] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
[elastic-operator-0] 	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:118 +0x1f4
[elastic-operator-0] panic({0x1a81660, 0x2de94a0})
[elastic-operator-0] 	/usr/local/go/src/runtime/panic.go:838 +0x207
[elastic-operator-0] github.com/elastic/cloud-on-k8s/v2/pkg/controller/agent.internalReconcile({{0x2017448, 0xc00180b110}, {0x201e198, 0xc00020c820}, {0x2016768, 0xc0007c2b80}, {0xc0007bff20, 0xc0007bff80, 0xc0007da000, 0xc0007da060}, ...})
[elastic-operator-0] 	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/agent/driver.go:130 +0xe89
[elastic-operator-0] github.com/elastic/cloud-on-k8s/v2/pkg/controller/agent.(*ReconcileAgent).doReconcile(_, {_, _}, {{{0x18b3b85, 0x5}, {0xc0017f21a0, 0x1d}}, {{0xc0006eb800, 0x1a}, {0x0, ...}, ...}, ...})
[elastic-operator-0] 	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/agent/controller.go:181 +0x585
[elastic-operator-0] github.com/elastic/cloud-on-k8s/v2/pkg/controller/agent.(*ReconcileAgent).Reconcile(0xc000772f20, {0x2017448?, 0xc00180b020?}, {{{0xc00076ebb0?, 0x10?}, {0xc000213500?, 0x40d927?}}})
[elastic-operator-0] 	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/agent/controller.go:147 +0x478
[elastic-operator-0] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x20173a0?, {0x2017448?, 0xc00180b020?}, {{{0xc00076ebb0?, 0x1c017c0?}, {0xc000213500?, 0x4041f4?}}})
[elastic-operator-0] 	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:121 +0xc8
[elastic-operator-0] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004a2d20, {0x20173a0, 0xc000634a80}, {0x1b01ee0?, 0xc0004d1be0?})
[elastic-operator-0] 	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:320 +0x33c
[elastic-operator-0] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004a2d20, {0x20173a0, 0xc000634a80})
[elastic-operator-0] 	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 +0x1d9
[elastic-operator-0] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
[elastic-operator-0] 	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 +0x85
[elastic-operator-0] created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
[elastic-operator-0] 	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:230 +0x325
^C[1]    43051 terminated  kubetail -n elastic-system -l app.kubernetes.io/instance=elastic-operator  1m

Also, an e2e test was added to ensure the full functionality all the way back to data within Elasticsearch.

E2e test verifying Fleet server + Agent w/o TLS.
Unit test for Fleet Server pod template without TLS.
@naemono naemono added >bug Something isn't working >enhancement Enhancement of existing functionality labels Sep 20, 2022
@naemono
Copy link
Contributor Author

naemono commented Sep 20, 2022

run/e2e-tests

@naemono
Copy link
Contributor Author

naemono commented Sep 22, 2022

note. Need to test this against an older version of the stack to ensure that this works against older version, or restrict this to newer versions of the stack if not functional against older versions.

@CheyenneForbes
Copy link

+1 for this functionality

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, just a few nits. But I have questions about the practicality of running Elastic Agent through a service mesh like Istio. Many of our recipes use hostNetwork: true for the Agents but that is a configuration that is not supported by Istio sidecar injection. Maybe I am missing a clever workaround for this issue?

@@ -507,6 +520,16 @@ func getFleetSetupFleetEnvVars(_ context.Context, agent agentv1alpha1.Agent, cli
return nil, err
}
fleetCfg[FleetURL] = assocConf.GetURL()

u, err := url.Parse(fleetCfg[FleetURL])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parsing seems redundant. You could either look at agent.Spec.HTTP.Protocol() once more which is used to construct the Fleet URL IIUC. Or look at agent.Spec.HTTP.TLS.Enabled() Or am I missing something important?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true. I'd forgotten about his method on the spec. Fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that it's the associated agent that we need to inspect (the fleet agent). I updated this to just strings.Contains(url, "https://") to test if the associated fleet is running with/without tls. Lmk your thoughts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right I missed this. I think the downside of deriving this from the URL is that we effectively disable the safeguard the Fleet team has built in here. They could have just done the same thing and look at the URL to decide whether they allow Agent to talk to an insecure Fleet server. But they opted to add another configuration setting to make this an explicit decision.

))
} else {
// Force FLEET_SERVER_HOST environment variable to Pod IP, as without this, Fleet Server only binds to localhost.
builder = builder.WithEnv(corev1.EnvVar{Name: "FLEET_SERVER_HOST", Value: "", ValueFrom: &corev1.EnvVarSource{
Copy link
Collaborator

@pebrc pebrc Sep 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should FLEET_SERVER_HOST be a constant?
  2. Also the need for this only in HTTP mode confuses me. In my experiments I did not seem to need this even with Istio enabled.
  3. Should this be inside the existing applyEnvVars function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes it should. Fixed.

  2. Absolutely. I've moved that into the function.

  3. Longer answer. Let me test this one additional time. IIRC, everything worked up until I started trying to use an actual agent attached to fleet. Let me re-verify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After testing 3: It turns out that environment variable isn't needed after all. I've removed it, and updated our agent tests to use actual URLs in the tests instead of "url" or similar.

Don't use url.Parse, just use agent.Spec.HTTP.Protocol().
Move envvar addition to applyEnvVars function.
Fix wording.
Add insecure flag when agent is running with associated fleet running without tls
pkg/controller/agent/pod.go Outdated Show resolved Hide resolved
@@ -507,6 +520,16 @@ func getFleetSetupFleetEnvVars(_ context.Context, agent agentv1alpha1.Agent, cli
return nil, err
}
fleetCfg[FleetURL] = assocConf.GetURL()

u, err := url.Parse(fleetCfg[FleetURL])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right I missed this. I think the downside of deriving this from the URL is that we effectively disable the safeguard the Fleet team has built in here. They could have just done the same thing and look at the URL to decide whether they allow Agent to talk to an insecure Fleet server. But they opted to add another configuration setting to make this an explicit decision.

Co-authored-by: Peter Brachwitz <peter.brachwitz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug Something isn't working >enhancement Enhancement of existing functionality v2.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ECK Support For Elastic Fleet With HTTP TLS Disabled Mode For ISTIO Service Mesh
3 participants