Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardened Security Context for Elasticsearch #6703

Merged
merged 17 commits into from
Apr 27, 2023

Conversation

barkbay
Copy link
Contributor

@barkbay barkbay commented Apr 19, 2023

If Elasticsearch configuration files are copied before symbolic links are created, it is possible to run all the Elasticsearch containers with the following restricted securityContext

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true

Fixes #6126

See my comment here if you want to review this PR: #6703 (comment)

@botelastic botelastic bot added the triage label Apr 19, 2023
@barkbay
Copy link
Contributor Author

barkbay commented Apr 19, 2023

buildkite test this -f p=gke,s=8.7.0,s=7.17.8

@thbkrkr
Copy link
Contributor

thbkrkr commented Apr 19, 2023

buildkite test this -f p=gke,s=8.7.0,s=7.17.8

This will only run the e2e tests with s=7.17.8. When you want to test one variable with different values (here s the stack version), you need to declare this variable and its values in the --mixed|-m flag.

  • buildkite test this --fixed p=gke --mixed s=8.7.0,s=7.17.8
  • buildkite test this -f p=gke -m s=8.7.0,s=7.17.8 (with the short flags)

@thbkrkr thbkrkr added the >enhancement Enhancement of existing functionality label Apr 19, 2023
@botelastic botelastic bot removed the triage label Apr 19, 2023
@barkbay
Copy link
Contributor Author

barkbay commented Apr 19, 2023

@thbkrkr Thanks! I think I spotted a problem with this PR, I will use your advice next time 🙇

@barkbay
Copy link
Contributor Author

barkbay commented Apr 20, 2023

  • TestVolumeEmptyDir failed because this "feature" is itself broken: emptyDir volume is not mounted #6186. I added a function to detect, on a best effort basis, whether or not data are in a volume.
  • TestKibanaStandalone also failed, but I'm not sure to understand why 🤷

@thbkrkr
Copy link
Contributor

thbkrkr commented Apr 20, 2023

TestKibanaStandalone looks very similar to #6693 with all sub-tests failing as we don't have the end of the test execution in the log because the log stream was interrupted.

log
{
    "Time": "2023-04-19T17:16:36.671150963Z",
    "Action": "output",
    "Package": "github.com/elastic/cloud-on-k8s/v2/test/e2e/kb",
    "Test": "TestKibanaStandalone/ES_HTTP_certificate_authority_should_be_set_and_deployed",
    "Output": "=== RUN   TestKibanaStandalone/ES_HTTP_certificate_authority_should_be_set_and_deployed\n"
}
{
    "log.level": "info",
    "@timestamp": "2023-04-19T17:30:05.540Z",
    "log.logger": "e2e-pdzcj",
    "message": "Log stream ended",
    "service.version": "0.0.0-SNAPSHOT+00000000",
    "service.type": "eck",
    "ecs.version": "1.4.0",
    "pod_name": "eck-e2e-pdzcj-cl86s"
}
{
    "log.level": "info",
    "@timestamp": "2023-04-19T17:30:05.540Z",
    "log.logger": "e2e-pdzcj",
    "message": "Streaming pod logs",
    "service.version": "0.0.0-SNAPSHOT+00000000",
    "service.type": "eck",
    "ecs.version": "1.4.0",
    "pod_name": "eck-e2e-pdzcj-cl86s"
}

@barkbay
Copy link
Contributor Author

barkbay commented Apr 20, 2023

buildkite test this --fixed p=gke --mixed s=8.7.0,s=7.17.8

@barkbay
Copy link
Contributor Author

barkbay commented Apr 20, 2023

buildkite test this -f p=gke -m s=8.7.0,s=7.17.8

@barkbay
Copy link
Contributor Author

barkbay commented Apr 21, 2023

Failures to investigate from the last e2e tests session:

image

I think this is because I added a check to validate that all the containers in the Elasticsearch Pod are running with the expected securityContext. That's probably not the case for Beat sidecars when stack monitoring is enabled.

@barkbay
Copy link
Contributor Author

barkbay commented Apr 21, 2023

buildkite test this -f p=gke -m s=8.7.0,s=7.17.8

@barkbay
Copy link
Contributor Author

barkbay commented Apr 23, 2023

buildkite test this -f p=gke -m s=8.7.0,s=7.17.8

@barkbay
Copy link
Contributor Author

barkbay commented Apr 24, 2023

buildkite test this -f p=gke -m s=8.0.1,s=8.8.0-SNAPSHOT

@barkbay
Copy link
Contributor Author

barkbay commented Apr 24, 2023

Some "noteworthy" things if you want to review this PR:

@barkbay barkbay marked this pull request as ready for review April 24, 2023 08:27
@barkbay
Copy link
Contributor Author

barkbay commented Apr 24, 2023

buildkite test this -f p=ocp,s=8.7.0

@barkbay
Copy link
Contributor Author

barkbay commented Apr 25, 2023

Results from the last e2e test sessions:

  • OCP4 🟢
  • 7.17.8/8.0.1/8.7.0 on GKE 🟢
  • 8.8.0-SNAPSHOT 🟡 - 140/144 tests passed, TestKibanaStandalone failed. As for this failure, I don't think it is related to the changes in this PR.

Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is great!

Looks also like we're writing code specifically to handle #6186 but why not fix it first? We would no longer need to detect if the data directory is mounted in a volume as this is normally impossible, no? So we could call WithContainersSecurityContext after stackmon.WithMonitoring, to reuse the same code to set the SecurityContext for beats sidecars.

@barkbay
Copy link
Contributor Author

barkbay commented Apr 26, 2023

Looks also like we're writing code specifically to handle #6186 but why not fix it first?

At first I thought this code was specifically handling #6186. But I've been wondering if, in general, we should not set readOnlyRootFilesystem to true only if we are pretty confident in the fact that the data are written in a volume, not in the container. I'm thinking about users that will upgrade to 2.8.0: if we set readOnlyRootFilesystem to true too "aggressively" we may end up with issues during upgrades if Elasticsearch is not configured properly. That's why I decided to set readOnlyRootFilesystem to true only if the default data volume is mounted as expected. If a user has a more involved storage configuration, with different volumes, it would have to test this flag manually first.

I have to admit this is still brittle, as there are other ways to mis-configure Elasticsearch with the default volume, ending up with data being written into the container. As you suggest, we could also always enable readOnlyRootFilesystem and let the user fix the configuration if the Pods crash, I'm trying to avoid that situation by keeping this code.

Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I've been wondering if, in general, we should not set readOnlyRootFilesystem to true only if we are pretty confident in the fact that the data are written in a volume, not in the container. I'm thinking about users that will upgrade to 2.8.0: if we set readOnlyRootFilesystem to true too "aggressively" we may end up with issues during upgrades if Elasticsearch is not configured properly.

👍 👍

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM I have not done many tests yet.

pkg/controller/elasticsearch/nodespec/podspec.go Outdated Show resolved Hide resolved
Privileged: pointer.Bool(false),
RunAsNonRoot: pointer.Bool(true),
ReadOnlyRootFilesystem: pointer.Bool(true),
AllowPrivilegeEscalation: pointer.Bool(false),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a shared variable or function here given that we use the exact same context twice? Also potentially we could refer to the new securitycontext package here, to have all contexts in one place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done in 18818fc

test/e2e/test/elasticsearch/check_securitycontext.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I managed to run a few tests on GKE and OCP and things seem to work fine!

@barkbay
Copy link
Contributor Author

barkbay commented Apr 27, 2023

I'll open an issue to update https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-security-context.html before the next release as it is definitely outdated.

Edit: I think we can actually work on this as part of #5913

@barkbay barkbay merged commit 7a57bbd into elastic:main Apr 27, 2023
@barkbay barkbay deleted the allow-readonly branch April 27, 2023 09:37
thbkrkr added a commit that referenced this pull request Apr 28, 2023
This adjusts the number of volumes expected from Beats sidecars in the Logstash
Stack Monitoring unit tests.

Why? Because we don't test PRs with an automatic merge of the main branch (🐛🐞), we missed that the tests in #6732 had to be updated to take into account the changes made by #6703, which adds a new temp volume to the Beats sidecars.
@barkbay barkbay changed the title Copy Elasticsearch configuration files before creating links Hardened Security Context for Elasticsearch May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality v2.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot run elasticsearch with security best practice readOnlyRootFilesystem: true
3 participants