[Security Solution] Endpoint User Artifacts are served from Fleet Server #90513

Closed
9 tasks done
kevinlog opened this issue Feb 5, 2021 · 15 comments
Labels
Feature:Trusted Apps Security Solution Trusted Apps OLM Sprint Team:Defend Workflows “EDR Workflows” sub-team of Security Solution v7.13.0

Comments

@kevinlog
Contributor

kevinlog commented Feb 5, 2021

Meta: #122
Feature: Trusted Applications per policy

Describe the feature:

Currently user artifacts such as Trusted Apps and Exceptions are served from an API on Kibana. With the introduction of a separate Fleet Server(s), the agents will no longer communicate directly with Kibana. As such, the artifacts should be served from a Fleet Server instance.

Implementation

Background

In the current implementation, Kibana injects relative URLs into the policy describing the location of target artifacts. When the agent does a GET on a relative URL corresponding to an artifact, Kibana reads the corresponding document from Elasticsearch, streams the data to the client, and caches the payload for subsequent requests. The payloads are immutable by definition, so caching is an appropriate optimization.

To implement this in Fleet Server, there must be some coordination between Kibana and the Fleet Server. Kibana will continue to manage the generation and storage of the artifacts in Elasticsearch. Kibana will continue to enumerate the target artifacts in the policy. Fleet Server and Kibana will need to agree on the following (i.e., a contract):

  • The specification of the artifact in the policy. How is the artifact described in the policy document: by URL, or by a metadata block which is fixed up by the Fleet Server?
  • The location of the artifact in Elasticsearch. This should be by contract, or explicitly specified in the metadata block.
  • Related to the previous point, the artifacts are currently stored as saved objects in Elasticsearch. Fleet Server should not access saved objects directly, as that is fragile. The artifacts should be moved to a new shared index to which the Fleet Server should have read-only access.

Fleet Server will interpret the policy, adjust relative URLs as appropriate, and rewrite the policy for delivery. On request for an artifact, the Fleet Server will retrieve the corresponding payload from Elasticsearch, validate the hash if in cache, and stream to the client. Immutable payloads should be cached locally, possibly in a memory-mapped file to conserve RAM. Payloads should already be compressed when stored in Elasticsearch.
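
For illustration only, the caching idea above could be sketched roughly as follows. Because payloads are immutable, a cache keyed by identifier plus encoded SHA-256 never needs invalidation. `ArtifactKey`, `ArtifactLoader`, and `ArtifactCache` are hypothetical names, not part of Fleet Server, and the sketch keeps payloads in memory rather than in a memory-mapped file and leaves out hash validation.

```typescript
// Hypothetical key: the thread identifies an artifact by identifier plus encoded SHA-256.
interface ArtifactKey {
  identifier: string;
  encodedSha256: string;
}

// Hypothetical loader, standing in for a read from the shared artifact index.
type ArtifactLoader = (key: ArtifactKey) => Promise<Uint8Array>;

class ArtifactCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly load: ArtifactLoader;

  constructor(load: ArtifactLoader) {
    this.load = load;
  }

  // Returns the cached payload, loading it at most once per key.
  get(key: ArtifactKey): Promise<Uint8Array> {
    const cacheKey = `${key.identifier}/${key.encodedSha256}`;
    const cached = this.entries.get(cacheKey);
    if (cached !== undefined) {
      return cached;
    }
    const pending = this.load(key);
    this.entries.set(cacheKey, pending);
    // Evict failed loads so that a later request can retry.
    pending.catch(() => this.entries.delete(cacheKey));
    return pending;
  }
}
```

Storing the pending promise rather than the resolved bytes means that concurrent requests for the same artifact share a single read.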

Why not pull data from Elasticsearch directly?

Alternatively, we could give the Agent privileges to pull the artifacts from Elasticsearch directly. There are some advantages to this:

  • No new API in the Fleet Server.
  • No coordination between Fleet Server and Kibana on this topic.

However, the downsides are considerable:

  • Giving endpoints read privileges on an artifact index would likely give them access to any of the artifacts without a check. There is no intermediate choke point to enforce additional checks. In general, we are trying to avoid giving endpoints any read access.
  • The endpoint would depend on Elasticsearch protocols to retrieve the data. This would complicate the ability to introduce intermediate HTTP load balancing proxies, which would be possible with normal HTTP semantics.
  • The endpoint would have intimate knowledge of the Elasticsearch schema for the artifact indices, which would complicate migrations. Because endpoints are typically not upgraded atomically, but rolling over time, a change to the artifact index schema would have to accommodate multiple active versions of the Endpoint in the field until the rolling migration has completed. This may take several releases to complete.
  • Fleet Server can efficiently cache payloads at the edge, decreasing load on Elasticsearch at scale.
@kevinlog kevinlog added the Team:Defend Workflows “EDR Workflows” sub-team of Security Solution label Feb 5, 2021
@elasticmachine
Contributor

Pinging @elastic/security-onboarding-and-lifecycle-mgt (Team:Onboarding and Lifecycle Mgt)

@kevinlog kevinlog self-assigned this Feb 5, 2021
@kevinlog kevinlog added the Feature:Trusted Apps Security Solution Trusted Apps label Feb 5, 2021
@kevinlog
Contributor Author

kevinlog commented Feb 5, 2021

cc @scunningham @paul-tavares

@paul-tavares
Contributor

I think there is a lot here, and I’m not really clear on how much can be done prior to GA (fleet has been de-scoping as much as they can in order to keep the GA release target).

The above describes a new feature in fleet/fleet-server to be able to serve Kibana-generated artifacts to elastic-agents. This does not exist today, but from talking to @scunningham, it is something that other integrations are likely to need in the future.

Creating/storing and serving these types of artifacts is supported today for elastic-endpoint only and if the above full feature was to move forward for the GA release, much of the work that has already been done in endpoint could be leveraged. However several other areas of fleet would need to be touched potentially (ex. the above describes fleet-agent intercepting policies and adjusting URLs, which implies (maybe?) we would have a centralized and dedicated area in the overall agent policy to define artifacts? or possibly have fleet-agent know all the places it would need to look at in order to make such adjustment)

With that, I have added a discussion topic to the Fleet sync meeting so that we can have broader discussion (@ph , @ruflin , @scunningham FYI)

The minimum we have to do in order to support endpoint downloading user artifacts from fleet-server is:

  1. write the artifacts to a new Elasticsearch index (no saved objects). I think this means a change to the package: the fleet-server package or the endpoint package?
  2. create an API on the fleet-server side to serve up artifacts
    I think we have an opportunity here to change the relative path to the API and not stick to the one that is currently defined in Kibana for endpoint, but doing so would require the Kibana API to support that new relative path as well for backward compatibility (if we update the policies with new paths, endpoint agents will be dispatched a new policy with a relative URL to an artifact that they will attempt to download from Kibana instead of fleet-server). This task would also take care of ensuring permissions are properly set up for fleet-server to access the index for artifacts
  3. migrate existing artifacts to the new Elasticsearch index (and potentially delete the saved objects, see next item)
  4. provide backwards compatibility for existing agents/endpoints by maintaining the existing Kibana API that downloads artifacts.
    This could be just changing the API to retrieve the artifact from the new index, or should we maybe change it to do a redirect to fleet-server? (a redirect could be tricky in some environments with load balancers or multiple instances of fleet-server)
  5. Endpoint (elastic-endpoint) will need to change and ensure that the fleet-server URL is used (instead of Kibana) to create the full URI for downloading the artifact (Fleet team is currently working on providing that information in the overall agent policy so hopefully that will just make it down to elastic-endpoint to use) (cc/ @ferullo )

Q.❓ is migration needed (item 3 + 4 above)? Who can make that decision?

@ferullo
Contributor

ferullo commented Feb 20, 2021

Endpoint (elastic-endpoint) will need to change and ensure that the fleet-server URL is used (instead of Kibana) to create the full URI for downloading the artifact

If the other changes requiring an Endpoint change are made, Endpoint will definitely make this update. Just keep me in the loop so I know when the change is needed.

In the meantime if you'd like to test Kibana/Fleet Server changes e2e with a real Endpoint you can use the advanced options below to edit these settings. They're what Endpoint uses in testing.

inputs:
- policy:
    linux: # or mac or windows
      advanced:
        artifacts:
          user:
            base_url: http://localhost:1234
            ca_cert: fill-me-in-with-a-pem-if-needed
            public_key: fill-me-in-with-a-pem-if-needed
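
As a rough sketch of how a `base_url` like the one above might be combined with a manifest `relative_url` to form the full download URI: the helper name `buildArtifactUrl` is hypothetical, and the actual Endpoint logic is not shown in this thread.

```typescript
// Joins a base URL and a manifest relative URL without doubling or dropping slashes.
// Hypothetical helper for illustration; not the Endpoint implementation.
function buildArtifactUrl(baseUrl: string, relativeUrl: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // strip trailing slashes
  const path = relativeUrl.replace(/^\/+/, ""); // strip leading slashes
  return `${base}/${path}`;
}
```

Switching downloads from Kibana to fleet-server would then only require a different `baseUrl`, as long as the relative path is served on both sides.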

@paul-tavares
Contributor

Expanding a little more on some of what is needed to implement this along with some suggestions which are all open for comment 😄

This will likely generate several other implementation Issues.

New Index and Interface to fleet-server

This new index will also serve as the interface contract between Kibana and fleet-server. We can start with the schema currently implemented with saved objects.

Index Name: TBD...
Maybe .fleet-artifacts if this ends up in fleet. .endpoint-artifacts if it ends up remaining with the endpoint package.

Schema

properties: {
    identifier: {
      type: 'keyword',
    },
    compressionAlgorithm: {
      type: 'keyword',
      index: false,
    },
    encryptionAlgorithm: {
      type: 'keyword',
      index: false,
    },
    encodedSha256: {
      type: 'keyword',
    },
    encodedSize: {
      type: 'long',
      index: false,
    },
    decodedSha256: {
      type: 'keyword',
      index: false,
    },
    decodedSize: {
      type: 'long',
      index: false,
    },
    created: {
      type: 'date',
      index: false,
    },
    body: {
      type: 'binary',
    },
  },

Note: we don't currently encrypt the artifacts, but the initial SO type was defined with properties that would allow encryption to be introduced at a later time. We could remove these if we don't see an immediate need for them.

➕ And probably add the following:

{
  packageName: {    // The package name that "owns" the artifact
    type: 'keyword',
  },
  type: {            // Free text metadata for use by integration
    type: 'keyword',
    index: false,
  },
}

Or should we instead support an object (ex. meta) to which integrations can add anything that helps them manage their artifacts?

❓ others?

new fleet-server API handler

A new API route and handler would be created to handle downloading of artifacts. Like the current implementation, this would use the Agent's API key to get permission to the API.

route path: /api/v1/artifacts/{identifier}/{encodedSha256}

This route path (relative) will be used to inform the Endpoint where a given artifact should be downloaded from.
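
To make the proposed route shape concrete, here is a minimal sketch of parsing such a path into its two parameters. `parseArtifactPath` and `ArtifactRouteParams` are hypothetical names, and the assumption that the hash is 64 lowercase hex characters comes from the manifest examples in this thread rather than from any fleet-server code.

```typescript
interface ArtifactRouteParams {
  identifier: string;
  encodedSha256: string;
}

// Assumes the proposed path /api/v1/artifacts/{identifier}/{encodedSha256}
// and a SHA-256 digest written as 64 lowercase hex characters.
const ARTIFACT_ROUTE = /^\/api\/v1\/artifacts\/([^/]+)\/([0-9a-f]{64})$/;

// Returns the route parameters, or null when the path does not match.
function parseArtifactPath(path: string): ArtifactRouteParams | null {
  const match = ARTIFACT_ROUTE.exec(path);
  if (match === null) {
    return null;
  }
  const identifier = match[1];
  const encodedSha256 = match[2];
  if (identifier === undefined || encodedSha256 === undefined) {
    return null;
  }
  return { identifier, encodedSha256 };
}
```

Rejecting malformed hashes at the routing step keeps obviously invalid requests from reaching the index lookup.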

Kibana service

If the above ends up living in Fleet, then a server side service should be created/exposed for other Plugins to use (ex. endpoint). This service would abstract away the need to write directly to the index as well as the encoding/decoding of the artifact. Example: To create a new artifact, it could be as simple as:

artifactManager.create(__decodedContent__);

// could return something like:
{
  identifier: 'some_value', // used in the URL for download
  compressionAlgorithm: 'some value',
  encryptionAlgorithm: 'some value',
  encodedSha256: 'string',
  encodedSize: 12345,
  decodedSha256: 'string',
  decodedSize: 4321,
  packageName: 'endpoint',
  type: 'trusted_apps',
  created: 'iso-date',
  fleetServerRelativeUrl: '/api/artifact/{identifier}/{encodedSha256}'
}

The service would take care of encoding and storage (we would likely re-use what is already implemented in Kibana for endpoint).

@paul-tavares
Contributor

@ph , @ruflin : Can you comment and let us know if you agree that this index/service should be part of Fleet? Also, any comment on the above initial high level design?

Getting agreement on this will help drive the definition of the index (I assume that goes into one of the packages - maybe the elastic agent package? cc/ @nchaulet ??) - and will then allow us to make progress on both ends (kibana + fleet-server).

/cc: @mostlyjason , @caitlinbetz , @kevinlog

@nchaulet
Member

(I assume that goes into one of the packages - maybe the elastic agent package? cc/ @nchaulet ??) - and will then allow us to make progress on both ends (kibana + fleet-server).

I think this new index should be created by kibana right now as we create the other .fleet-* indices and later be created directly by an Elasticsearch plugin

@ph
Contributor

ph commented Feb 23, 2021

As @nchaulet pointed out this should be created by Kibana (fleet) and moved to a plugin later.

@paul-tavares I do not understand why the "new fleet-server API handler" is needed?

@paul-tavares
Contributor

I do not understand why the "new fleet-server API handler" is needed?

@ph with the move of agents communicating with fleet-server and not kibana, we (endpoint) also should be moving the download of these artifacts away from Kibana and to fleet-server, thus the need for an api on the fleet-server side.

@ph
Contributor

ph commented Feb 23, 2021

Got it. I think I should have added more context in my answer. We could add a new endpoint to the gRPC definition. I think we need to figure out whether Endpoint connects directly to the gRPC endpoint or Elastic Agent makes it available; @scunningham / @blakerouse were looking into it.

@kevinlog kevinlog removed the grooming label Feb 24, 2021
@paul-tavares
Contributor

All,
I have not heard any objections to having this new API be part of Fleet Server, or to having the creation of the index that holds the data it needs be part of Kibana Fleet (right?). Thus, I'm going to proceed and request that the Elasticsearch team create the new index with a name of .fleet-artifacts.

Also,
After thinking a little more about the proposal above, I would like to propose that we don't make this a generic service in Kibana (under Fleet) for all plugins/integrations to use (just yet) and that it should remain Endpoint-specific functionality for now. I think that when/if we decide for it to be generic, there may be several other aspects of artifact management that would need to be accounted for in Fleet, which I don't think we have the time to do (ex. what happens to an integration's artifacts when the integration is uninstalled?).

@ruflin
Member

ruflin commented Feb 25, 2021

I like the idea of this new artifact endpoint as I could also see other use cases for it. How will fleet-server decide which Elastic Agent has access to which artifacts?

@paul-tavares I stumbled over your last comment on not making it generic. My concern is that fleet-server is generic and serves all purposes so it is already generic. So I would probably just move the code from endpoint to Fleet but keep it as is and not add other things to it yet? Still, this would only be used by endpoint so far.

@paul-tavares
Contributor

paul-tavares commented Feb 25, 2021

@ruflin

How will fleet-server decide which Elastic Agent has access to which artifacts?

I talked briefly with @scunningham and I don't think it will know that - at least not initially. Doing so would possibly have to include changes in Fleet Server to look inside of an agent policy to see if it has a manifest and then look at the URLs for those (FYI - look below for what a manifest looks like for the endpoint integration policy). That could imply introducing a standard location for these URLs (artifacts manifest).

That being said, in order to access an artifact (at least as currently defined), an attacker would have to know the hash of the file and also have the API key.

Agent Policy with Endpoint Showing artifact manifest
id: 4b9a6a00-76d3-11eb-9191-6db59489cdbe
revision: 2
outputs:
  default:
    type: elasticsearch
    hosts:
      - 'http://localhost:9200'
agent:
  monitoring:
    enabled: true
    use_output: default
    logs: true
    metrics: true
inputs:
  - id: f7913856-32e9-4a43-9ee2-bdb4454dbcfd
    name: test
    revision: 1
    type: endpoint
    use_output: default
    meta:
      package:
        name: endpoint
        version: 0.17.1
    data_stream:
      namespace: default
    artifact_manifest:
      manifest_version: 1.0.0
      schema_version: v1
      artifacts:
        endpoint-exceptionlist-macos-v1:
          encryption_algorithm: none
          decoded_sha256: d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          decoded_size: 14
          encoded_sha256: f8e6afa1d5662f5b37f83337af774b5785b5b7f1daee08b7b00c2d6813874cda
          encoded_size: 22
          relative_url: >-
            /api/endpoint/artifacts/download/endpoint-exceptionlist-macos-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          compression_algorithm: zlib
        endpoint-exceptionlist-windows-v1:
          encryption_algorithm: none
          decoded_sha256: d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          decoded_size: 14
          encoded_sha256: f8e6afa1d5662f5b37f83337af774b5785b5b7f1daee08b7b00c2d6813874cda
          encoded_size: 22
          relative_url: >-
            /api/endpoint/artifacts/download/endpoint-exceptionlist-windows-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          compression_algorithm: zlib
        endpoint-trustlist-macos-v1:
          encryption_algorithm: none
          decoded_sha256: d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          decoded_size: 14
          encoded_sha256: f8e6afa1d5662f5b37f83337af774b5785b5b7f1daee08b7b00c2d6813874cda
          encoded_size: 22
          relative_url: >-
            /api/endpoint/artifacts/download/endpoint-trustlist-macos-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          compression_algorithm: zlib
        endpoint-trustlist-windows-v1:
          encryption_algorithm: none
          decoded_sha256: d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          decoded_size: 14
          encoded_sha256: f8e6afa1d5662f5b37f83337af774b5785b5b7f1daee08b7b00c2d6813874cda
          encoded_size: 22
          relative_url: >-
            /api/endpoint/artifacts/download/endpoint-trustlist-windows-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          compression_algorithm: zlib
        endpoint-trustlist-linux-v1:
          encryption_algorithm: none
          decoded_sha256: d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          decoded_size: 14
          encoded_sha256: f8e6afa1d5662f5b37f83337af774b5785b5b7f1daee08b7b00c2d6813874cda
          encoded_size: 22
          relative_url: >-
            /api/endpoint/artifacts/download/endpoint-trustlist-linux-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658
          compression_algorithm: zlib
    policy:
      windows:
        events:
          dll_and_driver_load: true
          dns: true
          file: true
          network: true
          process: true
          registry: true
          security: true
        malware:
          mode: prevent
        ransomware:
          mode: prevent
        popup:
          malware:
            enabled: true
            message: ''
          ransomware:
            enabled: true
            message: ''
        logging:
          file: info
        antivirus_registration:
          enabled: false
      mac:
        events:
          process: true
          file: true
          network: true
        malware:
          mode: prevent
        ransomware:
          mode: prevent
        popup:
          malware:
            enabled: true
            message: ''
          ransomware:
            enabled: true
            message: ''
        logging:
          file: info
      linux:
        events:
          process: true
          file: true
          network: true
        logging:
          file: info
fleet:
  kibana:
    protocol: http
    hosts:
      - 'localhost:5601'
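
As a sketch of the "look inside an agent policy for a manifest" idea discussed above, the following collects every `relative_url` declared by any input of a policy shaped like the one shown. This is an assumption about how such a check could work, not something Fleet Server does; the type and function names are hypothetical, and only the fields used here are modelled.

```typescript
// Minimal, hypothetical view of the policy document shown above.
interface ManifestArtifact {
  relative_url: string;
  encoded_sha256: string;
}

interface PolicyInput {
  artifact_manifest?: { artifacts: Record<string, ManifestArtifact> };
}

interface AgentPolicy {
  inputs: PolicyInput[];
}

// Collects the relative URLs of all artifacts declared in a policy's manifests.
function collectArtifactUrls(policy: AgentPolicy): Set<string> {
  const urls = new Set<string>();
  for (const input of policy.inputs) {
    // Inputs without a manifest contribute nothing.
    const artifacts = input.artifact_manifest?.artifacts ?? {};
    for (const artifact of Object.values(artifacts)) {
      urls.add(artifact.relative_url);
    }
  }
  return urls;
}
```

A download handler could then refuse any request whose path is not in the set for the requesting agent's policy, at the cost of needing the standard manifest location mentioned above.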

All,
FYI - I have created an issue to track adding the new index (thanks @nchaulet for pointing me in the right direction).

#92820

@paul-tavares
Contributor

FYI - Wanted to document this here so that it does not get lost from conversations/slack.

Endpoint Security still needs a feature flag

We're in an in-between state of Kibana vs. fleet-server for how agents are enrolled and managed, and thus until fleet-server is integrated with Kibana CI (it's not yet, right?) and we are fully moved over to it, Endpoint Security will still need to maintain the following:

  1. Artifact Manifest will need to still have the kibana download relative url for each artifact
  2. we will still need to maintain the kibana download API for artifacts

So the default behaviour will be to work as it does today - we update the manifest with relative URLs for kibana and the kibana download API will continue to serve those. Note that internally, we will store the artifact under the new index (.fleet-artifacts) and the API will be adjusted to retrieve that artifact from this new index. And with the fleet server feature flag enabled, we will write the manifest artifacts with a fleet-server relative URL.
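
The flag-dependent choice of relative URL described above might look roughly like this. The Kibana path and its use of the decoded hash follow the manifest shown earlier in this thread; the fleet-server path is the route proposed earlier and may differ from what ships. `ArtifactRef` and `relativeUrlFor` are hypothetical names.

```typescript
interface ArtifactRef {
  identifier: string;
  decodedSha256: string;
  encodedSha256: string;
}

// Picks the relative URL written into the manifest for one artifact.
// Both path shapes are taken from this thread and are assumptions beyond it.
function relativeUrlFor(artifact: ArtifactRef, fleetServerEnabled: boolean): string {
  if (fleetServerEnabled) {
    // Route proposed for fleet-server earlier in the thread.
    return `/api/v1/artifacts/${artifact.identifier}/${artifact.encodedSha256}`;
  }
  // Existing Kibana download API, as it appears in the manifest example.
  return `/api/endpoint/artifacts/download/${artifact.identifier}/${artifact.decodedSha256}`;
}
```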

If the overall migration strategy is that pre-v7.13 agents will continue to communicate and stream data to kibana/elasticsearch, then that indicates endpoint will also still continue to download artifacts from kibana, thus having the download API continue to work will ensure no failures there. This also implies that we likely will not be able to remove the download API from kibana until 8.0 (or maybe the GA release - v7.14?).

Re: Migration / dual working mode

I still need to think a little more about the migration and how that impacts endpoint and artifacts. Specifically, if agents/endpoint will continue to communicate with Kibana, then how do we handle writing the artifact manifest with the correct relative URL? This might mean that user-generated artifacts may be broken with v7.13 until the users re-enroll the agents with fleet-server (I say "may" because the manifest for existing policies (and new ones) would only be updated if a new artifact was created/deleted, ex. a new Trusted App). I think that since the overall migration plan is to not migrate, we should also be OK, perhaps with just documentation indicating that this could happen.

@paul-tavares
Contributor

Just a quick summary:

All of the Endpoint/Fleet Kibana changes needed to support fleet-server around the serving of artifacts are now in place. This support needs to be enabled explicitly at this time by setting the following Kibana configuration settings:

xpack.securitySolution.enableExperimental:
  - fleetServerEnabled

xpack.fleet.agents.fleetServerEnabled: true

This behaviour will be modified/removed prior to feature freeze of v7.13, once fleet-server becomes the default (only) way to manage agents. Note that elastic-endpoint is still in the process of supporting fleet-server for artifact download, but that should be in place soon as well.

Closing out this issue. Thank you all for your help on this.
