Add related.entity field #2360

Open
wants to merge 4 commits into
base: main

romulets
Member

@romulets romulets commented Aug 14, 2024

Background

Over the past year, the Elastic Cloud Security team has been focusing on Cloud Detection and Response (CDR). One of the first steps toward the CDR vision is to enhance investigation workflows for the Cloud Security use case in SIEM.

Enhancing investigation workflows requires being able to correlate events and entities. For example, if an alert is triggered on the EC2 instance i-000000000, it is very valuable to be able to search all the events related to that entity, across multiple indices, with one query. We are therefore working on extracting entities and enabling them to be correlated.

Why related.entity

With this background, we researched a few options for enabling such a feature and arrived at the ECS field set related. From the related description:

This field set is meant to facilitate pivoting around a piece of data.

Some pieces of information can be seen in many places in an ECS event.
To facilitate searching for them, store an array of all seen values to their
corresponding field in related..

Adding a broad related.entity field that can hold any identifier needed to pivot data on seems like a good fit. It would let customers simply run related.entity: "i-000000000" and get all the hits for that specific cloud resource.
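As a rough illustration of the query above, here is a minimal Python sketch that builds an Elasticsearch Query DSL body for an exact match on related.entity and a multi-index search path. The index patterns (`logs-*`, `alerts-*`) and the helper names are assumptions made for the example, not part of this PR.

```python
import json
from typing import Dict, List

# Hypothetical index patterns, for illustration only.
INDEX_PATTERNS = ["logs-*", "alerts-*"]


def entity_search_body(entity_id: str) -> Dict:
    """Build a Query DSL body that exact-matches one identifier on related.entity."""
    # A term query performs an exact match, which suits a keyword-mapped field.
    return {"query": {"term": {"related.entity": {"value": entity_id}}}}


def search_path(patterns: List[str]) -> str:
    """Join index patterns into a single multi-index _search path."""
    return "/" + ",".join(patterns) + "/_search"


body = entity_search_body("i-000000000")
print("GET", search_path(INDEX_PATTERNS))
print(json.dumps(body, indent=2))
```

The same body can be sent with any Elasticsearch client; only the request shape is shown here.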

What is an entity?

An "entity" in our context refers to any discrete component within an IT environment that can be uniquely identified and monitored. This broad term encompasses both managed and unmanaged elements.

The term "entity" is broader than the current set of available fields under related. Although ip, user, and hosts can be entities, there is no place to represent messaging queues, load balancers, storage systems, databases, and others. Hence the proposal to add a new field.

@romulets romulets requested a review from a team as a code owner August 14, 2024 13:11

Documentation changes preview: https://ecs_bk_2360.docs-preview.app.elstc.co/diff

@romulets romulets changed the title from "Add related.entities field" to "Add related.entity field" Aug 14, 2024
@Samrose-Ahmed

Think you linked an internal ticket.

Do you expect this to duplicate values already in, e.g., related.user or related.ip, or is it only for leftover entities not representable via the existing related fields?

@@ -70,3 +70,15 @@
identifiers include FQDNs, domain names, workstation names, or aliases.
normalize:
- array

- name: entity
Contributor

entity is an extremely broad category. The danger with using this is it will mean different things to different people, and become a bucket that will hold almost anything.

This would reduce the effectiveness of having a common schema, as this field will be used by different users to hold different types of data, and cause problems with writing queries, doing data normalization, etc. Already in the description, there are resource IDs, email addresses, and hostnames, which are three different things.

I think you'll need to consider the use-cases for this, and refine the definition of what this is intended to hold. Maybe just cloud_resource_names? Or have multiple fields for the different types of data that could be related.

Member Author

Hey Michael, I see where you're coming from.

However, our need is indeed very broad. What we want is to be able to find any event related to an entity. What is an entity? It can be pretty much anything: a workstation, a bare-metal machine, a user, an EC2 instance, a database. Essentially anything a SOC team is concerned about.

But then why not specify cloud_resource_name or cloud_entity? Ideally, from a user experience perspective, users shouldn't need to know all the ECS fields to search for something, or have to think twice or look things up before typing their search. I do see the data organisation argument for having clearly separated buckets, but from a search perspective that degrades the experience. Beyond that, some concepts simply overlap. We have related.hosts and related.ip, which both hold information about a machine that can be seen as an entity. So where does the data about that specific host live? We believe it would be easier to have all the data in related.entity and search from there.
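To make the search-experience point concrete, here is a hedged Python sketch contrasting the two query shapes: one clause per existing related field versus a single clause on related.entity. The function names are hypothetical. The example passes only the keyword-mapped fields related.hosts and related.user; related.ip is left out because it is mapped as ip in ECS, and querying it with a non-IP value may be rejected.

```python
from typing import Dict, List


def multi_field_body(value: str, fields: List[str]) -> Dict:
    """Query several related.* fields: the user must know every field to include."""
    return {
        "query": {
            "bool": {
                # One exact-match clause per field; at least one has to match.
                "should": [{"term": {field: {"value": value}}} for field in fields],
                "minimum_should_match": 1,
            }
        }
    }


def single_field_body(value: str) -> Dict:
    """Query only related.entity: one clause, no knowledge of entity type needed."""
    return {"query": {"term": {"related.entity": {"value": value}}}}


# Existing keyword-mapped related fields, listed explicitly by the caller.
spread = multi_field_body("workstation-01", ["related.hosts", "related.user"])
unified = single_field_body("workstation-01")
print(len(spread["query"]["bool"]["should"]), "clauses vs. 1 clause")
```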

With that said, you mentioned that having it all in one field would reduce the effectiveness of data. Can you expand on that? Why would it cause problems writing queries and doing data normalisation?

Tagging @tinnytintin10 so he can give his two cents as product (if he wishes).

Thanks for your thoughtful analysis of the proposal, @mjwolf!

You're right that "entity" is an extremely broad category, and that's intentional. Let me explain our reasoning and address your concerns:

  1. Regarding data consistency, as related.entity is of keyword type, consistency in data format isn't a concern for searchability. All values stored will be searchable as keywords, regardless of the identifier format.
  2. Regarding query performance, given that related.entity will contain identifiers (such as ARNs, emails, hostnames, etc.) and is mapped to the keyword type, we don't anticipate significant performance issues. Querying over keyword fields is generally efficient in Elasticsearch, especially for exact matches, which are the primary use case here.
  3. Regarding data analysis, the introduction of this field should not complicate data analysis. In fact, it may simplify certain types of analysis by providing a unified field for correlation across different entity types. For more specific analyses, users can still rely on the more targeted related fields and other event details.
  4. This approach also lends itself to future extensibility. If certain entity types require more specific handling in the future (that is, implicit entity type fields like the host and user ECS fields), we can introduce additional fields without breaking the functionality of related.entity.
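The exact-match behaviour described in points 1 and 2 can be sketched in plain Python, without Elasticsearch, as a lookup over arrays of mixed identifiers. The events, the ARN, the email address, and the hostname below are made-up sample data, and `events_for_entity` is a hypothetical helper.

```python
from typing import Dict, List

# Made-up events whose related.entity arrays mix identifier formats.
EVENTS: List[Dict] = [
    {"id": 1, "related": {"entity": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-000000000",
        "i-000000000",
    ]}},
    {"id": 2, "related": {"entity": ["alice@example.com", "workstation-01"]}},
    {"id": 3, "related": {"entity": ["i-000000000", "alice@example.com"]}},
]


def events_for_entity(events: List[Dict], entity_id: str) -> List[int]:
    """Return ids of events whose related.entity array holds exactly entity_id."""
    # Whole-value comparison, mimicking an exact match on a keyword field:
    # the identifier format does not matter, but partial values do not match.
    return [
        event["id"]
        for event in events
        if entity_id in event.get("related", {}).get("entity", [])
    ]


print(events_for_entity(EVENTS, "i-000000000"))
print(events_for_entity(EVENTS, "alice@example.com"))
print(events_for_entity(EVENTS, "i-0000"))
```

With this sample data the instance ID matches events 1 and 3, the email address matches events 2 and 3, and the truncated ID matches nothing.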

Regarding alternatives (like the one I mentioned in the last bullet above), creating implicit entity fieldsets for each possible entity type would be a significant undertaking (especially in the cloud). If we were to follow the pattern of existing fields like "host" and "user", we'd quickly run into an explosion of entity types. Consider this non-exhaustive list of potential generic entity types we'd need to account for/introduce:

expand me

a few of these might have some ecs fields available...

  • ACCESS_ROLE
  • API_GATEWAY
  • BACKUP_SERVICE
  • BUCKET
  • CICD_SERVICE
  • CLOUD_LOG_CONFIGURATION
  • CDN
  • CONFIG_MAP
  • CONTAINER_IMAGE
  • CONTAINER_REGISTRY
  • CONTAINER_REPOSITORY
  • DATA_WORKFLOW
  • DATA_WORKLOAD
  • DATABASE
  • DNS_RECORD
  • DNS_ZONE
  • DOMAIN
  • EMAIL_SERVICE
  • ENCRYPTION_KEY
  • FILE_SYSTEM_SERVICE
  • FIREWALL
  • GATEWAY
  • GOVERNANCE_POLICY
  • LOAD_BALANCER
  • MANAGED_CERTIFICATE
  • MANAGEMENT_SERVICE
  • MAP_REDUCE_CLUSTER
  • MESSAGING_SERVICE
  • MONITOR_ALERT
  • NETWORK_ADDRESS
  • NETWORK_INTERFACE
  • PEERING
  • PRIVATE_ENDPOINT
  • PRIVATE_LINK
  • RAW_ACCESS_POLICY
  • REGION
  • REGISTERED_DOMAIN
  • RESOURCE_GROUP
  • ROUTE_TABLE
  • SEARCH_INDEX
  • SECRET
  • SECRET_CONTAINER
  • SERVERLESS
  • SERVERLESS_PACKAGE
  • SERVICE_CONFIGURATION
  • SERVICE_USAGE_TECHNOLOGY
  • SNAPSHOT
  • STORAGE_ACCOUNT
  • SUBNET
  • SUBSCRIPTION
  • VIRTUAL_NETWORK
  • VOLUME
  • WEB_SERVICE

This list doesn't even include some of the entity types we already have ECS fields for, such as those related to hosts, users, and Kubernetes (which ECS calls orchestrator).

Creating and maintaining fields for each of these entity types would not only take considerable time to implement but would also result in a proliferation of field types. This approach would place a substantial cognitive burden on users, requiring them to remember a large number of specific fields for different entity types.

The related.entity field addresses this challenge by providing a single, unified field for correlation. Users don't need to know the implicit entity type for each resource to correlate events, which greatly simplifies the process. For instance, they wouldn't need to know that a bucket is for blob storage or that an ARN identifies an AWS resource; they could simply use related.entity to find all events related to that entity. In short, related.entity offers a user-friendly way to correlate events across diverse entity types without overwhelming users or complicating the schema unnecessarily.

As we move forward, we'll continue to evaluate and adapt based on the evolving needs of our users and the insights we gain from this implementation.

Contributor

Thanks @tinnytintin10 for the excellent explanation. I think this makes sense for achieving what you want to achieve.

Member Author

@mjwolf what do we need to wrap this PR up? If you approve, can we merge, or does it need to be discussed in other forums?

@tinnytintin10

Do you expect this to duplicate values already in, e.g., related.user or related.ip, or is it only for leftover entities not representable via the existing related fields?

Hey @Samrose-Ahmed! The related.entity field is designed to complement, not replace, these existing fields. While there may be some overlap, the primary purpose of related.entity is to provide a unified field for correlation across a wide range of entity types, including those not currently represented in other related fields.

We recommend that data producers continue to populate specific related fields (like related.ip for IP addresses) in addition to related.entity. This approach ensures backward compatibility and allows for more specific queries when needed, while also enabling broad correlation queries using related.entity.

The goal is to enhance search capabilities rather than create redundancy. In cases where an identifier could be placed in both a specific related field and related.entity, populating both will maximize search flexibility.

Wdyt?
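For data producers, the "populate both" recommendation could look roughly like the Python sketch below. `add_related` is a hypothetical helper, not an existing ECS or Elastic API, and the event is a plain dict used only for illustration.

```python
from typing import Dict


def add_related(event: Dict, specific_field: str, value: str) -> Dict:
    """Append value to related.<specific_field> and to related.entity, skipping duplicates."""
    related = event.setdefault("related", {})
    # Writing to both fields keeps targeted queries (e.g. related.ip) working
    # while also enabling broad pivots on related.entity.
    for field in (specific_field, "entity"):
        values = related.setdefault(field, [])
        if value not in values:
            values.append(value)
    return event


event: Dict = {}
add_related(event, "ip", "10.0.0.5")
add_related(event, "user", "alice")
add_related(event, "ip", "10.0.0.5")  # repeated value is not added twice
print(event)
```

Identifiers with no specific related field (an ARN, for example) would go only into related.entity; that path is not shown here.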

5 participants