Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Create Threat Fieldset - Stage 2 Proposal #1293

Merged
merged 38 commits into from
Jun 23, 2021
Merged

Conversation

peasead
Copy link
Contributor

@peasead peasead commented Mar 8, 2021

  • Have you signed the contributor license agreement? Yes
  • Have you followed the contributor guidelines? yes
  • For proposing substantial changes or additions to the schema, have you reviewed the RFC process? Yes
  • If submitting code/script changes, have you verified all tests pass locally using make test? NA
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes? NA
  • Is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed. Yes
  • Have you added an entry to the CHANGELOG.next.md? Yes

CC

Preview of the markdown proposal

@peasead peasead added the RFC label Mar 8, 2021
@peasead peasead self-assigned this Mar 8, 2021
@peasead peasead changed the title [RFC] Create Threat - Stage 2 Proposal [RFC] Create Threat Fieldset - Stage 2 Proposal Mar 8, 2021
Copy link
Member

@ebeahan ebeahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving observations here that I couldn't easily add into the file diff:

  • I noticed in both usage examples the threat.indicator.type field is an array. Is that expected? If so, let's update the field definition to reflect.
  • I originally proposed using wildcard for threat.indicator.description. Since we're reevaluating wildcard usage in ECS for now, I suggest we revert to keyword.
  • threat.indicator.dataset has a typo in the Example column: theatintel
  • For threat.indicator.marking.tlp, I realized the examples here are using all caps, but the expected values in the field's description are only capitalizing the first letter (RED vs. Red). The TLP convention appears to use all caps for TLP colors. Do we imagine most users will search TLP designators using all caps? Do we want to explicitly specify a convention for the field or leave it flexible?
  • Reviewing the second example mapping, Adding threat intelligence match/enrichment to another document which could be in a source event index or signals index., it looks like the indicator.matched.* fields could be a list. If these values could be a list, do we see these fields residing under a nested object?
  • @peasead @dcode, you've both contributed to the proposal extensively, so feel free to also list yourselves as authors in addition to SMEs under People. 😃

rfcs/text/0008-threat-intel.md Outdated Show resolved Hide resolved
rfcs/text/0008-threat-intel.md Show resolved Hide resolved
@peasead
Copy link
Contributor Author

peasead commented Mar 15, 2021

  • I noticed in both usage examples the threat.indicator.type field is an array. Is that expected? If so, let's update the field definition to reflect.
    [x] Done
    • I was just providing multiple examples in the table. I've updated it to be clearer on the markdown doc and used "expected values", in the yml, to reflect all of the other options.
  • I originally proposed using wildcard for threat.indicator.description. Since we're reevaluating wildcard usage in ECS for now, I suggest we revert to keyword.
    [x] Done
  • threat.indicator.dataset has a typo in the Example column: theatintel
    [x] Done
  • For threat.indicator.marking.tlp, I realized the examples here are using all caps, but the expected values in the field's description are only capitalizing the first letter (RED vs. Red). The TLP convention appears to use all caps for TLP colors. Do we imagine most users will search TLP designators using all caps? Do we want to explicitly specify a convention for the field or leave it flexible?
    [x] Updated all TLP examples to match
    • I think we should leave it flexible. MISP, as an example, is using all lower-case TLP. I would prefer to set the examples to align with the standard, but I wouldn't want an error or to reject based on case.
  • Reviewing the second example mapping, Adding threat intelligence match/enrichment to another document which could be in a source event index or signals index., it looks like the indicator.matched.* fields could be a list. If these values could be a list, do we see these fields residing under a nested object?
    • I think that was the idea, @rylnd does that sound right?
  • @peasead @dcode, you've both contributed to the proposal extensively, so feel free to also list yourselves as authors in addition to SMEs under People. 😃
    [x] Done

@rylnd
Copy link
Contributor

rylnd commented Mar 15, 2021

Reviewing the second example mapping, Adding threat intelligence match/enrichment to another document which could be in a source event index or signals index., it looks like the indicator.matched.* fields could be a list. If these values could be a list, do we see these fields residing under a nested object?

  • I think that was the idea, @rylnd does that sound right?

As I see it, there are two proposed use cases for this fieldset: the indicator data themselves, and the enrichment of indicator/match data on extant event data. Defining a common mapping for both uses has several options/tradeoffs that should be discussed:

  1. threat.indicator is a nested object
    a. Pro: Allows enriched events to be queried by individual indicators
    b. Con: storage/performance issues with ingestion of nested documents from beats/agents
  2. threat.indicator is a regular object
    a. Con: enriched events lose the context of the indicators that they matched: we retrieve a flattened object and cannot accurately query e.g. "events which matched a domain from provider A"
    b. Con: mapping conflict with detection alerts
    • Because 1a was deemed critical functionality for CTI rules, Detections already uses a nested mapping for enriched alerts. Conflicts with e.g. detection rules written against filebeat-* could potentially be solved within the detection engine, but fundamentally there would be a mapping conflict between indicator indices and alerts indices.

@peasead
Copy link
Contributor Author

peasead commented Mar 16, 2021

Reviewing the second example mapping, Adding threat intelligence match/enrichment to another document which could be in a source event index or signals index., it looks like the indicator.matched.* fields could be a list. If these values could be a list, do we see these fields residing under a nested object?

  • I think that was the idea, @rylnd does that sound right?

As I see it, there are two proposed use cases for this fieldset: the indicator data themselves, and the enrichment of indicator/match data on extant event data. Defining a common mapping for both uses has several options/tradeoffs that should be discussed:

  1. threat.indicator is a nested object
    a. Pro: Allows enriched events to be queried by individual indicators
    b. Con: storage/performance issues with ingestion of nested documents from beats/agents

  2. threat.indicator is a regular object
    a. Con: enriched events lose the context of the indicators that they matched: we retrieve a flattened object and cannot accurately query e.g. "events which matched a domain from provider A"
    b. Con: mapping conflict with detection alerts

    • Because 1a was deemed critical functionality for CTI rules, Detections already uses a nested mapping for enriched alerts. Conflicts with e.g. detection rules written against filebeat-* could potentially be solved within the detection engine, but fundamentally there would be a mapping conflict between indicator indices and alerts indices.

Thanks, @rylnd that was my understanding as well and just wanted to double-check.

@ebeahan if you were asking if threat.indicator.matched.* would be a nested object, it looks like we're in agreement that it should be. Does that answer your question?

@ebeahan
Copy link
Member

ebeahan commented Mar 22, 2021

@ebeahan if you were asking if threat.indicator.matched.* would be a nested object, it looks like we're in agreement that it should be. Does that answer your question?

In the second example mapping, the matched field is a single field, but the comment states it may be a list:

                /* `matched` will provide context about which of the indicators above matched on this
                    particular enrichment. If multiple matches for this indicator object, this could
                    be a list */
                "matched": "sha256",

The list of fields in the proposal defines threat.indicator.matched.* as an object with three child fields:

  • threat.indicator.matched.atomic
  • threat.indicator.matched.field
  • threat.indicator.matched.type

I wanted to understand better which direction to take and make sure that's captured correctly. Right now, the field definitions don't reflect that these fields are either a list or a nested object:

  - name: indicator.matched.atomic
    level: extended
    type: keyword
    short: Indicator atomic match
    description: >
      Identifies the atomic indicator that matched a local environment endpoint or network event.
    example: example.com

  - name: indicator.matched.field
    level: extended
    type: keyword
    short: Indicator field match
    description: >
      Identifies the field of the atomic indicator that matched a local environment endpoint or network event.
    example: file.hash.sha256

  - name: indicator.matched.type
    level: extended
    type: keyword
    short: Indicator type match
    description: >
      Identifies the type of the atomic indicator that matched a local environment endpoint or network event.
    example: domain-name

rfcs/text/0008/threat.yml Outdated Show resolved Hide resolved
@rylnd
Copy link
Contributor

rylnd commented Mar 25, 2021

Briefly capturing a few notes from recent discussions around this work:

  • Some of the discussion above (mostly me, sorry 😅) pertains to the dual usage of this fieldset, and the presence/absence/mappings of these fields in each situation. It's now my understanding that, while necessary, those details are orthogonal to the RFC itself, and should mainly be captured in a separate document pertaining to examples/usage.
  • In order to provide more context for the enrichment use case, we'd like to prescribe adding the event fieldset (threat.indicator.event.*). @ebeahan should that be captured in this proposal, or should this sort of composition only be captured in the aforementioned examples/usage documentation?

Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
devonakerr
devonakerr previously approved these changes Mar 29, 2021
Copy link

@devonakerr devonakerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the move from wildcard to keyword is the right one for threat.indicator.description, and acknowledge the TLP value updates. Also pleased to see the author suggestions incorporated.

@ebeahan
Copy link
Member

ebeahan commented Mar 31, 2021

Some of the discussion above (mostly me, sorry 😅) pertains to the dual usage of this fieldset and the presence/absence/mappings of these fields in each situation. It's now my understanding that, while necessary, those details are orthogonal to the RFC itself and should mainly be captured in a separate document pertaining to examples/usage.

Good context to capture to some extent in the RFC proposal doc still, I think.

In order to provide more context for the enrichment use case, we'd like to prescribe adding the event fieldset (threat.indicator.event.*). @ebeahan should be captured in this proposal, or should this sort of composition only be captured in the aforementioned examples/usage documentation?

Yes, we should capture that intent. I imagine it would be implemented like the other proposed field nestings (example)?

Can we update the current examples or add another example demonstrating how these fields are intended to be used?

@peasead
Copy link
Contributor Author

peasead commented Apr 5, 2021

Some of the discussion above (mostly me, sorry 😅) pertains to the dual usage of this fieldset and the presence/absence/mappings of these fields in each situation. It's now my understanding that, while necessary, those details are orthogonal to the RFC itself and should mainly be captured in a separate document pertaining to examples/usage.

Good context to capture to some extent in the RFC proposal doc still, I think.

In order to provide more context for the enrichment use case, we'd like to prescribe adding the event fieldset (threat.indicator.event.*). @ebeahan should be captured in this proposal, or should this sort of composition only be captured in the aforementioned examples/usage documentation?

Yes, we should capture that intent. I imagine it would be implemented like the other proposed field nestings (example)?

Can we update the current examples or add another example demonstrating how these fields are intended to be used?

@rylnd can I help with anything here?

This is used to preserve the event fields of the original indicator
event in the case of said indicator enriching another event.
@rylnd
Copy link
Contributor

rylnd commented Apr 12, 2021

@peasead @ebeahan as discussed, I added the event fieldset in fb057b0. However, we expect this nested fieldset to only be necessary in the "enrichment" use case, as one could leverage the existing root event fields for the indicator document/event itself.

This brings us back to the point about documentation of use cases: with the exception of the above, I believe that the existing use case documentation, introduced in #1127, sufficiently describe the situation. If these RFCs are meant to be the source for that type of usage data, then I think we're good here!

@rylnd
Copy link
Contributor

rylnd commented Apr 12, 2021

I wanted to understand better which direction to take and make sure that's captured correctly. Right now, the field definitions don't reflect that these fields are either a list or a nested object

@ebeahan to this point, I think the issue that I'm struggling with is that the mappings could be different depending on the usage. I tried to break this down in a previous comment but I haven't seen any responses.

@devonakerr
Copy link

Ryland, I think both use-cases defined are applicable - what other use-cases do you anticipate?

rfcs/text/0008-threat-intel.md Outdated Show resolved Hide resolved
rfcs/text/0008-threat-intel.md Outdated Show resolved Hide resolved
rfcs/text/0008-threat-intel.md Outdated Show resolved Hide resolved
@ebeahan
Copy link
Member

ebeahan commented Jun 9, 2021

Thanks, @peasead, for making the latest rounds of changes!

@rylnd @peasead there was some out-of-band discussion last week. Were there any outcomes or changes that should be reflected here? Or is the PR ready for another review?

@peasead
Copy link
Contributor Author

peasead commented Jun 9, 2021

Thanks, @peasead, for making the latest rounds of changes!

@rylnd @peasead there was some out-of-band discussion last week. Were there any outcomes or changes that should be reflected here? Or is the PR ready for another review?

I feel like we're ready, but I'll wait for @rylnd and @devonakerr to voice any dissent.

@devonakerr
Copy link

No dissent, I'll start my review.

ebeahan added a commit that referenced this pull request Jun 10, 2021
* Adds initial Stage 1 information

Much of this was taken from what was deleted from #1293 and is in
various stages of completion. Will annotate and iterate on the PR.

* Undo noisy markdown style changes

* Add Stage 1 PR link

* Link to filebeat module

Co-authored-by: Eric Beahan <ebeahan@gmail.com>

* Link to public kibana issue

* Fully qualify our proposed fields table

* Adds/updates missing enrichment fields in YML form

* Update enrichment pipeline pseudocode

* Removes unnecessary field renames, as fields no longer conflict
* Adds a clause for setting a default array value for
  `threat.enrichments`

* Remove resolved TODO

This is in fact redundant, but still useful.

* Add event fields to enrichment fieldset

* Add Devon K. as RFC sponsor

* Add event.reference to example enrichment document

This provides a more complete example.

* set advancement date for stage 1

Co-authored-by: Eric Beahan <ebeahan@gmail.com>
Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
@peasead peasead requested review from ebeahan and rylnd June 14, 2021 18:50
Copy link
Contributor

@rylnd rylnd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my comments have been addressed. I defer to @devonakerr for official approval, but LGTM!

rfcs/text/0008/threat.yml Outdated Show resolved Hide resolved
Copy link

@devonakerr devonakerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rylnd
Copy link
Contributor

rylnd commented Jun 22, 2021

@ebeahan: @peasead is out this week, but I think this is good to merge! I'm happy to do any of the leg work here, just let me know.

The documentation for the `indicator.type` field lists
`x-509-certificate` as an expected value. However, the correct STIX 2.0
Cyber Observable type name for X509 Certificates is `x509-certificate`.
Copy link
Member

@ebeahan ebeahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I'll set the date and merge 🎉 🚢

@ebeahan ebeahan merged commit 44d2ecd into master Jun 23, 2021
@ebeahan ebeahan deleted the create-threat-stage-2 branch June 23, 2021 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants