
Add mapping network event guidance doc #969

Merged (9 commits) Sep 29, 2020

@ebeahan commented Sep 14, 2020

Add a new documentation page, Mapping Network Events, which provides guidance and best practices for mapping network-related events to ECS.

Goals:

  • Provide better guidance on the usage of source/destination and when client/server should also be populated.
  • Provide example application of the network.* & related.* field sets.
  • Document that network.protocol is always populated when an event is categorized as: event.category:network + event.type:protocol.

Relates to #948


@dainperkins commented

I don't quite understand the source/dest vs. client/server differentiation you are shooting for (assuming complete session visibility for the purposes of the discussion).

  • From a strict TCP standpoint they should be the same, assuming you can determine which side sent the SYN and which sent the ACK, or which station sent the first packet. From a NetFlow or Packetbeat perspective that is usually not a problem (AWS VPC flow logs are the exception, unless things have changed). Without the SYN/ACK or the first packet, it's a miserable process to determine which side is the initiator, aside from heuristics like well-known ports; but get into MongoDB and it's literally a toss-up, as everything is on high random ports.

  • From a UDP standpoint it also depends on where the first packet came from. The difference is that the 5-tuple isn't correlated with a higher-level session, but there's still a DNS request (randomPortA to 53) and response (53 to randomPortA).

It's been a really long time since I looked at NetFlow v5 unidirectional flows, but the observer (assuming no asymmetric routing) should still know the source and destination and track them consistently across the src->dst and dst->src connections (both flows should use the same src/dst identifiers).

Is there some use case or corner case I'm missing (or maybe just some esoteric blind spot I can't get past)?
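The handshake reasoning above can be sketched in a few lines of Python. This is a minimal illustration, not an ECS or Beats API: the `FlowRecord` shape and the `initiator_is_source` helper are hypothetical, TCP flags are assumed to arrive as a set of flag names, and the port fallback assumes the server listens on a port below 1024, which breaks down exactly in the "everything is on high random ports" case described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class FlowRecord:
    """One unidirectional flow record (hypothetical shape, not an ECS type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    tcp_flags: FrozenSet[str]  # e.g. frozenset({"SYN"}) or frozenset({"SYN", "ACK"})


def initiator_is_source(flow: FlowRecord) -> Optional[bool]:
    """True if the record's source started the session, False if its
    destination did, None if this record alone can't tell."""
    flags = flow.tcp_flags
    if "SYN" in flags and "ACK" not in flags:
        return True   # bare SYN: the sender is opening the connection
    if "SYN" in flags and "ACK" in flags:
        return False  # SYN-ACK: the sender is answering the other side's SYN
    # No handshake visible: guess that the server sits on a well-known port.
    if flow.dst_port < 1024 <= flow.src_port:
        return True
    if flow.src_port < 1024 <= flow.dst_port:
        return False
    return None  # both ports high (or both low): a toss-up


syn_ack = FlowRecord("192.168.1.50", 443, "192.168.1.10", 42113, frozenset({"SYN", "ACK"}))
print(initiator_is_source(syn_ack))  # False: 192.168.1.10 is the client
```

Returning `None` rather than guessing keeps the "unknown role" case explicit, which matches the idea of always populating source/destination and only adding client/server when the roles are actually known.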

@webmat commented Sep 23, 2020

@dainperkins, what we're trying to do here is simply to establish more clearly that we want source+destination to always be populated. Then, when it makes sense or is desired, folks can also use client+server.

So this is just to help make sure people can reliably depend on source+destination to be populated across the hundreds of data sources they may have to deal with.

Now if you see a problem with the current wording, we can always improve on that.

[float]
==== Related fields

The `related.ip` field captures all the IPs present in the event in a single array:

Contributor

Maybe call out the specific IPs to include in the `related.ip` field in the example?

The related.ip field captures all the IPs present in the event in a single array, in this case including source.ip, destination.ip, as well as any IPs included in the dns.answer field:

Should we include, e.g., pipeline code showing how to determine whether a `dns.answer` value is an IP?

Contributor

I don't think we should provide example code here. This would be fine in a blog post, but I think this will be too much of a maintenance burden.

Member Author

maybe call out the specific ips to include in the related.ip field in the example?

Good call out. I can definitely add a little more to the explanation.

Should we include e.g. pipeline code to show how to determine if a dns.answer is an IP?

I approached this as an implementation-neutral reference of sorts. I've kept this focused on what the final mappings should be and avoided being too prescriptive about the how.
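Purely to illustrate the question above (the thread settled on keeping the doc itself implementation-neutral), here is a minimal Python sketch of building `related.ip`. It assumes the event is a plain dict using ECS field names, with DNS answers in the ECS `dns.answers` array whose `data` values may be either IPs or hostnames; `collect_related_ips` is a hypothetical helper, not part of any Elastic API, and the sample values are illustrative. The standard-library `ipaddress` module decides what counts as an IP.

```python
import ipaddress


def is_ip(value: str) -> bool:
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def collect_related_ips(event: dict) -> list:
    """Collect source.ip, destination.ip and any IP-valued dns.answers.data
    into one de-duplicated list, keeping first-seen order."""
    candidates = [
        event.get("source", {}).get("ip"),
        event.get("destination", {}).get("ip"),
    ]
    for answer in event.get("dns", {}).get("answers", []):
        candidates.append(answer.get("data"))

    related = []
    for value in candidates:
        if isinstance(value, str) and is_ip(value) and value not in related:
            related.append(value)
    return related


event = {
    "source": {"ip": "192.168.86.222"},
    "destination": {"ip": "192.168.86.1"},
    "dns": {"answers": [
        {"type": "CNAME", "data": "example.com"},  # hostname: skipped
        {"type": "A", "data": "93.184.216.34"},    # IP: kept
    ]},
}
print(collect_related_ips(event))
# ['192.168.86.222', '192.168.86.1', '93.184.216.34']
```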


Note that this event contains additional details that would populate other fields (such as the <<ecs-dns>>) if this were a complete mapping example. Those fields are omitted here to focus on the network details.

[source,json]

Contributor

Would it make sense to use an ECS version of the source data?

Member Author

I chose an example event that wasn't already mapped to ECS, to capture each step in the exercise of mapping the event's network-related values.

Would breaking down an event that's already mapped to ECS (e.g. a packetbeat DNS event) for the example be more useful?

Contributor

Actually, I think the original makes more sense now that I've thought about it. The example data threw me for a loop, as any request/response protocol analysis will have a clear source/destination linking to client/server.

Basically, for anything besides unidirectional flows (Meraki NetFlow, AWS VPC flow logs) and broadcast requests (DHCP), copy source/destination to client/server.

For the corner cases, just use source/destination.

I think I may be getting wrapped around an axle that is as much about base semantics as it is about using the data (e.g. what the network SIEM page displays)...

@dainperkins commented

@ebeahan & @webmat
OK, I think I get it: the basis for this is the idea that a given network observer will NOT be indicating the session role of a given unidirectional session? That is to say, we are not treating source and destination pairs as, e.g., unidirectional flow indicators with no concept of a transport or application session (i.e. TCP: SYN|SYN-ACK, or UDP: e.g. DNS request/response).

For example (stripped of irrelevant data, and with no transport or app session; AWS VPC flows used to look like this, unsure if they still do):

TCP example

Event 1:
source: 192.168.1.10:42113 -> destination: 192.168.1.50:443, 1 packet, 10 bytes (SYN)

Event 2:
source: 192.168.1.50:443 -> destination: 192.168.1.10:42113, 1 packet, 10 bytes (SYN-ACK)

Event 3:
source: 192.168.1.10:42113 -> destination: 192.168.1.50:443, 1 packet, 10 bytes (ACK)

Or, for UDP:

Event 1:
source: 192.168.1.10:42113 -> destination: 192.168.1.50:53, 1 packet, 10 bytes (DNS request)

Event 2:
source: 192.168.1.50:53 -> destination: 192.168.1.10:42113, 1 packet, 10 bytes (DNS response)
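Records like these only become one conversation once both directions are correlated. A common approach, sketched below under the assumption that each record carries a 5-tuple, is to order the two endpoints canonically so that src->dst and dst->src records share a key. `session_key` and `group_into_sessions` are hypothetical helpers: the key says nothing about which side initiated, and it ignores timeouts, so a later reuse of the same port pair would be merged into the same session.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, transport)
Record = Tuple[str, int, str, int, str]


def session_key(record: Record) -> Record:
    """Direction-independent key: both endpoints are sorted so that the two
    directions of a conversation map to the same tuple. The ordering (string
    comparison of IPs) is arbitrary; it only has to be consistent."""
    src_ip, src_port, dst_ip, dst_port, transport = record
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    low, high = (a, b) if a <= b else (b, a)
    return (low[0], low[1], high[0], high[1], transport)


def group_into_sessions(records: List[Record]) -> Dict[Record, List[Record]]:
    sessions: Dict[Record, List[Record]] = defaultdict(list)
    for record in records:
        sessions[session_key(record)].append(record)
    return dict(sessions)


records = [
    ("192.168.1.10", 42113, "192.168.1.50", 443, "tcp"),  # SYN
    ("192.168.1.50", 443, "192.168.1.10", 42113, "tcp"),  # SYN-ACK
    ("192.168.1.10", 42113, "192.168.1.50", 443, "tcp"),  # ACK
    ("192.168.1.10", 42113, "192.168.1.50", 53, "udp"),   # DNS request
    ("192.168.1.50", 53, "192.168.1.10", 42113, "udp"),   # DNS response
]
for key, members in group_into_sessions(records).items():
    print(key, len(members))
# ('192.168.1.10', 42113, '192.168.1.50', 443, 'tcp') 3
# ('192.168.1.10', 42113, '192.168.1.50', 53, 'udp') 2
```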

@webmat commented Sep 23, 2020

@ebeahan I think we will need to revisit a lot of the other static asciidoc pages. For example there's overlap between some old pages and newer ones. We'll also need to consider the different personas that consume ECS, and help guide users to the different sections in a more useful manner, right from the top.

But for now, I think we should move this page under "Using ECS", perhaps as the last sub-page. It's a bit weird to move it to this section, but the "Migrating" and "Additional" sections are too far down IMO, and not as up to date. So adding this new page down there runs the risk of not being noticed.

@ebeahan commented Sep 23, 2020

Ok I think I get it - the basis for this is the idea that a given network observer will NOT be indicating the session role of a given unidirectional session?

The aim is to provide a reference we can point users to for how network events should be mapped. @webmat captured a list in the second checklist item of #948, which served as the foundation.

@webmat left a comment

Great content, this will be helpful :-)

I have a few comments below on content suggestions, let me know what you think.

Network events are not limited to `related.ip`. If hostnames or other host identifiers are present in the event, `related.hosts` should be populated too.

[float]
==== Event fields

Contributor

I think we should name this section "Categorizing the event" or something similar. The act of hardcoding these values per type of event is a different thing from mapping values to the different fields discussed so far.

We could also adjust the wording in the first paragraph of this section to also link to the categorization section.

Contributor

Perhaps we could also group the network.* and event.* sections together somehow, as the former also relates to categorization, as you explain with respect to `event.type:protocol`.

Member Author

I merged the two sections by dovetailing the network.* section onto the event categorization portion.

I also realized there could be additional event.* fields mapped that this example doesn't cover, so I also made some adjustments to focus on event categorization over the event.* field set in general.

Contributor

Great, thanks for the adjustments

}
----

Most <<ecs-allowed-values-event-category,event.category>>/<<ecs-allowed-values-event-type,event.type>> ECS pairings are complete on their own. However, the pairing of `event.category:network` and `event.type:protocol` is an exception. When both of these field/value pairs are used to categorize an event, the `network.protocol` field should also be populated.
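The `event.category:network` + `event.type:protocol` rule is simple enough to check mechanically. A minimal sketch, assuming events are plain dicts with ECS field names and that `event.category` and `event.type` hold arrays of values; `categorization_errors` is a hypothetical helper, not part of ECS tooling.

```python
def categorization_errors(event: dict) -> list:
    """Return violations of the network/protocol categorization rule."""
    errors = []
    ev = event.get("event", {})
    is_network = "network" in ev.get("category", [])
    is_protocol = "protocol" in ev.get("type", [])
    if is_network and is_protocol and not event.get("network", {}).get("protocol"):
        errors.append(
            "network.protocol must be populated when event.category contains "
            "'network' and event.type contains 'protocol'"
        )
    return errors


good = {
    "event": {"category": ["network"], "type": ["connection", "protocol"], "kind": "event"},
    "network": {"protocol": "dns", "type": "ipv4", "transport": "udp"},
}
bad = {
    "event": {"category": ["network"], "type": ["connection", "protocol"]},
    "network": {"transport": "udp"},
}
print(categorization_errors(good))      # []
print(len(categorization_errors(bad)))  # 1
```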

Contributor

I think we should conclude with a section that shows the full event, that includes all snippets discussed so far.

I agree with not getting into the dns.* details, as mentioned above, but the section could be present in the JSON event, with an ellipsis.

So something like this:

{
  "event": {
    "category": [
      "network"
    ],
    "type": [
      "connection",
      "protocol"
    ],
    "kind": "event"
  },
  "network": {
    "protocol": "dns",
    "type": "ipv4",
    "transport": "udp"
  },
  "source": {
    "ip": "192.168.86.222",
    "port": 54162
  },
  "destination": { full section as above },
  "client": { full section as above },
  "server": { full section as above },
  "dns": { ... },   <= actual ellipsis, we're not getting into DNS
  "related": { "ip": [ "192.168.86.222", "192.168.86.1", "93.184.216.34" ] },
  "zeek": { "ts":1599775747.53056, ... } <= original fields can optionally be kept around as custom fields
}

Comment on lines 96 to 97
* `source.ip:192.168.86.222` returns all events sourced from `192.168.86.222`, regardless of its role in a session.
* `client.ip:192.168.86.222` returns all events with host `192.168.86.222` acting as a client.

Contributor

I think this is useful. Not sure if it gets all the way to demonstrate the subtleties, though.

It's good we're using DNS for this, because it lets us get into subtleties between client/server vs source/destination.

Here's DNS traffic viewed with source/destination vs. with client/server:

source/destination

| source.ip      | destination.ip | event          |
|----------------|----------------|----------------|
| 192.168.86.222 | 192.168.86.1   | DNS question 1 |
| 192.168.86.1   | 192.168.86.222 | DNS answer 1   |
| 192.168.86.42  | 192.168.86.1   | DNS question 2 |
| 192.168.86.1   | 192.168.86.42  | DNS answer 2   |

client/server

| client.ip      | server.ip      | event          |
|----------------|----------------|----------------|
| 192.168.86.222 | 192.168.86.1   | DNS question 1 |
| 192.168.86.222 | 192.168.86.1   | DNS answer 1   |
| 192.168.86.42  | 192.168.86.1   | DNS question 2 |
| 192.168.86.42  | 192.168.86.1   | DNS answer 2   |

The latter is more helpful: source and destination flip between the query and response events, while the client and server roles stay the same.

But it also implies that, in these cases, the "copy source and destination" step needs to adjust where values are copied, depending on the event:

DNS question: source.ip => client.ip, destination.ip => server.ip
DNS answer: source.ip => server.ip, destination.ip => client.ip
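The copy rules above can be sketched in Python. This is a minimal illustration under stated assumptions: events are plain dicts with ECS field names, the caller already knows whether an event is the request or the response (the hypothetical `direction` argument), and `None` means the roles are unknown (e.g. unidirectional flows or broadcasts), in which case only source/destination are kept. `add_client_server` and `matching` are hypothetical helpers, and `message` simply carries the event labels from the tables above.

```python
import copy
from typing import List, Optional


def add_client_server(event: dict, direction: Optional[str]) -> dict:
    """Return a copy of event with client/server derived from source/destination.

    direction: "request"  (the source asked, e.g. a DNS question),
               "response" (the source answered, e.g. a DNS answer),
               or None when the roles are unknown.
    """
    out = copy.deepcopy(event)
    if direction == "request":
        out["client"] = copy.deepcopy(event["source"])
        out["server"] = copy.deepcopy(event["destination"])
    elif direction == "response":
        out["client"] = copy.deepcopy(event["destination"])
        out["server"] = copy.deepcopy(event["source"])
    elif direction is not None:
        raise ValueError(f"unknown direction: {direction!r}")
    return out


def matching(events: List[dict], field_set: str, ip: str) -> List[str]:
    """Messages of the events whose <field_set>.ip equals ip."""
    return [e["message"] for e in events if e.get(field_set, {}).get("ip") == ip]


rows = [
    ("192.168.86.222", "192.168.86.1", "request", "DNS question 1"),
    ("192.168.86.1", "192.168.86.222", "response", "DNS answer 1"),
    ("192.168.86.42", "192.168.86.1", "request", "DNS question 2"),
    ("192.168.86.1", "192.168.86.42", "response", "DNS answer 2"),
]
events = [
    add_client_server({"source": {"ip": src}, "destination": {"ip": dst}, "message": msg}, direction)
    for src, dst, direction, msg in rows
]
print(matching(events, "source", "192.168.86.222"))  # ['DNS question 1']
print(matching(events, "client", "192.168.86.222"))  # ['DNS question 1', 'DNS answer 1']
```

The two queries mirror the `source.ip` vs. `client.ip` bullets earlier in the thread: filtering on the source catches only one direction, while filtering on the client catches the whole transaction.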

Contributor

I figured out where I'm getting bogged down...

I strongly dislike mixing unidirectional and bidirectional data in one bucket. I know it completely skews ML analysis (unidirectional features don't work in bidirectional, and vice versa), and I think it plays havoc with things like the network tab on the SIEM if filtering / bidirectional normalization isn't being done.

How to deal with that I'm not really sure...

Member Author

@webmat yes, it's a good idea to call out that while source/destination will flip, client/server assignments apply throughout the transaction.

@dainperkins I don't think we're introducing any new concepts or guidance on unidirectional vs. bidirectional data here, but trying to better state what the existing guidance already is in a write-up that can live in the ECS docs.

Are these issues something we can improve through better ECS guidance here? Or would that topic be better as a separate discussion?

@webmat left a comment

LGTM, thanks for the adjustments. This is looking great!

@ebeahan merged commit 7472d90 into elastic:master on Sep 29, 2020
@ebeahan deleted the network-related-event-doc branch on September 29, 2020 16:07
ebeahan added a commit to ebeahan/ecs that referenced this pull request Sep 29, 2020
dseeley added a commit to dseeley/ecs that referenced this pull request May 5, 2021
* bumping version for 1.x release branch (elastic#921)

* [1.x] add related.hosts (elastic#913) (elastic#924)

* [1.x][DOCS] Fixes SIEM links (elastic#936)

* [1.x] Consolidate field-details doc template (elastic#897) (elastic#946)

* Add http.[request|response].mime_type (elastic#944) (elastic#949)

* [1.x] Cut 1.6 Changelog (elastic#933) (elastic#952) (elastic#953)

Co-authored-by: Mathieu Martin <mathieu.martin@elastic.co>

* [1.x] Add threat.technique.subtechnique (elastic#951) (elastic#956)

Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>

* [1.x] Nest as for foreign reuse (elastic#960) (elastic#962)

* [1.x] Remove `expected_event_types` from protocol (elastic#964) (elastic#965)

* [1.x] Expand definitions of source and destination field sets (elastic#967) (elastic#973)

* [1.x] Introduce `--strict` flag (elastic#937) (elastic#975)

* [1.x] Add example value composite type checking (elastic#966) (elastic#976)

* Add example value composite type checking (elastic#966)
* generate csv artifact

* [1.x] Add event category configuration (elastic#963) (elastic#977)

* [1.x] Add normalizer multi-field capability (elastic#971) (elastic#978)

Co-authored-by: Eric Beahan <ebeahan@gmail.com>

Co-authored-by: Madison Caldwell <madison.rey.caldwell@gmail.com>

* [1.x] Add mapping network event guidance doc (elastic#969) (elastic#983)

* [1.x] Removing unneeded link under `Additional Information` (elastic#984) (elastic#985)

* [1.x] Add discrete attribute to field details page headers (elastic#989) (elastic#990)

* [1.x] Uniformity across domain name breakdown fields (elastic#981) (elastic#994)

Co-authored-by: Mathieu Martin <webmat@gmail.com>

* Add --oss flag to the ECS generator script (elastic#991) (elastic#995)

* Add network directions ingress and egress (elastic#945) (elastic#997)

* Mention ECS Mapper in the main documentation (elastic#987) (elastic#1000)

Co-authored-by: Dan Roscigno <dan@roscigno.com>

* [1.x] Introduce experimental artifacts (elastic#993) (elastic#1001)

Co-authored-by: Mathieu Martin <webmat@gmail.com>

* Bump version to 1.8.0-dev in branch 1.x (elastic#1011)

* Cut 1.7 changelog (elastic#1010) (elastic#1012)

* [1.x] Clarify that file extension should exclude the dot. (elastic#1016) (elastic#1020)

* [1.x] Add usage docs section (elastic#988) (elastic#1024)

Co-authored-by: Mathieu Martin <mathieu.martin@elastic.co>

* [1.x] feat: include alias path when generating template (elastic#877) (elastic#1035)

Co-authored-by: Richard Gomez <32133502+rgmz@users.noreply.github.com>

* [1.x] Add support for `scaling_factor` in the generator (elastic#1042) (elastic#1055)

Co-authored-by: Mathieu Martin <mathieu.martin@elastic.co>

* [1.x] Add fallback for constant_keyword (elastic#1046) (elastic#1056)

Co-authored-by: Mathieu Martin <mathieu.martin@elastic.co>

* [1.x] Add wildcard type support to go code generator (elastic#1050) (elastic#1057)

* add wildcard type support

* also add version and constant_keyword

* changelog

* [1.x] New default make task that generates main and experimental artifacts. (elastic#1041) (elastic#1060)

Also changing the order of the 'generate' task: it now starts with the new generator, then runs the legacy scripts.

* [1.x] Change the index pattern in the sample template. (elastic#1048) (elastic#1068)

* [1.x] Prepare link to Logs docs changing with the 7.10 release in "getting-started" (elastic#1073) (elastic#1079)

Co-authored-by: EamonnTP <Eamonn.Smith@elastic.co>

* [1.x] Prepare link to Logs docs changing with the 7.10 release in "products-solutions" page (elastic#1074) (elastic#1083)

Co-authored-by: EamonnTP <Eamonn.Smith@elastic.co>

* [1.x] Add event.category session. (elastic#1049) (elastic#1093)

Co-authored-by: Mathieu Martin <mathieu.martin@elastic.co>

* [1.x] Add event.category registry (elastic#1040) (elastic#1094)

Co-authored-by: Mathieu Martin <mathieu.martin@elastic.co>

* [1.x] Add --ref support for experimental artifacts (elastic#1063) (elastic#1101)

Co-authored-by: Mathieu Martin <webmat@gmail.com>

* [1.x] Remove experimental event.original definition (elastic#1053) (elastic#1104)

* [1.x] Add missing `process.thread.name` to experimental definitions (elastic#1103) (elastic#1106)

* [1.x] Remove index parameter for wildcard fields (elastic#1115) (elastic#1119)

* [1.x] Add dns.answer object into experimental schema (elastic#1118) (elastic#1121)

* [1.x] Clarify x509 definition guidance for network events with only one cert (elastic#1114) (elastic#1123)

* [1.x] Indicate when artifacts include experimental changes (elastic#1117) (elastic#1125)

* [1.x] Add os.type field, with list of allowed values (elastic#1111) (elastic#1130)

* [1.x] Add support for constant_keyword's 'value' parameter (elastic#1112) (elastic#1132)

* [1.x] Beta label support (elastic#1051) (elastic#1133)

Co-authored-by: Mathieu Martin <webmat@gmail.com>

* [1.x] Backport elastic#1134 and elastic#1135 (elastic#1136)

* Remove temporary ifeval in "getting started" page, add link to Metrics docs (elastic#1134)
* Remove temporary ifeval from products page, add link to Metrics (elastic#1135)

* Two small documentation backports (elastic#1149)

* Remove an incorrect `event.type` from the 'converting' page (elastic#1146)
* Mention Logstash support for ECS in the 'products' page (elastic#1147)

* [1.x] Reinforce the exclusion of the leading dot from url.extension (elastic#1151) (elastic#1152)

* [1.x] Make all fields linkable directly via an HTML ID (elastic#1148) (elastic#1154)

* [1.x] Tracing fields should be at the root (elastic#1165)

* Add notice to the tracing field set, about not nesting field names. (elastic#1162)
* Tracing fields should be at top level in Beats artifact (elastic#1164)

* [1.x] Usage of brackets for a URL containing IPv6 address (elastic#1131) (elastic#1168)

* [1.x] 6.x index template data type fallback (elastic#1171) (elastic#1172)

* [1.x] Apply RFC 0007 stage 3 changes - multi-user (elastic#1066) (elastic#1175)

Conflict: deleted file rfcs/text/0007-multiple-users.md as RFCs are not backported to version branches.

* [1.x] Handle `error.stack_trace` case for ES 6.x template (elastic#1176) (elastic#1177)

* [1.x] Add composable index templates artifacts (elastic#1156) (elastic#1179)

* [1.x] Move _meta section back inside mappings, in legacy templates. (elastic#1186) (elastic#1187)

Backports the following commits to 1.x:

* Move _meta section back inside mappings, in legacy templates. (elastic#1186) 

This fixes an issue introduced by elastic#1156, discovered in elastic#1180. Composable templates support `_meta` at the template's root, but legacy templates don't. So we're just putting it back inside the mappings for legacy templates.

This also fixes missing updates to the component template, after the introduction of wildcard in elastic#1098.

* [1.x] Apply the RFC 0005 stage 2 (host metrics) changes in the experimental artifacts (elastic#1159) (elastic#1184)

Co-authored-by: Mathieu Martin <mathieu.martin@elastic.co>

* [1.x] Stage 3 changes for wildcard RFC 0001 (elastic#1098) (elastic#1183)

* [1.x] Conditional handling in es_template.template_settings (elastic#1191) (elastic#1192)

* [1.x] Artifacts docs page (elastic#1189) (elastic#1195)

* [1.x] Remove beta warning label from categorization fields docs (elastic#1067) (elastic#1196)

* [1.x] Correct wording of `event.reference` description (elastic#1181) (elastic#1197)

* Bump version to 1.9.0-dev in branch 1.x (elastic#1198)

* [1.x] Cut 1.8 FF changelog.next.md elastic#1199 (elastic#1201)

* Merge custom and core multi_fields arrays (elastic#982) (elastic#1213)

Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>

* [1.x] Stage 2 changes for RFC 0009 - data_stream fields (elastic#1215) (elastic#1222)

* [1.x] add http.request.id (elastic#1208) (elastic#1223)

Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
Co-authored-by: Gil Raphaelli <gil@elastic.co>

* [1.x] add cloud.service.name (elastic#1204) (elastic#1224)

* add cloud.platform

* expand cloud.platform description

* move to cloud.service.name

Co-authored-by: Gil Raphaelli <gil@elastic.co>

* [1.x] Add ssdeep hash (elastic#1169) (elastic#1227)

Co-authored-by: Andrew Stucki <andrew.stucki@elastic.co>

* [CI] Switch to GitHub actions (elastic#1236) (elastic#1245)

Co-authored-by: Eric Beahan <ebeahan@gmail.com>

Co-authored-by: Andrew Stucki <andrew.stucki@elastic.co>

* Revert wildcard adoption back to experimental stage (elastic#1235) (elastic#1243)

* Add scaled_float type to go generator (elastic#1250) (elastic#1251)

* add scaled_float

* changelog

* Add categorization fields usage docs (elastic#1242) (elastic#1257)

* add time_zone, postal_code, and continent_code (elastic#1229) (elastic#1258)

* Specify MAC address format (elastic#456) (elastic#1260)

Co-authored-by: Robin Schneider <36660054+ypid-geberit@users.noreply.github.com>

* finalize 1.8.0 changelog (elastic#1262) (elastic#1265)

* Add additional host fields (elastic#1248) (elastic#1267)

Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>

* Stage 1 changes for RFC 0014 - extend pe fields (elastic#1256) (elastic#1270)

* Add 2 fields to code_signature (elastic#1269) (elastic#1272)

Co-authored-by: Yamin Tian <56367679+Trinity2019@users.noreply.github.com>

* Stage 3 changes for RFC 0007 - remove beta attribute (elastic#1271) (elastic#1273)

* Stage 1 experimental changes for RFC 0008 - threat.indicator fields (elastic#1268) (elastic#1274)

* Stage 1 changes for RFC 0015 - add elf fieldset (elastic#1261) (elastic#1275)

* Cut 1.9 FF CHANGELOG.next.md (elastic#1277)

* lock go version in actions (elastic#1283) (elastic#1290)

* Bump jinja2 from 2.11.2 to 2.11.3 in /scripts (elastic#1310) (elastic#1320)

* Bump jinja2 from 2.11.2 to 2.11.3 in /scripts

* Bump pyyaml from 5.3b1 to 5.4 in /scripts (elastic#1318) (elastic#1325)

Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adjust terminology - change whitelist to allowlist (elastic#1315) (elastic#1331)

Co-authored-by: Dominic Page <11043991+djptek@users.noreply.github.com>

* Remove -dev label from 1.9 version (elastic#1329)

* remove -dev label from 1.9 version

* generate artifacts

* removing rules artifacts

* Cut 1.9 changelog (elastic#1328)

* move 1.9 changes to changelog

* add 1.9 release changes

3 participants