Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tag "multiline" to "log.flags" if event consists of multiple lines. #7997

Merged
merged 13 commits into from
Aug 31, 2018

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Aug 17, 2018

Add "multiline" tag to "log.status" if the event contains multiple lines. This way users can filter for multiline messages using "multiline" in [log.status].

Example event

{
  "@timestamp": "2018-08-17T11:35:21.813Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "7.0.0-alpha1"
  },
  "source": "/home/n/test.log",
  "offset": 0,
  "log": {
    "status": ["multiline"],
  },
  "message": "[test line\ntest line]",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "hostname": "sleipnir",
    "version": "7.0.0-alpha1",
    "name": "sleipnir"
  },
  "host": {
    "name": "sleipnir"
  }
}

Depends on #7991
Closes #957

@kvch kvch requested review from urso and ruflin August 17, 2018 11:39
@kvch kvch added review Filebeat Filebeat needs_backport PR is waiting to be backported to other branches. labels Aug 17, 2018
Copy link
Contributor

@andrewvc andrewvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Tested manually as well for good measure

Copy link
Contributor

@andrewvc andrewvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke too soon. Looks like there are some test failures:


07:04:43 ======================================================================
07:04:43 FAIL: Should be able to do multiline on docker logs.
07:04:43 ----------------------------------------------------------------------
07:04:43 Traceback (most recent call last):
07:04:43   File "/private/var/lib/jenkins/workspace/elastic+beats+pull-request+multijob-darwin/beat/filebeat/label/macosx/src/github.com/elastic/beats/filebeat/tests/system/test_json.py", line 95, in test_docker_logs_multiline
07:04:43     assert all("log" in o for o in output)
07:04:43 AssertionError
07:04:43 

"multiline": mlr.numLines,
},
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't report log.multiline: 1 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a multiline event contain only 1 line? Reason I'm asking is because I was thinking what if someone wants to see all mutiline events. So far I assumed it would be exists:log.multiline.

But your idea below kind of also answers this. If we call it log.lines we should probably add it to all events, not only mutiline events?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original request in the issue I am trying to close states "This is to mimic the behaviour the logstash multiline filter performs by adding 'multiline' to tags.". As logstash only tags events which contain more than one lines. See: https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html#plugins-codecs-multiline-multiline_tag

@urso
Copy link

urso commented Aug 17, 2018

Personally I like log.lines better :)

@ruflin
Copy link
Member

ruflin commented Aug 20, 2018

+1 on the idea from @urso to call it log.lines. Reasoning on my end is that log.multiline I would expect to be a bool.

@webmat
Copy link
Contributor

webmat commented Aug 20, 2018

I also prefer log.lines. And to @ruflin's question about searching for multiline events, I think I'd rather have log.lines always be populated with either a 1 or greater. Then if someone's looking for multiline events, they search for log.lines:[2 TO *].

I personally prefer the consistency of always having the field present.

@andrewvc
Copy link
Contributor

Many people never touch multi-line events, so it's hard to imagine why we would start always setting this value. What's the use case? Every field we add increases the cognitive overhead of using the product.

The only use case for this value I've encountered is indicating the operation of the multiline option. If there are more use cases, that would be great to hear about. Without those however, keeping the naming multiline makes sense.

Using this value beyond the specific use of the multiline feature raises a number of questions. You could have a JSON input source creating 'multiline' values for the message field, yet this value would remain. What happens with a JSON value with no message field, but other fields with multiple newline characters?

If we really want to start counting the number of lines in every message, then an enumeration of the use cases for this new feature would be useful to this discussion.

WRT whether it should be only values >1 or >=1 I'm very slightly in favor of >=1. The reason being that since the main value here is in checking the operation of the multiline option. It will confirm that multiline was enabled, but only a single line was processed.

Ideally this field wouldn't even exist in the mapping unless a multiline filter was enabled. This is beyond the scope of this ticket, but is relevant to #7972

@urso
Copy link

urso commented Aug 20, 2018

Many people never touch multi-line events, so it's hard to imagine why we would start always setting this value. What's the use case? Every field we add increases the cognitive overhead of using the product.

The way the reader works, the count field would only be added if multiline is configured. No multiline, not extra-field.

@andrewvc
Copy link
Contributor

@urso Ah, my misunderstanding, I had thought we were considering expanding the scope. Glad to hear that's not on the table.

@kvch
Copy link
Contributor Author

kvch commented Aug 21, 2018

@ruflin For the record log.lines is my idea. :P :D
However, this does not seem like a viable name. In Docker JSON log format, the message is under the key log. Thus, if I change message_key to msg in the test, the messages are never parsed, because Docker log messages are under the key "log". (Example log from our codebase: "{"log":"1:M 09 Nov 13:27:36.276 # User requested shutdown...\n","stream":"stdout"}"). So the test fails.

I propose event.lines instead of log.lines. event is already part of the top level fields, so we would not introduce anything new there. WDYT?

@kvch kvch changed the title Add log.multiline if event consists of multiple lines Add event.multiline if event consists of multiple lines Aug 21, 2018
@kvch kvch changed the title Add event.multiline if event consists of multiple lines Add event.lines if event consists of multiple lines Aug 21, 2018
@ruflin
Copy link
Member

ruflin commented Aug 21, 2018

As for the docker case: The docker input "should" already solve the issue by renaming it to message. And if log is an object, I would expect that log.lines just gets added to it? I would definitively prefer to go with log.line as the log.* object will contain meta information about log events: https://github.com/elastic/ecs#log

@kvch Sorry about the log.lines, should read PR messages better :-(

@kvch
Copy link
Contributor Author

kvch commented Aug 21, 2018

I think I might be made a mistake with my previous explanation. Let me check again. But still my statement that if I change message_key option in that test, nothing is parsed.

@kvch
Copy link
Contributor Author

kvch commented Aug 21, 2018

No, my description is correct.

@ruflin
Copy link
Member

ruflin commented Aug 21, 2018

Did you use the docker input for this test? Because then I would expect a different result.

Which test are you referring to? In general I think that docker write to log is a problem and will cause other problems down the road and is why the docker input writes it to message instead (I hope).

@kvch kvch force-pushed the feature/filebeat/indicate-multiline branch from f75ccad to f806161 Compare August 21, 2018 12:57
@kvch kvch changed the title Add event.lines if event consists of multiple lines Add log.lines if event consists of multiple lines Aug 21, 2018
@kvch
Copy link
Contributor Author

kvch commented Aug 21, 2018

The failing test is not relevant anymore, because for Docker JSON logs, docker is the correct input to use, not log. I deleted that test and named the field log.lines.

I still have one more question. Do we want to mimic Logstash's behaviour or add log.lines all multiline messages?

@ruflin
Copy link
Member

ruflin commented Aug 22, 2018

Both options work for me.

@exekias For the docker prospector if we have this partial lines, should we also use the multiline count or is this a different kind of multiline?

@exekias
Copy link
Contributor

exekias commented Aug 22, 2018

Docker partials are just something internal to the format, it's still a single line in source and should be a single line in output. So would say it should not include this field in that case.

If you do common multiline on top it should contain this field, but I think that's covered by this change already.

@kvch kvch force-pushed the feature/filebeat/indicate-multiline branch from f806161 to 35b8e44 Compare August 27, 2018 18:48
@kvch kvch changed the title Add log.lines if event consists of multiple lines Add tag "multiline" to "log.status" if event consists of multiple lines. Aug 27, 2018
@kvch kvch force-pushed the feature/filebeat/indicate-multiline branch from 35b8e44 to d59613d Compare August 27, 2018 18:51
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get the other PR and then rebase this on top which will be very little changes left like it seems.

@@ -108,6 +108,10 @@
description: >
Logging level.

- name: log.status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about naming in the other thread.

@@ -150,6 +151,25 @@ func TestMultilineBeforeNegateOKWithEmptyLine(t *testing.T) {
)
}

func TestMultilineAfterTruncated(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a test for a multiline event that also gets truncated and then check for both flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@kvch kvch force-pushed the feature/filebeat/indicate-multiline branch 2 times, most recently from d2189b2 to 1c48f06 Compare August 30, 2018 10:32
@kvch kvch changed the title Add tag "multiline" to "log.status" if event consists of multiple lines. Add tag "multiline" to "log.flags" if event consists of multiple lines. Aug 30, 2018
@kvch kvch force-pushed the feature/filebeat/indicate-multiline branch from 1c48f06 to 1df5455 Compare August 30, 2018 14:23
@kvch
Copy link
Contributor Author

kvch commented Aug 30, 2018

Rebased.

@kvch kvch merged commit 6da83e8 into elastic:master Aug 31, 2018
"""
Should be able to do multiline on docker logs.
"""
self.render_config_template(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kvch

  • I think this test is still valuable to make sure we support multiline logs in docker. What we should change is to set keys_under_root=False and then check for the fields accordingly
  • If we remove the test, we should also remove the test file as it's not used anywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For docker we have a dedicated input type in Filebeat. Do users prefer to read Docker files using log and use json.* options? What is the "official" way read Docker logs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "official" was has become the docker input I would say. This doesn't mean users do not still use the old behaviour and I'm not even sure if we have such a multiline test for the docker input.

@kvch kvch added v6.5.0 and removed needs_backport PR is waiting to be backported to other branches. labels Sep 3, 2018
kvch added a commit to kvch/beats that referenced this pull request Sep 11, 2018
…s. (elastic#7997)

Add "multiline" tag to "log.status" if the event contains multiple lines. This way users can filter for multiline messages using "multiline" in [log.status].

Example event

{
  "@timestamp": "2018-08-17T11:35:21.813Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "7.0.0-alpha1"
  },
  "source": "/home/n/test.log",
  "offset": 0,
  "log": {
    "status": [
        "multiline"
     ],
  },
  "message": "[test line\ntest line]",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "hostname": "sleipnir",
    "version": "7.0.0-alpha1",
    "name": "sleipnir"
  },
  "host": {
    "name": "sleipnir"
  }
}

Closes elastic#957

(cherry picked from commit 6da83e8)
kvch added a commit that referenced this pull request Sep 11, 2018
…s. (#7997) (#8207)

Add "multiline" tag to "log.status" if the event contains multiple lines. This way users can filter for multiline messages using "multiline" in [log.status].

Example event

{
  "@timestamp": "2018-08-17T11:35:21.813Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "7.0.0-alpha1"
  },
  "source": "/home/n/test.log",
  "offset": 0,
  "log": {
    "status": [
        "multiline"
     ],
  },
  "message": "[test line\ntest line]",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "hostname": "sleipnir",
    "version": "7.0.0-alpha1",
    "name": "sleipnir"
  },
  "host": {
    "name": "sleipnir"
  }
}

Closes #957

(cherry picked from commit 6da83e8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants