Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a HAProxy filebeat module. #8014

Merged
merged 15 commits into from
Sep 4, 2018

Conversation

Tethik
Copy link
Contributor

@Tethik Tethik commented Aug 18, 2018

I created a new module for parsing HAProxy http logs via filebeat. Originally developed on version 6.3 of elasticsearch/filebeat.

Feedback would be appreciated, please let me know if there's anything missing.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work here, thanks for this! I have added some comments.

@@ -0,0 +1,8 @@
- module: haproxy
# All logs
{fileset}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be replaced by the name of the fileset, I guess http in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


--

*`haproxy.http.status_code`*::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you follow ECS in fields naming if possible? for example http response fields could be at the root level under http.response....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.


type: long
- name: srvconn
description: srv_conn is the total number of concurrent connections still active on the server when the session was logged.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These field names are a bit cryptic, maybe we could have a connections object with different counts, so these fields are named connections.active, connections.frontend, connections.backend, connections.server, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the names of the fields directly from the HAProxy documentation
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#8.2.3

I don't mind changing them though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you move them up from the fileset though? e.g. haproxy.connections.* as opposed to haproxy.http.connections.*

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could leave them directly on haproxy.connections, so if at some moment we have also a TCP fileset they can be aggregated more easily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. What about the rest of the fields that are shared between the TCP and HTTP logs? e.g. client_ip, frontend_name or the different queue times? Should we put these directly on the parent (as haproxy.client_ip, haproxy.frontend_name) or try to group them more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I also think it would also make sense to move these fields to the module level.


# Set custom paths for the log files. If left empty,
# Filebeat will choose the paths depending on your OS.
#var.paths:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add here also a reference to var.input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

include::../include/configuring-intro.asciidoc[]

The module is by default configured to run via syslog on port 9001. However
it can also be configured to read from a file path. See the following example.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition to be able to use syslog and file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I started out with a normal file log but setting it up in HAProxy requires a bit of effort. Going via syslog was easier as HAProxy has that support built in. I intend to document the HAProxy configuration necessary for using the module too.

=== Example dashboard

This module comes with a sample dashboard showing geolocation, distribution of requests between backends and frontends,
and status codes over time. For example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the dashboard! 😃



[float]
==== `{fileset}` log fileset settings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace {fileset} also here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Tethik
Copy link
Contributor Author

Tethik commented Aug 20, 2018

Pushed some changes now, but I still need to update the dashboard and test the changes. Hope I can do so later tonight.

@@ -0,0 +1,61 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should contain only the list of events (each event being the value of _source)

"lon": 13.6333,
"lat": 52.3167
}
},
Copy link
Member

@jsoriano jsoriano Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful with exposing personal data in examples/tests 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor slip up, but not really sensitive. I'll see if I can scrub it.

@@ -0,0 +1,8 @@
- module: haproxy
# All logs
{fileset}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http also here

- name: syslog_port
default: 9001
- name: input
default: syslog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being syslog the default is going to be tricky for the test_modules test as it instantiates filebeat with the default configurations.
@kvch @ph is there a way to provide a different config than the default for the test_modules?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filebeat/tests/system/config/filebeat.yml.j2 is the template used during running system tests. This needs to be extended with syslog input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current system test framework also needs to be extended to be able to send logs over TCP and UDP to Filebeat. Maybe running netcat with proper config does the trick of feeding Filebeat with input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll take a look to this.

@Tethik
Copy link
Contributor Author

Tethik commented Aug 23, 2018

Still working on the dashboard fyi, quite busy these coming days though so it might take a while.

#------------------------------- haproxy Module ------------------------------
- module: haproxy
# All logs
{fileset}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another placeholder here, remember to run also make update after modifying these files.

@Tethik
Copy link
Contributor Author

Tethik commented Aug 27, 2018

Dashboard remade and tests scrubbed.

Here's also a sample haproxy config that should work to send logs to this module.
https://gist.github.com/Tethik/b458036e734f5ec78bb362bba53e9d1c

@jsoriano
Copy link
Member

@Tethik thanks for your work on this, I'm going to take a look on how we can extend our tests for syslog-based modules, I'll keep you updated.

@Tethik
Copy link
Contributor Author

Tethik commented Aug 29, 2018

Glad to be able to contribute. I hope to add some more modules in the near future too 🎁

@jsoriano
Copy link
Member

@Tethik we have merged #8140, that will force the use of file input for tests, please update your branch with master to check your test files.

@Tethik
Copy link
Contributor Author

Tethik commented Aug 31, 2018

Done, I rebased to the latest master. Not sure how I'm supposed to run the tests though.

@jsoriano
Copy link
Member

jsoriano commented Sep 3, 2018

To run the tests locally, you can run INTEGRATION_TESTS=1 make system-tests from the filebeat directory.
This takes a while as it runs all system tests, if you want to run only the tests with the haproxy log files, you will need first a running elasticsearch, you can run it with make start-environment, and then:

make python-env
source build/python-env/bin/activate
make filebeat.test
INTEGRATION_TESTS=1 TEST_MODULES=haproxy BEAT_STRICT_PERMS=false ES_HOST=<easticsearch ip> nosetest tests/system/test_module.py

You can check the logs for more info in build/system-tests.

@@ -0,0 +1 @@
Jul 30 09:03:52 localhost haproxy[32450]: 1.2.3.4:38862 [30/Jul/2018:09:03:52.726] incoming~ docs_microservice/docs 0/0/1/0/2 304 168 - - ---- 6/6/0/0/0 0/0 {docs.example.internal||} {|||} "GET /component---src-pages-index-js-4b15624544f97cf0bb8f.js HTTP/1.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file may need a new line at the end to ensure that this is read on tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the newline. No change though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a change 🙂 now the test build failed due to expected object not found. I have added a comment with the missing fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, been trying to run the tests locally but the environment is not cooperating with me. (I get different failures from the other unrelated tests too)

"haproxy.connections.frontend": 6,
"@timestamp": "2018-07-30T09:03:52.726Z",
"message": "Jul 30 09:03:52 localhost haproxy[32450]: 1.2.3.4:38862 [30/Jul/2018:09:03:52.726] incoming~ docs_microservice/docs 0/0/1/0/2 304 168 - - ---- 6/6/0/0/0 0/0 {docs.example.internal||} {|||} \"GET /component---src-pages-index-js-4b15624544f97cf0bb8f.js HTTP/1.1\""
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields are missing:

        "input.type": "log",
        "prospector.type": "log",
        "fileset.module": "haproxy",
        "fileset.name": "http",
        "haproxy.geoip.region_iso_code": "US-WA",
        "offset": 0,

@jsoriano
Copy link
Member

jsoriano commented Sep 3, 2018

jenkins, test this please

@jsoriano
Copy link
Member

jsoriano commented Sep 3, 2018

jenkins, test this again please

@jsoriano jsoriano merged commit e5ed867 into elastic:master Sep 4, 2018
jsoriano added a commit to jsoriano/beats that referenced this pull request Sep 4, 2018
Change was introduced in elastic#8014, it broke docs build
@jsoriano
Copy link
Member

jsoriano commented Sep 4, 2018

To be backported to 6.x along with #8215

@exekias exekias added the needs_backport PR is waiting to be backported to other branches. label Sep 4, 2018
jsoriano added a commit that referenced this pull request Sep 4, 2018
Change was introduced in #8014, it broke docs build
jsoriano pushed a commit to jsoriano/beats that referenced this pull request Sep 4, 2018
jsoriano added a commit to jsoriano/beats that referenced this pull request Sep 4, 2018
Change was introduced in elastic#8014, it broke docs build

(cherry picked from commit e7631ff)
@jsoriano jsoriano added v6.5.0 and removed needs_backport PR is waiting to be backported to other branches. labels Sep 4, 2018
jsoriano pushed a commit to jsoriano/beats that referenced this pull request Sep 4, 2018
jsoriano added a commit to jsoriano/beats that referenced this pull request Sep 4, 2018
Change was introduced in elastic#8014, it broke docs build

(cherry picked from commit e7631ff)
jsoriano added a commit that referenced this pull request Sep 6, 2018
* Add a HAProxy filebeat module. (#8014)

(cherry picked from commits e5ed867,
e7631ff and 
dc94a44)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants