Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_kubernetes_metadata processor: add support for "/var/log/containers/" log path #4981

Merged
merged 4 commits into from
Aug 25, 2017

Conversation

SvenWoltmann
Copy link
Contributor

Following up on this topic:
https://discuss.elastic.co/t/add-kubernetes-metadata-should-work-in-var-log-containers/97945

The Issue

The “add_kubernetes_metadata” processor should also work on the "/var/log/containers/" log path for the following reasons:

  1. You may want to exclude log files from certain pods, e.g. the filebeat pod itself with the exclude_files: ['filebeat-*.log'] option. That would work only in /var/log/containers, as only the symlinks there contain the pod name.

  2. You may want to read only the log files of docker containers used by active Kubernetes pods, not any other docker containers running on the system now or in the past. That also works only by following the symlinks in /var/log/containers.

  3. The “source” field in the log documents would be much more informative if it contained a value like /var/log/containers/kube-proxy-4d7nt_kube-system_kube-proxy-1bddb0001161285462528b7170a53d13dfe4e17b541319485b9020eef5433266.log
    instead of
    /var/lib/docker/containers/1bddb0001161285462528b7170a53d13dfe4e17b541319485b9020eef5433266/1bddb0001161285462528b7170a53d13dfe4e17b541319485b9020eef5433266-json.log

About the Pull Request

The pull requests contains a simple solution, in which the container ID is extracted in a different way when the logs_path matcher's logs_path setting is "/var/log/containers/".

So the processor needs to be configured like this:

processors:
- add_kubernetes_metadata:
    in_cluster: true
    default_matchers.enabled: false
    matchers:
    - logs_path:
        logs_path: /var/log/containers/

However, it seems that the logs_path setting is used only in the code that extracts the container ID from the source. Wouldn't it make more sense to extract the log path from the source variable? Then the logs_path setting can be removed and the processor can be used with it's default configuration:

processors:
- add_kubernetes_metadata:
    in_cluster: true

@elasticmachine
Copy link
Collaborator

Can one of the admins verify this patch?

@exekias
Copy link
Contributor

exekias commented Aug 23, 2017

ok to test

cid := ""
if strings.Contains(source, f.LogsPath) {
//Docker container is 64 chars in length
cid = source[len(f.LogsPath) : len(f.LogsPath)+64]
if f.LogsPath == "/var/log/containers/" && strings.HasSuffix(source, ".log") {
Copy link
Contributor

@exekias exekias Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of hardcoding this kind of things in the code, but I think I can live with this one if everyone else is ok with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither, but the existing method already looked like a "quick-and-dirty" solution, so I just continued this way.

I'm now working on an alternative, more generic solution that ignores the "logs_path" parameter (it's nowhere documented and it doesn't seem to be used anywhere else) and instead parses the source to extract the container ID. Will have that ready in two hours or so. Maybe that's better than this solution.

@exekias
Copy link
Contributor

exekias commented Aug 23, 2017

Thank you very much for your contribution!! Could you please add a CHANGELOG.ascii entry?

@SvenWoltmann
Copy link
Contributor Author

I've implemented an alternative cleaner and more generic solution in #4995.

@exekias
Copy link
Contributor

exekias commented Aug 25, 2017

I still prefer this one, it covers all default cases and still allows to change the default in case Docker is not under /var/lib/docker, do you I mind if I merge this one and close the rest?

@SvenWoltmann
Copy link
Contributor Author

Sure, that's OK. But then it's also necessary to update the documentation at https://www.elastic.co/guide/en/beats/filebeat/master/add-kubernetes-metadata.html with something like this:

The default matcher detects container IDs only of files in /var/lib/docker/containers/. If you read files from /var/log/containers/ instead, you need to replace the default matcher by a logs_path matcher with the logs_path set to the appropriate path:

processors:
- add_kubernetes_metadata:
    in_cluster: true
    default_matchers.enabled: false
    matchers:
    - logs_path:
        logs_path: /var/log/containers/

Also, the matcher might still throw a "slice bound out of range" runtime error if source is shorter than expected. Would you like me to add an appropriate test and fix to this PR?

@exekias
Copy link
Contributor

exekias commented Aug 25, 2017

That would be awesome yes :), also please add an entry to CHANGELOG.asciidoc

… path

The add_kubernetes_metadata processor's LogPathMatcher could extract
a Docker container ID - and hence enrich a log document with Kubernetes
metadata - only if the log path was '/var/lib/docker/containers/'.

With this commit, the LogPathMatcher can extract the container ID also
from a '/var/log/containers/' log path (Kubernetes symlinks).
@SvenWoltmann
Copy link
Contributor Author

SvenWoltmann commented Aug 25, 2017

Done.

Now checking for the source length, refactored the tests and added several new tests - and a CHANGELOG.asciidoc was already there :)

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some minor comments after latest changes

}
logp.Debug("kubernetes", "Using container id: %s", cid)
} else {
logp.Debug("kubernetes", "Error extracting container id - source value does not contain log path.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to refactor this logic after adding length checks, in case of wrong length the error won't be logged at the moment

sourceLen := len(source)
logsPathLen := len(f.LogsPath)

if f.LogsPath == "/var/log/containers/" && strings.HasSuffix(source, ".log") && sourceLen >= containerIdLen + 4 {
Copy link
Contributor

@exekias exekias Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strings.HasPrefix(f.LogsPath, "/var/log/containers") would be better here, as this is user input, it may not be exactly /var/log/containers/

@SvenWoltmann
Copy link
Contributor Author

Thanks for your comments. I've added another commit to address both issues.

I check if the path starts with /var/log/containers/ (with a tailing slash) instead of /var/log/containers, because if the user omits that tailing slash, it will automatically be added in line 43 (logPath = logPath + "/").

Also I duplicated these two lines, as it makes the code flow much easier to read:

logp.Debug("kubernetes", "Using container id: %s", cid)
return cid

@exekias
Copy link
Contributor

exekias commented Aug 25, 2017

Nice, please review code format (you can run make fmt), check is failing here: https://travis-ci.org/elastic/beats/jobs/268351019 and I think we are ready to go

@SvenWoltmann
Copy link
Contributor Author

Sorry... I didn't know about the fmt command, it's my first Go project.

One question though:

When formatted, the multi-line if statement looks like this:

if strings.HasPrefix(f.LogsPath, "/var/log/containers/") &&
	strings.HasSuffix(source, ".log") &&
	sourceLen >= containerIdLen+4 {
	containerIdEnd := sourceLen - 4
	...

So you can't easily recognize anymore where the if statement ends and the code block starts.

Would you prefer to write the if statement into a single (long) line like this?

if strings.HasPrefix(f.LogsPath, "/var/log/containers/") && strings.HasSuffix(source, ".log") && sourceLen >= containerIdLen+4 {
	containerIdEnd := sourceLen - 4
	...

@exekias
Copy link
Contributor

exekias commented Aug 25, 2017

I'm ok with the single line version too

@karmi
Copy link

karmi commented Aug 25, 2017

Hi @SvenWoltmann, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@SvenWoltmann
Copy link
Contributor Author

@exekias I've committed the formatted code.

@karmi I've added my other e-mail address to my profile. I guess I've used that in an earlier commit, but I just checked, and all my commits now use the e-mail, which I used for the signature.

@exekias exekias merged commit 4ac68d2 into elastic:master Aug 25, 2017
@exekias
Copy link
Contributor

exekias commented Aug 25, 2017

Thank you for adding this feature @SvenWoltmann!

@Julion
Copy link

Julion commented Oct 4, 2017

Hi Guys,

I hit this issue in 6.0.0-rc1. It seems that this change has not been applied to the 6.0 branch which I assume is the branch used for the release candidates (please correct me if I'm wrong). Although I understand why it would be considered a new feature, I would argue that since this the processor is beta and the change quite small it might be worth including. This is very confusing that /var/log/containers can't be a source in the current version (fluentd would be fine with it)

@exekias
Copy link
Contributor

exekias commented Oct 4, 2017

Hi @Julion, the change went in after the freeze deadline for 6.0, it wasn't backported as it's not a bugfix, should make it to the next release (6.1)

@jeremievallee
Copy link

Hi Guys,
@SvenWoltmann , thanks for this! We have the same use case and this look exactly like what we need. I'm having trouble around configuration though, and can't find what I need in the documentation.
Let's say you want filebeat to get the containers logs from Kubernetes, but you would like to exclude some files. I thought this prospector config would be right, but no luck so far:

  - type: docker
    containers:
      ids:
        - "*"
      path: "/var/log/containers"
    exclude_files:
      - "/var/log/containers/filebeat*.log"
      - "/var/log/containers/logstash*.log"
    processors:
      - add_kubernetes_metadata:
          in_cluster: true
          default_matchers.enabled: false
          matchers:
          - logs_path:
              logs_path: /var/log/containers/

Am I doing something wrong, or is it just not possible at the moment ?
Using filebeat:6.1.3, btw.

Thanks!

@exekias
Copy link
Contributor

exekias commented Jan 31, 2018

Hi @jeremievallee,

Please for questions use discuss: https://discuss.elastic.co/c/beats

@jeremievallee
Copy link

Hi @exekias, thanks for your message, I will do that now. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants