Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pod-uid support for add_kubernetes_metadata #7072

Conversation

mariomechoulam
Copy link
Contributor

@mariomechoulam mariomechoulam commented May 10, 2018

Hello everyone!

I am opening this pull request to gather some feedback on the approach.
There are a few things that can be cleaned up and improved for sure, as this is my first try with Go, but the basic should be working.
I was hesitating whether to create a completely new Matcher and duplicate a lot of the existing logic or extend the current one. Extending did not feel entirely right, because the LogPathMatcher was focused on container ID - nevertheless it is still a log_path matcher. Thus this middle-ground approach in which I am adding a new configuration parameter for the Matcher; it should be seamlessly backwards compatible.

I believe this is the configuration that should make it work...

processors:
  - add_kubernetes_metadata:
      in_cluster: true
      default_indexers.enabled: false
      default_matchers.enabled: false
      include_pod_uid: true
      indexers:
        - pod_uid: ~
      matchers:
        - logs_path:
            logs_path: /var/lib/kubelet/pods/
            resource_type: pod

My understanding is that, ideally, an integration test of some sort would be useful - at least to test that the above configuration will indeed produce the desired functionality. Will welcome any guidance :)

Cheers!

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

func TestPodUidIndexer(t *testing.T) {
var testConfig = common.NewConfig()

podUidIndexer, err := NewPodUidIndexer(*testConfig, metagen)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var podUidIndexer should be podUIDIndexer

}
}

func (p *PodUidIndexer) GetIndexes(pod *kubernetes.Pod) []string {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method PodUidIndexer.GetIndexes should have comment or be unexported

return &PodUidIndexer{metaGen: metaGen}, nil
}

func (p *PodUidIndexer) GetMetadata(pod *kubernetes.Pod) []MetadataIndex {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method PodUidIndexer.GetMetadata should have comment or be unexported

metaGen kubernetes.MetaGenerator
}

func NewPodUidIndexer(_ common.Config, metaGen kubernetes.MetaGenerator) (Indexer, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function NewPodUidIndexer should have comment or be unexported
func NewPodUidIndexer should be NewPodUIDIndexer

@@ -128,6 +129,28 @@ func (p *PodNameIndexer) GetIndexes(pod *kubernetes.Pod) []string {
return []string{fmt.Sprintf("%s/%s", pod.Metadata.Namespace, pod.Metadata.Name)}
}

type PodUidIndexer struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type PodUidIndexer should have comment or be unexported
type PodUidIndexer should be PodUIDIndexer

@@ -67,6 +70,21 @@ func (g *metaGenerator) PodMetadata(pod *Pod) common.MapStr {
return meta
}

func (g *metaGenerator) PodMetadataWithUid(pod *Pod) common.MapStr {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method PodMetadataWithUid should be PodMetadataWithUID

// In case of the Kubernetes log path "/var/log/containers/",
// the container ID will be located right before the ".log" extension.
if strings.HasPrefix(f.LogsPath, "/var/log/containers/") && strings.HasSuffix(source, ".log") && sourceLen >= containerIdLen+4 {
containerIdEnd := sourceLen - 4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var containerIdEnd should be containerIDEnd

if strings.HasPrefix(f.LogsPath, "/var/lib/kubelet/pods/") && strings.HasSuffix(source, ".log") {
pathDirs := strings.Split(source, "/")
if len(pathDirs) > podUidPos {
podUid := strings.Split(source, "/")[podUidPos]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var podUid should be podUID

// Pod UID is a 36-character-long string
const podUidLen = 36
// Pod UID is on the 5th index of the path directories
const podUidPos = 5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const podUidPos should be podUIDPos

}

// Docker container ID is a 64-character-long hexadecimal string
const containerIdLen = 64
// Pod UID is a 36-character-long string
const podUidLen = 36

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const podUidLen should be podUIDLen

@mariomechoulam
Copy link
Contributor Author

mariomechoulam commented May 11, 2018

I guess that the CI is failing because of the style and formatting. I have tried executing make fmt but I get an error. I am on Manjaro now, which ships with both Python 2 and 3, but the default is 3 =/
Have tried everything step by step to make sure that I have virtualenv with Python 2 and pip2 - and I have. Even manually source build/python-env/bin/activate-d before running the task. However every time I get:

make[1]: Entering directory '/home/mario/git/go/src/github.com/elastic/beats/libbeat'
Cache entry deserialization failed, entry ignored
  Cache entry deserialization failed, entry ignored
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-8n1zw1h_/functools32/
make[1]: *** [scripts/Makefile:232: python-env] Error 1
make[1]: Leaving directory '/home/mario/git/go/src/github.com/elastic/beats/libbeat'
make: *** [Makefile:92: fmt] Error 1

Probably those cache lines are a hint that it tries to run a different pip version. I am confident that executing sudo pip2 install --upgrade functools32 is working, not sure why it does not through make fmt.

Fetched goimports manually and after running it I am getting quite a lot of modifications, most in files I did not modify and several imports being removed altogether. Honestly, don't feel comfortable commiting that...
Can anyone shed some light and unblock me? I can send a patch with the goimports modifications.

@jsoriano jsoriano added in progress Pull request is currently in progress. review containers Related to containers use case Filebeat Filebeat libbeat labels May 11, 2018
Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking this!!

Overall this is looking good, what do you think about making pod.uid optional to the metadata generator (also add_kubernetes_metadata). You will need to add pod.id mapping to https://github.com/elastic/beats/blob/master/libbeat/processors/add_kubernetes_metadata/_meta/fields.yml

}

func newLogsPathMatcher(cfg common.Config) (add_kubernetes_metadata.Matcher, error) {
config := struct {
LogsPath string `config:"logs_path"`
ResourceType string `config:"resource_type"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this config starts to get complicated, I guess we can live with this how it is now, but I will try to give it a thought on how to simplify this mechanism. Not a blocker for this PR :)

@@ -67,6 +70,21 @@ func (g *metaGenerator) PodMetadata(pod *Pod) common.MapStr {
return meta
}

func (g *metaGenerator) PodMetadataWithUID(pod *Pod) common.MapStr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a setting instead, so pod.uid is added if requested (both by the user or your indexer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey!
Thanks for the feedback. So, just to make sure I am understand correctly, we are talking about adding the pod.uid from the configuration, anytime that

  • The pod_uid Indexer is used
  • pod.uid is specified as a field inside the fields element?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more about doing something similar to the current include_annotations setting, for instance. Parameters are passed here: https://github.com/elastic/beats/blob/master/libbeat/common/kubernetes/metadata.go#L24

And the processor allows configuration for them here: https://github.com/elastic/beats/blob/master/libbeat/processors/add_kubernetes_metadata/config.go#L24

The indexer would pass the right parameter (enable pod.uid) without the user asking for it.

What do you think?

@exekias
Copy link
Contributor

exekias commented May 11, 2018

About the goimports issue, you can probably just use goimports -local github.com/elastic -l -w <file_to_review.go. Let me know if that works for you

@mariomechoulam
Copy link
Contributor Author

Hi, just refactored the code as discussed and updated the unit-tests (also the autodiscover feature albeit a bit late, apologies).
Looks like the formatting is working now, thanks @exekias !
Just ran make docs and looks like it did stuff, but right after that I am not seeing any changes on the branch to be commited. Am I doing anything wrong?

@exekias
Copy link
Contributor

exekias commented May 15, 2018

You will need to:

  • rebase master
  • run make update in the root beats folder

…the Pod.UID as the key and inside the metadata. Extended Filebeat's LogPathMatcher with an additional configuration parameter (resource_type) which, when set to 'pod', uses a different logic to extract the MetadataIndex from the source field. Updated unit-tests accordingly.
… PodUIDIndexer being specified or by the "include_pod_uid" options in the configuration file. The result is used as a parameter for the metaGenerator which, in turn, adds the pod.uid to the existing metadata structure. Added and updated unit-tests accordingly.
@mariomechoulam mariomechoulam force-pushed the improvement/add_kubernetes_metadata_supports_poduid branch from aa87349 to 1a0e10e Compare May 15, 2018 12:49
@@ -65,7 +66,10 @@ func newKubernetesAnnotator(cfg *common.Config) (processors.Processor, error) {
Indexing.RUnlock()
}

metaGen := kubernetes.NewMetaGenerator(config.IncludeAnnotations, config.IncludeLabels, config.ExcludeLabels)
//Checks whether the PodUIDIndexer has been added in order to turn on the pod uid gathering
_, hasPodUIDIndexer := Indexing.indexers["pod_uid"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better if we don't do this automatically, the user must enable it. In any case, the indexer would still work without this, isn't it?

Copy link
Contributor Author

@mariomechoulam mariomechoulam May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm the intention here was to enable the gathering of pod.uid if the indexer was specified in the config, even if the config bool include_pod_uid was not.
I thought this is what you meant with your previous comment: The indexer would pass the right parameter (enable pod.uid) without the user asking for it.
Or perhaps I am missing something...

EDIT:
Now that I check again, the Indexer would work even without any of the 2 being in the config as the key is not gotten from the metadata. Should I then remove it altogether?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Yes, I apologize as that is what I said, then I realized the indexer doesn't need the pod uid in the metadata 😇

Still, I think having pod.uid is a good feature for some people, so I would keep it as something optional you can enable. I would only remove this auto-enable thingy and we are ready to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!
I'll remove the auto-enable and add the info in the changelog first thing tomorrow.
Thanks man! And no need to apologise 🙂

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes! I added a (hopefully) last thing :)

Please add a new entry to CHANGELOG.asciidoc

@exekias
Copy link
Contributor

exekias commented May 16, 2018

jenkins, test this please

@yangyuw
Copy link

yangyuw commented May 16, 2018

@mariomechoulam Can I use this with emptyDir like this now?

processors:
  - add_kubernetes_metadata:
      in_cluster: true
      default_indexers.enabled: false
      default_matchers.enabled: false
      indexers:
        - pod_uid
      matchers:
        - logs_path:
            logs_path: /var/lib/kubelet/pods/
            resource_type: pod

@exekias exekias removed the in progress Pull request is currently in progress. label May 16, 2018
@exekias exekias merged commit 232ee20 into elastic:master May 16, 2018
@exekias
Copy link
Contributor

exekias commented May 16, 2018

Thank you @mariomechoulam for your contribution 🎉

It will go out in 6.4 release

@mariomechoulam
Copy link
Contributor Author

Awesome! Many thanks @exekias for your support!
Do we know if it is working so I can update the initial post and the discuss topic? I was not able to test it beyond unit-test myself...

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@mariomechoulam
Copy link
Contributor Author

Hi @yu-yang2
I have updated the original post with the configuration, I believe, would make it work (notice I added the include_pod_uid: true option). Nevertheless, I would wait myself until the confirmation that this is, indeed, the working setup.

@exekias
Copy link
Contributor

exekias commented May 16, 2018

I can confirm it works, both the indexer and adding pod.uid

@yangyuw
Copy link

yangyuw commented May 16, 2018

@mariomechoulam You have missing a : after pod_uid beyond indexers, here is my config file, this is perfect work with emptyDir even in multi replicas deployment.

filebeat.prospectors:
- type: log
  paths:
    - /var/lib/kubelet/pods/*/volumes/kubernetes.io~empty-dir/*/*.log
processors:
  - add_kubernetes_metadata:
      in_cluster: true
      default_indexers.enabled: false
      default_matchers.enabled: false
      include_pod_uid: true
      indexers:
        - pod_uid:
      matchers:
        - logs_path:
            logs_path: /var/lib/kubelet/pods/
            resource_type: pod
output.elasticsearch:
  hosts: elasticsearch-logging:9200

@mariomechoulam
Copy link
Contributor Author

Nice spot @yu-yang2 ! And many thanks for adding the correct configuration here; I'll update the original post.
Very happy to see that you found this helpful and are using it 🙂
Cheers!

stevea78 pushed a commit to stevea78/beats that referenced this pull request May 20, 2018
Added PodUidIndexer to add_kubernetes_metadata processor, containing the Pod.UID as the key and inside the metadata.
Extended Filebeat's LogPathMatcher with an additional configuration parameter (resource_type) which, when set to 'pod', uses a different logic to extract the MetadataIndex from the source field.
stevea78 pushed a commit to stevea78/beats that referenced this pull request May 20, 2018
Added PodUidIndexer to add_kubernetes_metadata processor, containing the Pod.UID as the key and inside the metadata.
Extended Filebeat's LogPathMatcher with an additional configuration parameter (resource_type) which, when set to 'pod', uses a different logic to extract the MetadataIndex from the source field.
@yubobo
Copy link

yubobo commented Aug 18, 2018

@exekias
When is the 6.4 release? It's been over 100 days
thanks~

@yubobo
Copy link

yubobo commented Sep 28, 2018

unable to get kubernetes metadata to enrich log using filebeat 6.4.1. #8481
would you share deployment config with me? @mariomechoulam @yu-yang2
thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
containers Related to containers use case Filebeat Filebeat libbeat review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants