Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instrumentation for Kubernetes events #565

Open
pavolloffay opened this issue Nov 19, 2021 · 10 comments
Open

Add instrumentation for Kubernetes events #565

pavolloffay opened this issue Nov 19, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@pavolloffay
Copy link
Member

pavolloffay commented Nov 19, 2021

Since version 0.38.0 the operator supports injecting instrumentation to workloads via the Instrumentation.opentelemetry.io Kind. I would like to extend the CR to add support for generating spans for k8s events. In other words instrumentation for k8s events.

Kspan: https://github.com/weaveworks-experiments/kspan

example-2pod

The instrumentation kind could like like this:

type InstrumentationSpec {
  ///existing options
  KubernetesEvents KubernetesEvents
}

type KubernetesEvents {
  Enabled bool
}

cc) @bboreham

@pavolloffay
Copy link
Member Author

Previously we had a couple of discussions with @jpkrohling about what is the right place for kspan and I more and more think that having it in this operator makes sense.

@jpkrohling
Copy link
Member

@bboreham, I thought you'd be interested in knowing about this.

@pavolloffay
Copy link
Member Author

pavolloffay commented Dec 1, 2021

I have done some digging and I would like to share some details about kspan:

  • the project is created from kubebuilder
  • it is k8s controller for events
  • it holds events in memory for a specified time window in order to correlate events
  • the events are correlated by fields in an event e.g. involvedObject
  • RBAC is right now all objects, but we could restrict that

Problems:

  1. Because the events are held in memory there is a question about scalability. Multiple controllers can be deployed and shard events based on namespace name. It still might not be enough if a lot of events are in a single naspace.
  2. @bboreham mentioned that there might be a lot of events that are not interesting. I don't know what is the ideal solution to this, perhaps some filtering based on event source (controller name) or some other information from the events.
  3. k8s events are best effort, they might not be delivered

To make kspan fully reliable the events would have to incorporate some traceid/transactionId in the metadata.

Example events:
events.txt

Despite the issue, I like the functionality and we could work on improving it. The kspan would be added as a new CR kevents.opentelemetry.io. Note that I have renamed it to KEvents I find the name kspan a bit confusing.

apiVersion: opentelemetry.io/v1alpha1
kind: KEvents
metadata:
  name: kevents
spec:
  exporter:
    endpoint: http://otel-collector:4317
  image:
    image: pavolloffay/kspan:1
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"
  attributes:
    addK8sUIDAttributes: true
  resource:
    attributes:
      k8s.cluster.name: dev

This will deploy a single kevents deployment. Later once scalability is solved a replica count can be added to the CRD.

@secustor
Copy link
Member

secustor commented Dec 2, 2021

What would be interesting is to integrate KSpan in the k8clustereventreceiver to allow it to export additionally traces instead of only logs.
This would also reduce maintenance complexity as we wouldn't have to create a CRD for an external service.

@pavolloffay
Copy link
Member Author

That is an excellent point.

This would also reduce maintenance complexity as we wouldn't have to create a CRD for an external service.

On the other hand, the CR provides a better user experience. Right now the k8sclustereventreceiver has to be configured with additional k8s objects. We could put this logic to operator - e.g. the operator can recognize the receiver and create the RBAC on behalf of the user.

@secustor
Copy link
Member

secustor commented Dec 2, 2021

I'm not against creating a CRD, far from it.
What you are describing, is how I have imagined it.

This way the user can consume the events outside of the cluster too, e.g. managed k8 cluster provider which want to give consumers "cluster admin" permissions and therefore don't want to setup there tooling inside of the cluster. In case of the CRD this would simply deploy one or more collectors and the necessary rbac controls and point them internal or external OTLP endpoint.

@pavolloffay
Copy link
Member Author

Yes, we should definitely consider implementing this as a receiver. Even if we go the receiver route we can still create a specific CRD that will deploy an opinionated version of the collector, or have a field in the existing collector CR to create additional RBAC etc.

I have other news to share. I am talking to people to figure out how useful it is. So far I am getting positive feedback and it seems that people often correlate k8s events when an incident happens. This capability could be provided as a job that would get events from a given time window (incident window) or just simply run all the time in the cluster if preferred.

Discusson at k8s instrumentation SIG https://kubernetes.slack.com/archives/C20HH14P7/p1638464478071100. Video recording when kpan was presented there https://youtu.be/Uqyvt9kLnX8?t=1187

@pavolloffay
Copy link
Member Author

Another use-case: I have come across a user that could not store events in k8s with a long enough retention. Getting the events to another storage with correlation would solve the issue. I don't know the reasoning why the event store in k8s could not be configured to persist events for long period of time.

@pavolloffay
Copy link
Member Author

Crosslinking the issue from the collector-contrib open-telemetry/opentelemetry-collector-contrib#7408

@yuriolisa yuriolisa self-assigned this Jun 22, 2023
@yuriolisa yuriolisa removed their assignment Sep 13, 2023
@jaronoff97
Copy link
Contributor

@pavolloffay do we still want to look in to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants