Kafka-Minion as alternative to Burrow for consumer lag monitoring #259

Merged
merged 17 commits into from
May 6, 2019
Changes from 7 commits
13 changes: 13 additions & 0 deletions consumers-prometheus/kafka-minion-service.yaml
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
  name: metrics-minion
  namespace: kafka
  labels: &labels
    app: kafka-minion
    type: openmetrics
spec:
  selector: *labels
  ports:
  - name: http
    port: 8080
51 changes: 51 additions & 0 deletions consumers-prometheus/kafka-minion.yaml
@@ -0,0 +1,51 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-minion
  namespace: kafka
  labels: &labels
    app: kafka-minion
    type: openmetrics
spec:
  replicas: 1
  selector:
    matchLabels: *labels
  template:
    metadata:
      labels: *labels
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: /metrics
    spec:
      containers:
      - name: kafka-minion
        image: quay.io/google-cloud-tools/kafka-minion:v0.1.2@sha256:756faaa4b7ce39b9f7d76c0cf9570ab0cf6a9c921e407acd0f12ca933abe202e
        env:
        - name: TELEMETRY_HOST
          value: 0.0.0.0
        - name: TELEMETRY_PORT
          value: "8080"
        - name: EXPORTER_IGNORE_SYSTEM_TOPICS
          value: "true"
        - name: EXPORTER_METRICS_PREFIX
          value: kafka_minion
        - name: LOG_LEVEL
          value: info
        - name: KAFKA_BROKERS
          value: kafka-0.broker:9092, kafka-1.broker:9092, kafka-2.broker:9092
        - name: KAFKA_CONSUMER_OFFSETS_TOPIC_NAME
          value: __consumer_offsets
        ports:
        - name: http
          containerPort: 8080
        readinessProbe:
          failureThreshold: 1
          httpGet:
            port: http
            path: /metrics

Kafka Minion 1.1.2 introduces a dedicated readiness check that returns HTTP 200 once Kafka Minion has initially consumed the __consumer_offsets topic, which is the point at which it starts exposing metrics. This feature is required to run Kafka Minion in high availability (multiple replicas), which is recommended if you intend to set up alerting on these metrics.

Since the initial consumption can take some time, the probe needs loose timeouts:

          readinessProbe:
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 60 # 60 * 10s equals 10min, should be adapted depending on the given resources and size of consumer offsets topic
            httpGet:
              path: /readycheck
              port: http

Contributor Author

5fc33f4 switches to this endpoint but keeps everything else at the defaults

        livenessProbe:
          failureThreshold: 3
          httpGet:
            port: http
            path: /metrics

We have a separate endpoint which checks if it's still connected to at least one kafka broker:

          livenessProbe:
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
            httpGet:
              path: /healthcheck
              port: http

Contributor Author

5fc33f4 switches to this endpoint but keeps everything else at the defaults
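
For reference, the two reviewer suggestions in this thread fit together as the container fragment below. This is a sketch of the reviewer's variant, not of what was merged: per the replies, 5fc33f4 adopts the `/readycheck` and `/healthcheck` paths but leaves the timing fields at their defaults. It assumes the deployed Kafka Minion image serves both endpoints, and the timing values are the reviewer's, to be adapted to the size of the consumer offsets topic.

```yaml
# Sketch only: container fragment combining both probe suggestions from the review.
# Assumption: the image in use exposes /readycheck and /healthcheck.
containers:
- name: kafka-minion
  ports:
  - name: http
    containerPort: 8080
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 60   # reviewer's value: 60 * 10s, about 10 minutes
    httpGet:
      path: /readycheck    # 200 once __consumer_offsets has been consumed initially
      port: http
  livenessProbe:
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /healthcheck   # checks the connection to at least one Kafka broker
      port: http
```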

3 changes: 3 additions & 0 deletions consumers-prometheus/kustomization.yaml
@@ -0,0 +1,3 @@
resources:
- kafka-minion-service.yaml
- kafka-minion.yaml
2 changes: 1 addition & 1 deletion prometheus/50-kafka-jmx-exporter-patch.yml
@@ -1,4 +1,4 @@
-# meant to be applied using
+# meant to be applied using kustomize, or with pre-1.14 kubectl:
# kubectl --namespace kafka patch statefulset kafka --patch "$(cat prometheus/50-kafka-jmx-exporter-patch.yml )"
apiVersion: apps/v1
kind: StatefulSet
21 changes: 19 additions & 2 deletions prometheus/README.md
@@ -1,10 +1,27 @@
# Export metrics to Prometheus

Kafka uses JMX to expose metrics, as is already [enabled](https://github.com/Yolean/kubernetes-kafka/pull/96) for broker pods. There's many ways to use JMX. For example [Kafka Manager](../yahoo-kafka-manager/) uses it to display current broker traffic.
JMX is already [enabled](https://github.com/Yolean/kubernetes-kafka/pull/96) for broker pods (TODO extract to kustomization). There's many ways to use JMX. For example [Kafka Manager](../yahoo-kafka-manager/) uses it to display current broker traffic.

At Yolean we use Prometheus. This folder adds a sidecar to the broker pods that exports selected JMX metrics over HTTP in Prometheus format. To add a container to an existing pod we must use the `patch` command:
This folder adds a sidecar to the broker pods that exports selected JMX metrics over HTTP in Prometheus format. To add a container to an existing pod we must use the `patch` command:

Using kubectl 1.14+

```
kubectl --namespace kafka apply -k prometheus/
```

Using pre-1.14 kubectl:

```
kubectl --namespace kafka apply -f prometheus/10-metrics-config.yml
kubectl --namespace kafka patch statefulset kafka --patch "$(cat prometheus/50-kafka-jmx-exporter-patch.yml )"
```

## Consumer lag monitoring

See [Burrow](../linkedin-burrow)
or [Kafka Minion](../consumers-prometheus/)

Maybe add some comments on which one to prefer depending on the use case / environment?

  • Many Kafka clusters to monitor with just one exporter? => Burrow
  • Only interested in consumer health checks? => Burrow
  • Want metrics in Prometheus? => Kafka Minion
  • Looking for HA support? => Kafka Minion
  • Using versioning in group ids (e.g. consumer group name "email-sender-5" where 5 indicates the version)? => Kafka Minion

In fact they can complement each other, and it may be reasonable to operate both of them.

Contributor Author

There's lots and lots of research to be done for anyone who wants to set up a Kafka stack and I see this repository as a collection of examples rather than a way to discuss the choices.


## Prometheus Operator

Use the [prometheus-operator](../variants/prometheus-operator/) kustomization.
8 changes: 8 additions & 0 deletions prometheus/kustomization.yaml
@@ -0,0 +1,8 @@
bases:
- ../kafka
# or any variant with kafka included, such as
#- ../variants/scale-1
resources:
- 10-metrics-config.yml
patchesStrategicMerge:
- 50-kafka-jmx-exporter-patch.yml
32 changes: 32 additions & 0 deletions variants/prometheus-operator/k8s-kafka-rbac.yaml
@@ -0,0 +1,32 @@
# Allows the "k8s" prometheus from Prometheus Operator contrib to do service discovery in the kafka namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: kafka
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: kafka
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
38 changes: 38 additions & 0 deletions variants/prometheus-operator/k8s-kafka-servicemonitor.yaml
@@ -0,0 +1,38 @@
---
apiVersion: v1
kind: Service
metadata:
  name: broker-monitoring
  namespace: kafka
  labels:
    app: kafka
spec:
  publishNotReadyAddresses: true
  ports:
  - name: fromjmx
    port: 5556
  selector:
    app: kafka
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka
  namespace: monitoring
  labels:
    k8s-app: kafka
spec:
  namespaceSelector:
    matchNames:
    - kafka
  selector:
    matchLabels:
      app: kafka
  endpoints:
  # https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 120s
    scrapeTimeout: 119s
    port: fromjmx
    scheme: http
    path: /metrics
22 changes: 22 additions & 0 deletions variants/prometheus-operator/k8s-minion-servicemonitor.yaml
@@ -0,0 +1,22 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-metrics-minion
  namespace: monitoring
  labels:
    k8s-app: kafka-metrics-minion
spec:
  namespaceSelector:
    matchNames:
    - kafka
  selector:
    matchLabels:
      app: kafka-minion
      type: openmetrics
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    scrapeTimeout: 30s
    port: http
    scheme: http
    path: /metrics
9 changes: 9 additions & 0 deletions variants/prometheus-operator/kustomization.yaml
@@ -0,0 +1,9 @@
bases:
- ../../prometheus
- ../../consumers-prometheus
resources:
- k8s-kafka-rbac.yaml
# with base ../../prometheus
- k8s-kafka-servicemonitor.yaml
# with base ../../consumers-prometheus
- k8s-minion-servicemonitor.yaml
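
The comments in the kustomization above pair each ServiceMonitor with the base it depends on. As a rough illustration of that pairing, a variant that deploys only Kafka Minion (without the JMX exporter sidecar) might look like the sketch below. The directory it would live in is hypothetical and not part of this PR, and it assumes copies of the two listed resource files sit alongside it at the same depth as `variants/prometheus-operator/`.

```yaml
# Sketch only: hypothetical kustomization.yaml for a Minion-only variant.
# Assumption: k8s-kafka-rbac.yaml and k8s-minion-servicemonitor.yaml are copied into this directory.
bases:
- ../../consumers-prometheus
resources:
# lets the prometheus-k8s service account discover services in the kafka namespace
- k8s-kafka-rbac.yaml
# with base ../../consumers-prometheus
- k8s-minion-servicemonitor.yaml
```

The broker ServiceMonitor is left out here because it scrapes the `fromjmx` port, which only exists when the `../../prometheus` base is included.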