Practical Observability

A book that covers the application of observability practices.

We cover Metrics, Tracing, and Logging, as well as how common observability providers implement each of them.

You will learn the theory behind different kinds of observability tools, how they are implemented from first principles, and how to instrument new and existing services in two popular programming languages (Go and Python).


This book is deployed as a draft in GCS for now:

Environment   Book Link
GCS           Practical Observability
GH Pages      Practical Observability


To run this project locally, clone the repository and run make dev to bring up an mdBook container that serves the docs at http://localhost:3000

You'll need a recent version of Docker in order to run the project.
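
Before running make dev, it can help to confirm that the tools this workflow relies on are on your PATH. The helper below is a minimal sketch, not part of this repository: the function name missing_tools is hypothetical, and the tool list (git, make, docker) is an assumption based on the steps described above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that is
# not found on PATH. Returns 0 when all tools are present, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example (assumed tool list for the local workflow):
# missing_tools git make docker || echo "install the tools listed above first"
```

The function only checks that each tool exists. It does not check versions, so a "recent version of Docker" still needs to be confirmed by hand.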

Demo Stack Standup

To start up the local demo stack, including Prometheus, Grafana, and AlertManager, run the following steps against your local Kubernetes cluster.

Install Helm

$ brew install helm

Install the Helm Prometheus Repo and Charts

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo add stable https://charts.helm.sh/stable
$ helm repo update

Create a Monitoring Namespace for Prometheus components

$ kubectl create namespace monitoring

Run the Prometheus Helm Chart

Note the ports we're assigning to each service. These can be changed if necessary to accommodate your local environment.

$ helm install kind-prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --set prometheus.service.nodePort=30000 \
    --set prometheus.service.type=NodePort \
    --set grafana.service.nodePort=31000 \
    --set grafana.service.type=NodePort \
    --set alertmanager.service.nodePort=32000 \
    --set alertmanager.service.type=NodePort \
    --set prometheus-node-exporter.service.nodePort=32001 \
    --set prometheus-node-exporter.service.type=NodePort
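
If you need different ports, one way to keep the flags consistent is to generate each pair from a service name and a port. The sketch below is illustrative only: nodeport_flags is a hypothetical helper, and the range check assumes your cluster uses the default Kubernetes NodePort range of 30000-32767.

```shell
#!/bin/sh
# Hypothetical helper: emit the two --set flags that expose one chart
# service on a NodePort. Rejects ports outside the default Kubernetes
# NodePort range (30000-32767), which is an assumption about the cluster.
nodeport_flags() {
    service="$1"
    port="$2"
    case "$port" in
        ''|*[!0-9]*)
            echo "port must be a number: $port" >&2
            return 1
            ;;
    esac
    if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
        echo "port $port is outside the default NodePort range 30000-32767" >&2
        return 1
    fi
    echo "--set ${service}.service.type=NodePort --set ${service}.service.nodePort=${port}"
}

# Example usage (left unquoted on purpose so the flags split into words):
# helm install kind-prometheus prometheus-community/kube-prometheus-stack \
#     --namespace monitoring \
#     $(nodeport_flags prometheus 30000) \
#     $(nodeport_flags grafana 31000)
```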

Patch the Node Exporter to keep it from crashing, since it occasionally has trouble on Docker Desktop clusters:

$ kubectl patch -n monitoring ds kind-prometheus-prometheus-node-exporter --type "json" -p '[{"op": "remove", "path" : "/spec/template/spec/containers/0/volumeMounts/2/mountPropagation"}]'

Check for running Prometheus pods:

$ kubectl --namespace monitoring get pods -l release=kind-prometheus
NAME                                                   READY   STATUS    RESTARTS   AGE
kind-prometheus-kube-prome-operator-75468846f9-ng4kk   1/1     Running   0          6m14s
kind-prometheus-kube-state-metrics-554c667875-mg27l    1/1     Running   0          6m14s
kind-prometheus-prometheus-node-exporter-l7qng         1/1     Running   0          55s

Check the full component stack:

$ kubectl get all --namespace monitoring

NAME                                                         READY   STATUS    RESTARTS        AGE
pod/alertmanager-kind-prometheus-kube-prome-alertmanager-0   2/2     Running   1 (7m21s ago)   7m43s
pod/kind-prometheus-grafana-59764d785-fq26p                  3/3     Running   0               7m59s
pod/kind-prometheus-kube-prome-operator-75468846f9-ng4kk     1/1     Running   0               7m59s
pod/kind-prometheus-kube-state-metrics-554c667875-mg27l      1/1     Running   0               7m59s
pod/kind-prometheus-prometheus-node-exporter-l7qng           1/1     Running   0               2m40s
pod/prometheus-kind-prometheus-kube-prome-prometheus-0       2/2     Running   0               7m43s

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                      ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   7m43s
service/kind-prometheus-grafana                    NodePort                     <none>        80:31000/TCP                 7m59s
service/kind-prometheus-kube-prome-alertmanager    NodePort                     <none>        9093:32000/TCP               7m59s
service/kind-prometheus-kube-prome-operator        ClusterIP                    <none>        443/TCP                      7m59s
service/kind-prometheus-kube-prome-prometheus      NodePort                     <none>        9090:30000/TCP               7m59s
service/kind-prometheus-kube-state-metrics         ClusterIP                    <none>        8080/TCP                     7m59s
service/kind-prometheus-prometheus-node-exporter   NodePort                     <none>        9100:32001/TCP               7m59s
service/prometheus-operated                        ClusterIP   None             <none>        9090/TCP                     7m43s

NAME                                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/kind-prometheus-prometheus-node-exporter   1         1         1       1            1           <none>          7m59s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kind-prometheus-grafana               1/1     1            1           7m59s
deployment.apps/kind-prometheus-kube-prome-operator   1/1     1            1           7m59s
deployment.apps/kind-prometheus-kube-state-metrics    1/1     1            1           7m59s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/kind-prometheus-grafana-59764d785                1         1         1       7m59s
replicaset.apps/kind-prometheus-kube-prome-operator-75468846f9   1         1         1       7m59s
replicaset.apps/kind-prometheus-kube-state-metrics-554c667875    1         1         1       7m59s

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-kind-prometheus-kube-prome-alertmanager   1/1     7m43s
statefulset.apps/prometheus-kind-prometheus-kube-prome-prometheus       1/1     7m43s

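After the install, the services are exposed on the node ports set above: Prometheus on 30000, Grafana on 31000, AlertManager on 32000, and the Node Exporter on 32001.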

Log in to Grafana with the following credentials:

Username: admin
Password: prom-operator
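
To confirm the stack is reachable before logging in, you can probe each component's health endpoint. This is a rough sketch under stated assumptions: the NodePorts are the ones set in the helm install command, the host defaults to localhost (typical for Docker Desktop, but it may differ on your cluster), and health_url, check_stack, and STACK_HOST are hypothetical names introduced here.

```shell
#!/bin/sh
# Assumption: NodePorts are reachable on localhost. Override STACK_HOST
# if your cluster exposes them on a different address.
STACK_HOST="${STACK_HOST:-localhost}"

# Map a component name to its health-check URL, using the NodePorts
# chosen in the helm install command.
health_url() {
    case "$1" in
        prometheus)   echo "http://${STACK_HOST}:30000/-/healthy" ;;
        grafana)      echo "http://${STACK_HOST}:31000/api/health" ;;
        alertmanager) echo "http://${STACK_HOST}:32000/-/healthy" ;;
        *)            echo "unknown component: $1" >&2; return 1 ;;
    esac
}

# Probe each component and report whether it answered successfully.
check_stack() {
    for component in prometheus grafana alertmanager; do
        url=$(health_url "$component")
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            echo "$component: ok ($url)"
        else
            echo "$component: unreachable ($url)"
        fi
    done
}
```

Defining the functions does not contact the cluster. Run check_stack once the pods report Running to probe all three components.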


Tear down the stack when you're done with:

$ kubectl delete namespace monitoring


This repo has one main CI pipeline, a GitHub Action that builds the draft and publishes it to GCS.

Commits to the main branch trigger a build and deploy. Within about 20 seconds of a push, you should see the updated docs at the Draft environment link.


