GitBook: [master] 5 pages modified

fluxcd · Dec 19, 2018 · 36ce610
1 parent 1dc2aa1
commit 36ce610

Showing 5 changed files with 455 additions and 0 deletions.
diff --git a/docs/gitbook/SUMMARY.md b/docs/gitbook/SUMMARY.md
@@ -1,9 +1,16 @@
 # Table of contents
 
 * [Introduction](README.md)
+* [How it works](how-it-works.md)
 
 ## Install
 
 * [Installing Flagger](install-1/installing-flagger.md)
 * [Installing Grafana](install-1/installing-grafana.md)
 
+## Usage
+
+* [Progressive Delivery](usage/progressive-delivery.md)
+* [Monitoring](usage/monitoring.md)
+* [Alerting](usage/alerting.md)
+
diff --git a/docs/gitbook/how-it-works.md b/docs/gitbook/how-it-works.md
@@ -0,0 +1,147 @@
+---
+description: Automated canary deployment process
+---
+
+# How it works
+
+[Flagger](https://github.com/stefanprodan/flagger) takes a Kubernetes deployment and optionally a horizontal pod autoscaler \(HPA\) and creates a series of objects \(Kubernetes deployments, ClusterIP services and Istio virtual services\) to drive the canary analysis and promotion. 
+
+![flagger-canary-hpa](https://github.com/raw/stefanprodan/flagger/master/docs/diagrams/flagger-canary-hpa.png)
+
+### Canary Custom Resource
+
+For a deployment named _**podinfo**_, a canary promotion can be defined using Flagger's custom resource:
+
+```yaml
+apiVersion: flagger.app/v1alpha1
+kind: Canary
+metadata:
+  name: podinfo
+  namespace: test
+spec:
+  # deployment reference
+  targetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: podinfo
+  # the maximum time in seconds for the canary deployment
+  # to make progress before it is rolled back (default 600s)
+  progressDeadlineSeconds: 60
+  # hpa reference (optional)
+  autoscalerRef:
+    apiVersion: autoscaling/v2beta1
+    kind: HorizontalPodAutoscaler
+    name: podinfo
+  service:
+    # container port
+    port: 9898
+    # Istio gateways (optional)
+    gateways:
+    - public-gateway.istio-system.svc.cluster.local
+    # Istio virtual service host names (optional)
+    hosts:
+    - app.istio.weavedx.com
+  canaryAnalysis:
+    # max number of failed metric checks before rollback
+    threshold: 5
+    # max traffic percentage routed to canary
+    # percentage (0-100)
+    maxWeight: 50
+    # canary increment step
+    # percentage (0-100)
+    stepWeight: 10
+    metrics:
+    - name: istio_requests_total
+      # minimum req success rate (non 5xx responses)
+      # percentage (0-100)
+      threshold: 99
+      interval: 1m
+    - name: istio_request_duration_seconds_bucket
+      # maximum req duration P99
+      # milliseconds
+      threshold: 500
+      interval: 30s
+
+```
+
+### Canary Deployment
+
+![flagger-canary-steps](https://github.com/raw/stefanprodan/flagger/master/docs/diagrams/flagger-canary-steps.png)
+
+Gated canary promotion stages:
+
+* scan for canary deployments
+* create the primary deployment if needed
+* check that Istio virtual service routes are mapped to the primary and canary ClusterIP services
+* check the status of the primary and canary deployments
+  * halt advancement if a rolling update is underway
+  * halt advancement if pods are unhealthy
+* increase canary traffic weight percentage from 0% by one step \(step weight, 10% in the example above\)
+* check canary HTTP request success rate and latency
+  * halt advancement if any metric fails its threshold check
+  * increment the failed checks counter
+* check if the number of failed checks reached the threshold
+  * route all traffic to primary
+  * scale the canary deployment to zero and mark it as failed
+  * wait for the canary deployment to be updated \(revision bump\) and start over
+* increase canary traffic weight by the step weight until it reaches the max weight \(50% in the example above\)
+  * halt advancement while canary request success rate is under the threshold
+  * halt advancement while canary request duration P99 is over the threshold
+  * halt advancement if the primary or canary deployment becomes unhealthy
+  * halt advancement while the canary deployment is being scaled up or down by the HPA
+* promote canary to primary
+  * copy the canary deployment spec template over the primary
+* wait for the primary rolling update to finish
+  * halt advancement if pods are unhealthy
+* route all traffic to primary
+* scale the canary deployment to zero
+* mark rollout as finished
+* wait for the canary deployment to be updated \(revision bump\) and start over
+
+You can change the canary analysis _max weight_ and _step weight_ percentages in Flagger's custom resource.
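+
+As a rough illustration of how these two settings shape a rollout, the fragment below is a hypothetical, more conservative variant of the `canaryAnalysis` block from the custom resource shown earlier \(it sits under `spec`\). Only the `maxWeight` and `stepWeight` values differ from that example. The weight sequence in the comments assumes the canary advances by exactly one step per successful check, as the stages above describe.
+
+```yaml
+# Hypothetical values for illustration, not a recommendation
+canaryAnalysis:
+  # max number of failed metric checks before rollback
+  threshold: 5
+  # stop shifting traffic once the canary receives 30%
+  maxWeight: 30
+  # shift traffic in 5% increments:
+  # 5 -> 10 -> 15 -> 20 -> 25 -> 30, then promote
+  stepWeight: 5
+```
+
+Smaller steps expose fewer users to a bad revision at each stage, at the cost of more checks before promotion.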
+
+### Canary Analysis
+
+The canary analysis uses the following PromQL queries:
+
+_HTTP requests success rate percentage_
+
+```javascript
+sum(
+    rate(
+        istio_requests_total{
+          reporter="destination",
+          destination_workload_namespace=~"$namespace",
+          destination_workload=~"$workload",
+          response_code!~"5.*"
+        }[$interval]
+    )
+) 
+/ 
+sum(
+    rate(
+        istio_requests_total{
+          reporter="destination",
+          destination_workload_namespace=~"$namespace",
+          destination_workload=~"$workload"
+        }[$interval]
+    )
+)
+```
+
+_HTTP request duration P99 \(queried in seconds, compared against a millisecond threshold\)_
+
+```javascript
+histogram_quantile(0.99, 
+  sum(
+    irate(
+      istio_request_duration_seconds_bucket{
+        reporter="destination",
+        destination_workload=~"$workload",
+        destination_workload_namespace=~"$namespace"
+      }[$interval]
+    )
+  ) by (le)
+)
+```
+
diff --git a/docs/gitbook/usage/alerting.md b/docs/gitbook/usage/alerting.md
@@ -0,0 +1,41 @@
+---
+description: Slack & Alertmanager
+---
+
+# Alerting
+
+### Slack
+
+Flagger can be configured to send Slack notifications:
+
+```bash
+helm upgrade -i flagger flagger/flagger \
+--namespace=istio-system \
+--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
+--set slack.channel=general \
+--set slack.user=flagger
+```
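+
+The `--set` flags above can also be kept in a Helm values file, which is easier to review and version. A minimal sketch, assuming a file named `slack-values.yaml` \(the file name is arbitrary\) that mirrors the same three settings:
+
+```yaml
+# slack-values.yaml (hypothetical file name)
+# mirrors the --set slack.* flags from the command above
+slack:
+  url: https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
+  channel: general
+  user: flagger
+```
+
+Passing the file with `-f slack-values.yaml` in place of the three `--set slack.*` flags should produce the same release configuration.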
+
+Once configured with a Slack incoming **webhook**, Flagger will post messages when a canary deployment has been initialised, when a new revision has been detected, and when the canary analysis has failed or succeeded.
+
+![flagger-slack](https://github.com/raw/stefanprodan/flagger/master/docs/screens/slack-canary-notifications.png)
+
+A canary deployment will be rolled back if the progress deadline is exceeded or if the analysis reaches the maximum number of failed checks:
+
+![flagger-slack-errors](https://github.com/raw/stefanprodan/flagger/master/docs/screens/slack-canary-failed.png)
+
+### Prometheus Alertmanager
+
+Besides Slack, you can use Alertmanager to trigger alerts when a canary deployment fails:
+
+```yaml
+  - alert: canary_rollback
+    expr: flagger_canary_status > 1
+    for: 1m
+    labels:
+      severity: warning
+    annotations:
+      summary: "Canary failed"
+      description: "Workload {{ $labels.name }} namespace {{ $labels.namespace }}"
+```
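+
+The rule above is a single list item, so it needs to sit inside a rule group before Prometheus can load it. A minimal sketch of a complete rule file, assuming a group named `flagger` \(the group name is an arbitrary choice, not something Flagger requires\):
+
+```yaml
+# Prometheus 2.x rule file; the group name is an assumption
+groups:
+  - name: flagger
+    rules:
+      # fires when flagger_canary_status stays above 1 for one minute
+      - alert: canary_rollback
+        expr: flagger_canary_status > 1
+        for: 1m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Canary failed"
+          description: "Workload {{ $labels.name }} namespace {{ $labels.namespace }}"
+```
+
+Prometheus evaluates the expression and forwards firing alerts to Alertmanager, which handles routing and notification.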
+
diff --git a/docs/gitbook/usage/monitoring.md b/docs/gitbook/usage/monitoring.md
@@ -0,0 +1,73 @@
+---
+description: Metrics & Logging
+---
+
+# Monitoring
+
+### Grafana
+
+Flagger comes with a Grafana dashboard made for canary analysis. Install Grafana with Helm:
+
+```bash
+helm upgrade -i flagger-grafana flagger/grafana \
+--namespace=istio-system \
+--set url=http://prometheus:9090 \
+--set user=admin \
+--set password=admin
+```
+
+The dashboard shows the RED and USE metrics for the primary and canary workloads:
+
+![canary dashboard](https://github.com/raw/stefanprodan/flagger/master/docs/screens/grafana-canary-analysis.png)
+
+### Logging
+
+Canary errors and latency spikes are recorded as Kubernetes events and logged by Flagger in JSON format:
+
+```text
+kubectl -n istio-system logs deployment/flagger --tail=100 | jq .msg
+
+Starting canary deployment for podinfo.test
+Advance podinfo.test canary weight 5
+Advance podinfo.test canary weight 10
+Advance podinfo.test canary weight 15
+Advance podinfo.test canary weight 20
+Advance podinfo.test canary weight 25
+Advance podinfo.test canary weight 30
+Advance podinfo.test canary weight 35
+Halt podinfo.test advancement success rate 98.69% < 99%
+Advance podinfo.test canary weight 40
+Halt podinfo.test advancement request duration 1.515s > 500ms
+Advance podinfo.test canary weight 45
+Advance podinfo.test canary weight 50
+Copying podinfo.test template spec to podinfo-primary.test
+Halt podinfo-primary.test advancement waiting for rollout to finish: 1 old replicas are pending termination
+Scaling down podinfo.test
+Promotion completed! podinfo.test
+```
+
+### Metrics
+
+Flagger exposes Prometheus metrics that can be used to determine the canary analysis status and the destination weight values:
+
+```bash
+# Canaries total gauge
+flagger_canary_total{namespace="test"} 1
+
+# Canary promotion last known status gauge
+# 0 - running, 1 - successful, 2 - failed
+flagger_canary_status{name="podinfo",namespace="test"} 1
+
+# Canary traffic weight gauge
+flagger_canary_weight{workload="podinfo-primary",namespace="test"} 95
+flagger_canary_weight{workload="podinfo",namespace="test"} 5
+
+# Seconds spent performing canary analysis histogram
+flagger_canary_duration_seconds_bucket{name="podinfo",namespace="test",le="10"} 6
+flagger_canary_duration_seconds_bucket{name="podinfo",namespace="test",le="+Inf"} 6
+flagger_canary_duration_seconds_sum{name="podinfo",namespace="test"} 17.3561329
+flagger_canary_duration_seconds_count{name="podinfo",namespace="test"} 6
+```
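+
+These series can be combined in ordinary PromQL. As one possible example, the rule file sketch below derives the mean analysis duration per canary from the histogram's `_sum` and `_count` series. The group name and the recorded series name are hypothetical.
+
+```yaml
+# Hypothetical recording rule built on the metrics listed above
+groups:
+  - name: flagger-recording
+    rules:
+      # mean seconds per canary analysis over the lifetime of the counters;
+      # with the sample values above: 17.3561329 / 6, roughly 2.9s
+      - record: flagger:canary_duration_seconds:mean
+        expr: |
+          flagger_canary_duration_seconds_sum
+          /
+          flagger_canary_duration_seconds_count
+```
+
+The division matches series on their shared `name` and `namespace` labels, so each canary gets its own result.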
+
+