Skip to content

Commit

Permalink
Merge pull request #88 from stefanprodan/ab-testing
Browse files Browse the repository at this point in the history
A/B testing - canary with session affinity
  • Loading branch information
stefanprodan authored Mar 11, 2019
2 parents bd11563 + 12ac96d commit 1cd0c49
Show file tree
Hide file tree
Showing 21 changed files with 924 additions and 43 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ Flagger documentation can be found at [docs.flagger.app](https://docs.flagger.ap
* [Load testing](https://docs.flagger.app/how-it-works#load-testing)
* Usage
* [Canary promotions and rollbacks](https://docs.flagger.app/usage/progressive-delivery)
* [A/B testing](https://docs.flagger.app/usage/ab-testing)
* [Monitoring](https://docs.flagger.app/usage/monitoring)
* [Alerting](https://docs.flagger.app/usage/alerting)
* Tutorials
Expand Down Expand Up @@ -167,7 +168,6 @@ For more details on how the canary analysis and promotion works please [read the
### Roadmap
* Add A/B testing capabilities using fixed routing based on HTTP headers and cookies match conditions
* Integrate with other service mesh technologies like AWS AppMesh and Linkerd v2
* Add support for comparing the canary metrics to the primary ones and do the validation based on the derivation between the two
Expand Down
61 changes: 61 additions & 0 deletions artifacts/ab-testing/canary.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: abtest
namespace: test
spec:
# deployment reference
targetRef:
apiVersion: apps/v1
kind: Deployment
name: abtest
# the maximum time in seconds for the canary deployment
# to make progress before it is rollback (default 600s)
progressDeadlineSeconds: 60
# HPA reference (optional)
autoscalerRef:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
name: abtest
service:
# container port
port: 9898
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
# Istio virtual service host names (optional)
hosts:
- abtest.istio.weavedx.com
canaryAnalysis:
# schedule interval (default 60s)
interval: 10s
# max number of failed metric checks before rollback
threshold: 10
# total number of iterations
iterations: 10
# canary match condition
match:
- headers:
user-agent:
regex: "^(?!.*Chrome)(?=.*\bSafari\b).*$"
- headers:
cookie:
regex: "^(.*?;)?(user=test)(;.*)?$"
metrics:
- name: istio_requests_total
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: istio_request_duration_seconds_bucket
# maximum req duration P99
# milliseconds
threshold: 500
interval: 30s
# external checks (optional)
webhooks:
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
67 changes: 67 additions & 0 deletions artifacts/ab-testing/deployment.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: abtest
namespace: test
labels:
app: abtest
spec:
minReadySeconds: 5
revisionHistoryLimit: 5
progressDeadlineSeconds: 60
strategy:
rollingUpdate:
maxUnavailable: 0
type: RollingUpdate
selector:
matchLabels:
app: abtest
template:
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: abtest
spec:
containers:
- name: podinfod
image: quay.io/stefanprodan/podinfo:1.4.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9898
name: http
protocol: TCP
command:
- ./podinfo
- --port=9898
- --level=info
- --random-delay=false
- --random-error=false
env:
- name: PODINFO_UI_COLOR
value: blue
livenessProbe:
exec:
command:
- podcli
- check
- http
- localhost:9898/healthz
initialDelaySeconds: 5
timeoutSeconds: 5
readinessProbe:
exec:
command:
- podcli
- check
- http
- localhost:9898/readyz
initialDelaySeconds: 5
timeoutSeconds: 5
resources:
limits:
cpu: 2000m
memory: 512Mi
requests:
cpu: 100m
memory: 64Mi
19 changes: 19 additions & 0 deletions artifacts/ab-testing/hpa.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: abtest
namespace: test
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: abtest
minReplicas: 2
maxReplicas: 4
metrics:
- type: Resource
resource:
name: cpu
# scale up if usage is above
# 99% of the requested CPU (100m)
targetAverageUtilization: 99
2 changes: 2 additions & 0 deletions artifacts/flagger/crd.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,8 @@ spec:
interval:
type: string
pattern: "^[0-9]+(m|s)"
iterations:
type: number
threshold:
type: number
maxWeight:
Expand Down
2 changes: 2 additions & 0 deletions charts/flagger/templates/crd.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,8 @@ spec:
interval:
type: string
pattern: "^[0-9]+(m|s)"
iterations:
type: number
threshold:
type: number
maxWeight:
Expand Down
9 changes: 3 additions & 6 deletions cmd/flagger/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -87,12 +87,6 @@ func main() {
logger.Fatalf("Error building example clientset: %s", err.Error())
}

if namespace == "" {
logger.Infof("Flagger Canary's Watcher is on all namespace")
} else {
logger.Infof("Flagger Canary's Watcher is on namespace %s", namespace)
}

flaggerInformerFactory := informers.NewSharedInformerFactoryWithOptions(flaggerClient, time.Second*30, informers.WithNamespace(namespace))

canaryInformer := flaggerInformerFactory.Flagger().V1alpha3().Canaries()
Expand All @@ -105,6 +99,9 @@ func main() {
}

logger.Infof("Connected to Kubernetes API %s", ver)
if namespace != "" {
logger.Infof("Watching namespace %s", namespace)
}

ok, err := controller.CheckMetricsServer(metricsServer)
if ok {
Expand Down
Binary file added docs/diagrams/flagger-abtest-steps.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions docs/gitbook/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
## Usage

* [Canary Deployments](usage/progressive-delivery.md)
* [A/B Testing](usage/ab-testing.md)
* [Monitoring](usage/monitoring.md)
* [Alerting](usage/alerting.md)

Expand Down
43 changes: 43 additions & 0 deletions docs/gitbook/how-it-works.md
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,49 @@ At any time you can set the `spec.skipAnalysis: true`.
When skip analysis is enabled, Flagger checks if the canary deployment is healthy and
promotes it without analysing it. If an analysis is underway, Flagger cancels it and runs the promotion.

### A/B Testing

Besides weighted routing, Flagger can be configured to route traffic to the canary based on HTTP match conditions.
In an A/B testing scenario, you'll be using HTTP headers or cookies to target a certain segment of your users.
This is particularly useful for frontend applications that require session affinity.

You can enable A/B testing by specifying the HTTP match conditions and the number of iterations:

```yaml
canaryAnalysis:
# schedule interval (default 60s)
interval: 1m
# total number of iterations
iterations: 10
# max number of failed iterations before rollback
threshold: 2
# canary match condition
match:
- headers:
user-agent:
regex: "^(?!.*Chrome)(?=.*\bSafari\b).*$"
- headers:
cookie:
regex: "^(.*?;)?(user=test)(;.*)?$"
```

If Flagger finds a HTTP match condition, it will ignore the `maxWeight` and `stepWeight` settings.

The above configuration will run an analysis for ten minutes targeting the Safari users and those that have a test cookie.
You can determine the minimum time that it takes to validate and promote a canary deployment using this formula:

```
interval * iterations
```

And the time it takes for a canary to be rollback when the metrics or webhook checks are failing:

```
interval * threshold
```

Make sure that the analysis threshold is lower than the number of iterations.

### HTTP Metrics

The canary analysis is using the following Prometheus queries:
Expand Down
Loading

0 comments on commit 1cd0c49

Please sign in to comment.