444 add a gh action for deploying and verifying dashboards alerts load ok with quickstart #708

Merged
27 commits
101b306
added script
ehearneRedHat Jun 14, 2024
7256b63
added faulty code to github action
ehearneRedHat Jun 14, 2024
71ec541
switched to podman (limited success)
ehearneRedHat Jun 17, 2024
6c8678e
changed to self-hosted and removed install for packages
ehearneRedHat Jun 18, 2024
6f4d502
test push
ehearneRedHat Jun 18, 2024
a41016e
changed trigger to pull request + fix typos
ehearneRedHat Jun 18, 2024
fa55047
fixed typos + added wait for prometheus
ehearneRedHat Jun 19, 2024
83bb3f3
added alias for sudo podman
ehearneRedHat Jun 20, 2024
c3b9691
added files to .gitignore
ehearneRedHat Jun 20, 2024
9796180
added terraform file for self-hosted-runner
ehearneRedHat Jun 20, 2024
703c4dd
added self hosted runner functionality to workflow
ehearneRedHat Jun 21, 2024
0cee245
deregister-runner no longer needs a job
ehearneRedHat Jun 21, 2024
fb7b5ac
ran command in background
ehearneRedHat Jun 21, 2024
37b517b
have deregister run if previous job fails
ehearneRedHat Jun 24, 2024
6a08c43
terraform.tfstate
ehearneRedHat Jun 25, 2024
ebc3d8c
changed over to github app for token gen
ehearneRedHat Jun 25, 2024
63c2a78
save changes (undone)
ehearneRedHat Jun 25, 2024
67f8bd4
swapped out github action for manual api interactions instead
ehearneRedHat Jun 26, 2024
003d5f7
added code to remove runner through ec2 instance
ehearneRedHat Jun 26, 2024
aa3d658
add terraform script and workflow for create ami
ehearneRedHat Jun 26, 2024
d5c7dba
null resource block comment
ehearneRedHat Jun 27, 2024
cd52dff
check for ssh-ability
ehearneRedHat Jun 27, 2024
b65101c
cleanup + changed runner to github for verify workflow .
ehearneRedHat Jul 8, 2024
b4fbe1e
removed chown
ehearneRedHat Jul 8, 2024
653bfb3
separated set up and tear down from tests
ehearneRedHat Jul 8, 2024
be80445
additional clean up
ehearneRedHat Jul 8, 2024
fc8472f
removed quickstart dependency
ehearneRedHat Jul 8, 2024
122 changes: 122 additions & 0 deletions .github/workflows/verify-dashboards-alerts.yaml
@@ -0,0 +1,122 @@
name: Verify Dashboards and Alerts OK

on:
  push:
    branches: main
    paths:
      # Dashboards
      - examples/dashboards/app_developer.json
      - examples/dashboards/business_user.json
      - examples/dashboards/platform_engineer.json
      # Alerts
      - examples/alerts/prometheusrules_policies_missing.yaml
      - examples/alerts/slo-availability.yaml
      - examples/alerts/slo-latency.yaml
jobs:
  verify-dashboards-alerts:
    name: Verify Dashboards and Alerts OK
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -eo pipefail {0}
    steps:
      - uses: actions/checkout@v2

      - name: Set up golang
        run: |
          sudo apt-get update -y
          sudo apt-get install -y golang

      - name: Deploy observability stack (Grafana and Prometheus)
        run: |
          kind create cluster
          # Install Istio
          kubectl apply -k config/dependencies/istio/sail
          kubectl -n istio-system wait --for=condition=Available deployment istio-operator --timeout=300s
          kubectl apply -f config/dependencies/istio/sail/istio.yaml

          # Install Observability Stack (Grafana and Prometheus)
          kubectl kustomize config/observability/ | docker run --rm -i ryane/kfilt -i kind=CustomResourceDefinition | kubectl apply --server-side -f -
          kubectl kustomize config/observability/ | docker run --rm -i ryane/kfilt -x kind=CustomResourceDefinition | kubectl apply -f -
          kubectl kustomize examples/dashboards/ | kubectl apply --server-side -f -
          kubectl kustomize examples/alerts/ | kubectl apply --server-side -f -

      - name: Port forward grafana
        run: |
          # Port forward Grafana
          kubectl -n monitoring wait --for=condition=available deployment grafana --timeout=600s
          kubectl -n monitoring port-forward service/grafana 3000:3000 &
          echo "Successfully port forwarded Grafana service."

      - name: Port forward Prometheus.
        run: |
          kubectl -n monitoring wait --for=condition=ready pod prometheus-k8s-0 --timeout=600s
          # Port forward Prometheus
          kubectl -n monitoring port-forward service/prometheus-k8s 9090:9090 &
          echo "Successfully port forwarded Prometheus service."

      - name: Check if Grafana contains dashboards.
        run: |
          # Make API call and save response to variable.
          grafana_api_call=$(curl -u admin:admin http://127.0.0.1:3000/api/search)

          # Read the expected name from each dashboard JSON file and check that it appears in the API response.

          app_developer=$(jq -r '.panels[1].title' examples/dashboards/app_developer.json)
          business_user=$(jq -r '.panels[1].title' examples/dashboards/business_user.json)
          platform_engineer=$(jq -r '.panels[1].title' examples/dashboards/platform_engineer.json)

          declare -a missing_dashboards=()

          if [[ "$grafana_api_call" != *"$app_developer"* ]]; then
            echo "Grafana does not have $app_developer dashboard."
            missing_dashboards+=("$app_developer")
          fi
          if [[ "$grafana_api_call" != *"$business_user"* ]]; then
            echo "Grafana does not have $business_user dashboard."
            missing_dashboards+=("$business_user")
          fi
          if [[ "$grafana_api_call" != *"$platform_engineer"* ]]; then
            echo "Grafana does not have $platform_engineer dashboard."
            missing_dashboards+=("$platform_engineer")
          fi

          if [[ ${#missing_dashboards[@]} -gt 0 ]]; then
            echo "Grafana is missing the following dashboards:"
            printf '%s\n' "${missing_dashboards[@]}"
            echo "Exiting..."
            exit 1
          fi

          echo "Grafana contains dashboards $app_developer, $business_user and $platform_engineer. Continuing to Prometheus..."

      - name: Check if Prometheus contains alert rules.
        run: |
          # Make API call and save response to variable.
          prometheus_api_call=$(curl http://localhost:9090/api/v1/rules)

          # Read the alert names from each rule file and check that each one appears in the API response.

          readarray -t prometheusrules_policies_missing_alerts < <(yq e '.spec.groups[].rules[].alert' examples/alerts/prometheusrules_policies_missing.yaml)
          readarray -t slo_availability_alerts < <(yq e '.spec.groups[].rules[].alert' examples/alerts/slo-availability.yaml)
          readarray -t slo_latency_alerts < <(yq e '.spec.groups[].rules[].alert' examples/alerts/slo-latency.yaml)

          combined_alerts=("${prometheusrules_policies_missing_alerts[@]}" "${slo_availability_alerts[@]}" "${slo_latency_alerts[@]}")

          declare -a missing_alerts=()

          for alert in "${combined_alerts[@]}"; do
            if [[ "$prometheus_api_call" != *"$alert"* && "$alert" != "null" ]]; then
              echo "Prometheus does not have $alert rule."
              missing_alerts+=("$alert")
            fi
          done

          if [[ ${#missing_alerts[@]} -gt 0 ]]; then
            echo "Prometheus is missing the following alerts:"
            printf '%s\n' "${missing_alerts[@]}"
            echo "Exiting..."
            exit 1
          fi

          echo "Prometheus has all alert rules."
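
Review note: the Grafana step and the Prometheus step follow the same pattern. Each substring-matches a list of expected names against an API response, collects the misses, and fails if any are found. The sketch below shows one way that pattern might be factored into a single helper. It is not part of this PR: the function name `find_missing` and the sample response and titles are hypothetical, chosen only for illustration. Like the workflow, it skips the literal string `null`, which the Prometheus step filters out of the `yq` results.

```shell
#!/usr/bin/env bash

# find_missing RESPONSE NAME...
# Prints every NAME that does not occur as a substring of RESPONSE, one per line.
# Names equal to "null" are skipped, mirroring the filter in the Prometheus step.
# Returns 1 if at least one name is missing, 0 otherwise.
find_missing() {
  local response=$1
  shift
  local name status=0
  for name in "$@"; do
    [[ "$name" == "null" ]] && continue
    if [[ "$response" != *"$name"* ]]; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical sample data standing in for the Grafana /api/search response.
response='[{"title":"App Developer Dashboard"},{"title":"Business User Dashboard"}]'

if ! missing=$(find_missing "$response" \
    "App Developer Dashboard" "Platform Engineer Dashboard" "null"); then
  echo "Missing:"
  echo "$missing"
fi
```

In the workflow, the same call could take `"$grafana_api_call"` with the three dashboard names, or `"$prometheus_api_call"` with `"${combined_alerts[@]}"`, followed by `exit 1` in the failure branch. Substring matching is kept here only because it is what the workflow already does. A name that happens to appear elsewhere in the response would still count as present.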