
Prevent primary hpa collision for keda scaled objects when migrating from an hpa #1677

Open

jdgeisler wants to merge 1 commit into main from keda-scaled-object-hpa-migration

Conversation

jdgeisler

This PR fixes #1646, where Flagger fails to create the primary HPA for a KEDA ScaledObject when migrating from a regular HPA to a ScaledObject in a canary.

With the scaledobject.keda.sh/transfer-hpa-ownership: "true" annotation, a KEDA ScaledObject can take ownership of an already existing HPA. Currently, annotations are not copied over from the canary ScaledObject to the primary ScaledObject, so this PR adds that functionality.
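
For illustration, here is a minimal sketch of that annotation propagation in Go. The standalone setup and the copyAnnotations helper are hypothetical; the actual change lives inside Flagger's ScaledObject reconciler, which works with the full Kubernetes ObjectMeta:

```go
package main

import "fmt"

// copyAnnotations returns a copy of the canary ScaledObject's annotations
// so they can be set on the primary ScaledObject's ObjectMeta. Copying
// (rather than sharing) the map keeps the two objects independent.
func copyAnnotations(canary map[string]string) map[string]string {
	primary := make(map[string]string, len(canary))
	for k, v := range canary {
		primary[k] = v
	}
	return primary
}

func main() {
	canaryAnnotations := map[string]string{
		"scaledobject.keda.sh/transfer-hpa-ownership": "true",
	}
	// The primary ScaledObject now carries the ownership-transfer annotation.
	fmt.Println(copyAnnotations(canaryAnnotations))
}
```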

Additionally, we need to add the Advanced.HorizontalPodAutoscalerConfig.Name field to the ScaledObjectSpec so that we can specify the HPA name that the ScaledObject will take ownership of. When creating the primary ScaledObject from the canary, we use the same HPA name but append -primary to the end of it.
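
As a sketch of the shape of that API change, mirroring KEDA's upstream v1alpha1 types (trimmed here for brevity; the exact fields in Flagger's copy of the types may differ), with a hypothetical helper for the -primary naming convention:

```go
package v1alpha1

// AdvancedConfig is the (trimmed) advanced section of the ScaledObjectSpec.
type AdvancedConfig struct {
	HorizontalPodAutoscalerConfig *HorizontalPodAutoscalerConfig `json:"horizontalPodAutoscalerConfig,omitempty"`
}

// HorizontalPodAutoscalerConfig carries the Name field added by this PR,
// which tells KEDA which existing HPA the ScaledObject should adopt.
type HorizontalPodAutoscalerConfig struct {
	Name string `json:"name,omitempty"`
}

// primaryHPAName derives the primary ScaledObject's HPA name from the
// canary's configured HPA name, e.g. "my-app" -> "my-app-primary".
// The helper name is hypothetical; the PR applies the same "-primary"
// suffix convention Flagger uses for its other primary resources.
func primaryHPAName(canaryHPAName string) string {
	return canaryHPAName + "-primary"
}
```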

Testing

With the above changes, we can successfully use the transfer-hpa-ownership annotation and specify the HPA that each ScaledObject takes ownership of. KEDA no longer fails to create the new primary HPA when one already exists; instead, it adopts and updates the existing one.

I applied the following diff to test in a Kubernetes cluster:

fortio, fortio-server-deployment-2, Canary (flagger.app) has changed:
  # Source: fortio/templates/fortio.yaml
  apiVersion: flagger.app/v1beta1
  kind: Canary
  metadata:
    name: fortio-server-deployment-2
    namespace: fortio
  spec:
    analysis:
      interval: 30s
      maxWeight: 50
      metrics:
      - interval: 30s
        name: request-success-rate
        thresholdRange:
          min: 99
      stepWeight: 10
      threshold: 10
      webhooks:
      - metadata:
          async: "on"
          c: "6"
          labels: fortio-server-deployment-2
          load: Start
          nocatchup: "on"
          qps: "12"
          save: "on"
          stdclient: "on"
          t: 600s
          uniform: "on"
          url: http://fortio-server-deployment-2-canary.fortio.svc.cluster.local:8080/
        name: generateLoad
        timeout: 60s
        type: pre-rollout
        url: http://fortio-client.fortio.svc.cluster.local:8080/fortio/rest/run?jsonPath=.metadata
    progressDeadlineSeconds: 600
    autoscalerRef:
-     apiVersion: autoscaling/v2
-     kind: HorizontalPodAutoscaler
+     apiVersion: keda.sh/v1alpha1
+     kind: ScaledObject
      name: fortio-server-deployment-2
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: fortio-server-deployment-2
    service:
      port: 8080
      targetPort: 8080
fortio, fortio-server-deployment-2, HorizontalPodAutoscaler (autoscaling) has been removed:
- # Source: fortio/templates/fortio.yaml
- apiVersion: autoscaling/v2
- kind: HorizontalPodAutoscaler
- metadata:
-   name: fortio-server-deployment-2
-   namespace: fortio
- spec:
-   maxReplicas: 4
-   metrics:
-   - resource:
-       name: cpu
-       target:
-         averageUtilization: 99
-         type: Utilization
-     type: Resource
-   minReplicas: 2
-   scaleTargetRef:
-     apiVersion: apps/v1
-     kind: Deployment
-     name: fortio-server-deployment-2
+
fortio, fortio-server-deployment-2, ScaledObject (keda.sh) has been added:
-
+ # Source: fortio/templates/fortio.yaml
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
+ metadata:
+   name: fortio-server-deployment-2
+   namespace: fortio
+   annotations:
+     scaledobject.keda.sh/transfer-hpa-ownership: "true"
+ spec:
+   advanced:
+     horizontalPodAutoscalerConfig:
+       name: fortio-server-deployment-2
+   scaleTargetRef:
+     name: fortio-server-deployment-2
+   minReplicaCount: 2
+   maxReplicaCount: 4
+   triggers:
+   - type: prometheus
+     metadata:
+       serverAddress: http://kube-prometheus-stack-prometheus:9090/prometheus
+       threshold: '100'
+       query: sum(rate(istio_requests_total{destination_service_name="fortio-server-2"}[1m]))
+   - type: cpu
+     metricType: Utilization
+     metadata:
+       value: "99"

Observe that the primary ScaledObject successfully takes ownership of the primary HPA and reconciliation no longer fails:

$ k get so
NAME                                 SCALETARGETKIND      SCALETARGETNAME                      MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE    FALLBACK   PAUSED    AGE
fortio-server-deployment-2           apps/v1.Deployment   fortio-server-deployment-2           2     4     prometheus                    True    Unknown   False      True      170m
fortio-server-deployment-2-primary   apps/v1.Deployment   fortio-server-deployment-2-primary   2     4     prometheus                    True    False     False      Unknown   170m
$ k get hpa
NAME                                 REFERENCE                                       TARGETS               MINPODS   MAXPODS   REPLICAS   AGE
fortio-server-deployment-2-primary   Deployment/fortio-server-deployment-2-primary   0/100 (avg), 4%/99%   2         4         2          13d

@jdgeisler force-pushed the keda-scaled-object-hpa-migration branch 2 times, most recently from b446861 to c8859f4 on July 10, 2024
…led object

Signed-off-by: James Geisler <geislerjamesd@gmail.com>
@jdgeisler force-pushed the keda-scaled-object-hpa-migration branch from c8859f4 to 5965983 on August 13, 2024
jdgeisler (Author)

Hey @stefanprodan and @aryan9600, looking for feedback on this PR and/or the issue it fixes, #1646.

Successfully merging this pull request may close these issues.

KEDA Scaled Object Primary creation failed due to workload already managed by the hpa