
ScaledJob ignores failing trigger(s) error #5922

Closed · josefkarasek opened this issue Jun 28, 2024 · 0 comments · Fixed by #5916
Labels: bug (Something isn't working)

@josefkarasek (Member) commented:

Report

A ScaledJob with failing trigger(s) swallows the error (it is logged only at verbosity level 1), yet the status reports that the ScaledJob is "defined correctly and is ready to scaling", which it is not:

status:
  conditions:
  - message: ScaledJob is defined correctly and is ready to scaling
    reason: ScaledJobReady
    status: "True"
    type: Ready
  - message: Scaling is not performed because triggers are not active
    reason: ScalerNotActive
    status: "False"
    type: Active
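
For reference, the conditions above can be read back with a standard kubectl query (assuming the sample-sj ScaledJob from the reproduction steps below, in the default namespace):

  kubectl get scaledjob sample-sj -n default -o jsonpath='{.status.conditions}'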

The Active condition is logically false, since we failed to read the scaling metric, but there is no indication to the user that the trigger itself is failing.

At least an event is emitted:

75s         Warning   KEDAScalerFailed     scaledjob/sample-sj      error requesting metrics endpoint: Get "http://failing-source.com/metrics": dial tcp [::1]:8027: connect: connection refused

Expected Behavior

When a trigger for a ScaledJob fails, the error should not be suppressed; it should be propagated to the user both in the status and in the operator log.
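
For illustration only, a status that surfaces the failure could look roughly like the following; the reason name and exact wording here are hypothetical, not necessarily what KEDA emits:

status:
  conditions:
  - message: 'error requesting metrics endpoint: Get "http://failing-source.com/metrics": connect: connection refused'
    reason: ScaledJobCheckFailed   # hypothetical reason name, for illustration
    status: "False"
    type: Ready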

Actual Behavior

When a trigger for a ScaledJob fails, the status says that the "ScaledJob is defined correctly and is ready to scaling".
If all triggers of the ScaledJob are failing (or, more commonly, the failing trigger is the only trigger), the ScaledJob is never active because it can never read its scaling metric, and the status says "Scaling is not performed because triggers are not active".

Nothing in the status says that scaling is not performed because of the trigger failure.

Steps to Reproduce the Problem

  1. Create a ScaledJob with a failing trigger:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: sample-sj
  namespace: default
spec:
  failedJobsHistoryLimit: 5
  jobTargetRef:
    template:
      spec:
        containers:
        - args:
          - /bin/sh
          - -c
          - sleep 30
          image: busybox
          name: busybox-worker
        restartPolicy: Never
  maxReplicaCount: 10
  pollingInterval: 30
  successfulJobsHistoryLimit: 3
  triggers:
  - metadata:
      targetValue: "1"
      url: http://failing-source.com/metrics
      valueLocation: value
    type: metrics-api
  2. Check events: kubectl get events
  3. Check the ScaledJob status (example commands below)
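
The checks in steps 2 and 3 map to standard kubectl commands, for example (assuming the manifest above, i.e. sample-sj in the default namespace):

  kubectl get events -n default
  kubectl get scaledjob sample-sj -n default -o yaml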

Logs from KEDA operator

No logs, because the error is swallowed and printed only at an elevated log level.
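
Today the error only shows up if the operator's verbosity is raised. A minimal sketch, assuming the standard keda-operator Deployment in the keda namespace and its zap logging flags:

# keda-operator Deployment args (edit with: kubectl -n keda edit deployment keda-operator)
args:
  - --zap-log-level=debug   # raise verbosity so the level-1 scaler error is actually printed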

KEDA Version

2.14.0

Kubernetes Version

1.29

Platform

None

Scaler Details

No response

Anything else?

No response
