Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error applying changes in ttlSecondsAfterFinished with 1.13 #494

Closed
kimxogus opened this issue Jul 3, 2019 · 15 comments · Fixed by #503
Closed

Error applying changes in ttlSecondsAfterFinished with 1.13 #494

kimxogus opened this issue Jul 3, 2019 · 15 comments · Fixed by #503
Labels
bug Something isn't working

Comments

@kimxogus
Copy link

kimxogus commented Jul 3, 2019

After upgrading jaeger operator 1.13, error below occurred

time="2019-07-03T04:50:40Z" level=error msg="failed to apply the changes" error="CronJob.batch \"jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 04:50:40.206287771 +0000 UTC" instance=jaeger namespace=jaeger

It seems ttlSecondsAfterFinished should not be set by default, otherwise TTLAfterFinished feature gate must be enabled in cluster.

@jpkrohling jpkrohling added the bug Something isn't working label Jul 3, 2019
@jpkrohling
Copy link
Contributor

Do you think this deserves a patch release?

@jpkrohling
Copy link
Contributor

@kimxogus which Kubernetes version are you using?

@headcr4sh
Copy link

headcr4sh commented Jul 3, 2019

@jpkrohling
I am using the managed Kubernetes offering of AWS (EKS) and run into the same errors as @kimxogus.

Neither EKS v1.12, nor EKS v1.13 have the needed feature gate active and there is no way for end-users to flip the switch, as it is a managed service offering.

Besides, ttlSecondsAfterFinished seems to be an alpha feature and I would not assume that many Kubernetes clusters out in the wild have set it to be disabled...

See also: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/

@jpkrohling
Copy link
Contributor

Thanks for confirming, @headcr4sh. Would any of you be able to provide a patch? We are currently working on another issue at the moment, and we might need to do a patch release once that issue is fixed. If we have a PR for this one here, we'll include it in the patch release.

@vishnuhd
Copy link

vishnuhd commented Jul 3, 2019

I am also facing the similar issue :

Operator version - 1.13.0
K8s version - v1.12.8 (Using KOPS on AWS)

Operator Logs :

$ kubectl logs jaeger-operator-875794ddb-t2g75 -n observability
time="2019-07-03T08:37:59Z" level=info msg=Versions arch=amd64 jaeger-operator=1.13.0 operator-sdk=v0.8.1 os=linux version=go1.12.5
time="2019-07-03T08:38:00Z" level=info msg="Auto-detected the platform" platform=kubernetes
time="2019-07-03T08:38:00Z" level=info msg="Automatically adjusted the 'es-provision' flag" es-provision=false
time="2019-07-03T09:08:06Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:06.300138326 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T09:08:07Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:07.722945608 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T09:08:08Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:08.732845667 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T09:08:09Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:09.746644265 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T09:08:10Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:10.769429996 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T09:08:11Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:11.779283957 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T09:08:12Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-spark-dependencies\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:12.793126504 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T09:08:13Z" level=error msg="failed to apply the changes" error="CronJob.batch \"my-jaeger-es-index-cleaner\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 09:08:13.802700304 +0000 UTC" instance=my-jaeger namespace=default

Spec :

 apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest
    options:
      log-level: debug
  ui:
    options:
      dependencies:
        menuEnabled: false
      tracking:
        gaID: UA-000000-2
      menu:
        - label: "About Jaeger"
          items:
            - label: "Documentation"
              url: "https://www.jaegertracing.io/docs/latest"
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://quickstart-es:9200
  ingress:
    enabled: false
  agent:
    strategy: DaemonSet
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""

This is working fine with storage.type as memory though.

@headcr4sh
Copy link

@jpkrohling
I am unfortunately not very involved in the Jaeger / jaeger-operator development at present and won't be able to work on a patch for the next few days.

As far as I can tell, the following patch brought in the changes that cause trouble:
#407

A viable solution might be to set ttlSecondsAfterFinished only if the feature gate is active, right?

@headcr4sh
Copy link

headcr4sh commented Jul 3, 2019

Possible solution (untested, because I am not [yet] able to compile/run the jaeger-operator myself):

Replace all occurences of

TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished"`

with

TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished,omitempty"`

@jpkrohling
Copy link
Contributor

jpkrohling commented Jul 3, 2019

A similar change would be made anyway as part of #466, so, I'd be OK with a change like that.

https://github.com/jpkrohling/jaeger-operator/blob/11affad2d62b442575b50e15e760beb0a09c4a9c/pkg/apis/jaegertracing/v1/jaeger_types.go#L329-L330

I just pushed a Docker image with this fix and I'd appreciate if one of you could confirm that it works:

Image: jpkroehling/jaeger-operator:494-TTLSecondsAfterFinished. To use it, either do a kubectl edit deployment jaeger-operator, or change the original operator.yaml and run kubectl apply -f operator.yaml.

In the meantime, I'm writing the required tests and will send a PR soon.

@jpkrohling
Copy link
Contributor

jpkrohling commented Jul 3, 2019

For the record, this works with minikube 1.1.0, which uses Kubernetes v1.14.2, which is probably why the e2e tests for ES and Cassandra are passing.

edit: "this works" means that no such failure is seen, but I'm not sure the feature itself is working (ie, that batch job pods are cleaned up after N seconds)

@headcr4sh
Copy link

I just pushed a Docker image with this fix and I'd appreciate if one of you could confirm that it works:

Image: jpkroehling/jaeger-operator:494-TTLSecondsAfterFinished. To use it, either do a kubectl edit deployment jaeger-operator, or change the original operator.yaml and run kubectl apply -f operator.yaml.

Unfortunately, I can confirm that the Docker image you provided does still lead to the exact same errors:

time="2019-07-03T13:07:42Z" level=error msg="failed to apply the changes" error="CronJob.batch \"jaeger-default-es-index-cleaner\" is invalid: spec.jobTemplate.spec.ttlSecondsAfterFinished: Forbidden: disabled by feature-gate" execution="2019-07-03 13:07:42.767636374 +0000 UTC" instance=jaeger-default namespace=tracing

Too bad,... :-(

@jpkrohling
Copy link
Contributor

Are you using the same Jaeger instance that you had before? omitempty won't remove the field if it has been specified already, so, you'd have to test on a new Jaeger instance.

@headcr4sh
Copy link

Are you using the same Jaeger instance that you had before? omitempty won't remove the field if it has been specified already, so, you'd have to test on a new Jaeger instance.

I tried both a new and an existing Jaeger CR manifest. Both lead to the same error, I am afraid.

@jpkrohling
Copy link
Contributor

Alright, I'll try to get hold of an AWS account to reproduce the problem and come up with a patch.

@headcr4sh
Copy link

Alright, I'll try to get hold of an AWS account to reproduce the problem and come up with a patch.

I am pretty sure, that minikube with the correct feature gates could also be used to reproduce the situation:

Just adding --feature-gates=TTLAfterFinished=false should work, if I am not completely wrong.

@jpkrohling
Copy link
Contributor

Looks like the omitempty didn't work because we set an explicit default to the value. For now, it's probably wiser to remove the feature, but keep the property in the CRD, as the CRD with the field is already part of 1.13.

I just pushed a new image making the property noop. Once one of you confirm that this fixes the problem, I'll send a PR and get it reviewed/merged.

Image: jpkroehling/jaeger-operator:494-TTLSecondsAfterFinished-2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants