Fix health-check to check actual health #521

Open
sbernauer opened this issue Nov 2, 2023 · 0 comments
@sbernauer · Member

Affected version

0.0.0-dev

Current and expected behavior

We lost data in a demo because NiFi was complaining about not being able to reach ZooKeeper, and the health checks did not notice it.
Simply restarting the pod solved the problem, which is exactly what would have happened automatically if the livenessProbe had detected it.

Currently, the livenessProbe looks like this:

    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: https
      timeoutSeconds: 1

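While the numbers themselves are debatable (e.g. why have an initialDelaySeconds when we already have a startupProbe?) and a readinessProbe is missing, the most important point is that a simple check on the port is not enough: as this incident showed, the port kept accepting connections while NiFi could not reach ZooKeeper.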
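Setting the port check aside for a moment, the two structural points could look roughly like the following. This is a sketch under assumptions: the existing startupProbe (not shown) is assumed to already cover NiFi's slow startup, so initialDelaySeconds is dropped from the livenessProbe; the readinessProbe is a hypothetical addition; and all thresholds are copied or placeholder values, not tuned ones.

```yaml
# Sketch only: thresholds are placeholders, not tuned for NiFi.
# The existing startupProbe (not shown) is assumed to cover slow startup,
# so the livenessProbe no longer needs initialDelaySeconds.
livenessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  tcpSocket:
    port: https
  timeoutSeconds: 1
# Hypothetical addition: without a readinessProbe, Kubernetes treats the
# container as Ready as soon as it is running and the startupProbe succeeded.
readinessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  tcpSocket:
    port: https
  timeoutSeconds: 1
```

Both probes still only check the port here; the actual health check is the subject of the proposed solution.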

Possible solution

We should instead use the NiFi REST API (https://nifi.apache.org/docs/nifi-docs/rest-api/) to check the actual node health. I fear the most complicated part will be authentication (e.g. adding a static user with an operator-created random secret and putting it into the authentication chain).
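A minimal sketch of what a REST-based livenessProbe could look like with the static-user approach: it logs in via `POST /nifi-api/access/token` and then checks `connectedToCluster` in the response of `GET /nifi-api/flow/cluster/summary`. Picking that endpoint is an assumption: it presumes that losing ZooKeeper shows up as the node no longer being connected to the cluster, which still needs to be verified. The port 8443, the hypothetical env vars `HEALTHCHECK_USERNAME` / `HEALTHCHECK_PASSWORD` (injected from the operator-created Secret), and curl being present in the image are assumptions as well.

```yaml
# Sketch only: assumes curl exists in the NiFi image, the "https" port is 8443,
# and HEALTHCHECK_USERNAME / HEALTHCHECK_PASSWORD (hypothetical names) are
# injected into the container from an operator-created Secret.
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - |
        # Log in as the static user; NiFi returns the JWT as the response body.
        # --insecure skips certificate verification since we only call localhost.
        TOKEN=$(curl --silent --fail --insecure \
          --data-urlencode "username=${HEALTHCHECK_USERNAME}" \
          --data-urlencode "password=${HEALTHCHECK_PASSWORD}" \
          https://localhost:8443/nifi-api/access/token) || exit 1
        # Fail the probe unless this node reports itself as connected to the cluster.
        curl --silent --fail --insecure \
          --header "Authorization: Bearer ${TOKEN}" \
          https://localhost:8443/nifi-api/flow/cluster/summary \
          | grep -Eq '"connectedToCluster" *: *true'
  failureThreshold: 3
  periodSeconds: 10
  # Two HTTPS requests need more headroom than the current 1 second.
  timeoutSeconds: 10
```

Requesting a fresh token on every run keeps the probe stateless and avoids handling token expiry inside the probe, at the cost of one extra login request per period.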

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

yes
