Cluster bricks when all journalnodes are down #338
Comments
Good news:
The problem was the JournalNodes rejecting the NameNodes because of reverse DNS roulette, resulting in
I solved the problem by not using
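For context on the "reverse DNS roulette": Kerberos-enabled Hadoop builds and validates service principals from hostnames, and those hostnames are often obtained by reverse-resolving the peer's IP, so inconsistent PTR records can make the JournalNodes see a NameNode principal that matches no keytab entry. A minimal sketch for checking whether forward and reverse lookups agree inside the cluster (the namespace and label selector are assumptions for illustration, not taken from this issue):

```shell
# Hypothetical check: compare pod IPs against what they reverse-resolve to.
# Namespace and label selector are assumptions for illustration only.
for pod in $(kubectl get pods -n default -l app.kubernetes.io/component=journalnode -o name); do
  ip=$(kubectl get "$pod" -n default -o jsonpath='{.status.podIP}')
  echo "== ${pod#pod/} (${ip})"
  # Reverse lookup: roughly the name Hadoop would use for the peer on incoming connections
  kubectl exec -n default "${pod#pod/}" -- getent hosts "$ip"
done
```

If the names returned here differ between lookups, or differ from the hostnames baked into the principals, authentication failures of the kind described above are plausible.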
Closing this, as we fixed the problem in the Kerberos feature-branch. Please feel free to re-open when the problem reappears!
# Description

Closes #178
Fixes #338

TODOs

- [x] Release new Hadoop image with openssl and Kerberos clients, use in docs and tests
- [x] Release and use operator-rs change
- [x] Fix hardcoded `kinit nn/simple-hdfs-namenode-default.default.svc.cluster.local@CLUSTER.LOCAL -kt /stackable/kerberos/keytab` in entrypoints (see the sketch below)
- [x] Go through all Hadoop settings and see if they can be improved
- [x] Test different realms
- [x] Discuss CRD change
- [x] Discuss how to expose this in Discovery CM -> during on-site 2023/05 we decided to ship this feature without exposing it via discovery *for now*
- [x] Implement discovery
- [x] Tests
- [x] Docs
- [x] Let @maltesander have a look at how we can better include the init container in the code structure
- [x] Test long-running cluster (maybe turn down ticket lifetime for that)
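Regarding the hardcoded `kinit` item above: one way to avoid a principal that is tied to a single service name and namespace is to derive it from the container's own FQDN at startup. This is only a sketch, not the entrypoint actually shipped in the PR; the realm variable and the service prefix are assumptions:

```shell
# Sketch of a non-hardcoded kinit in an entrypoint script (names are assumptions).
# KERBEROS_REALM and the keytab path would have to be provided by the operator.
PRINCIPAL="nn/$(hostname -f)@${KERBEROS_REALM:-CLUSTER.LOCAL}"
kinit -kt /stackable/kerberos/keytab "$PRINCIPAL"
```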
Affected version
main
Current and expected behavior
The JournalNodes seem to get stuck in a crashloop if all are deleted at the same time, complaining about not being able to find missing edits.
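A sketch of how one might reproduce this, assuming a Stackable-style labelling scheme (the namespace and label selector are not taken from this issue):

```shell
# Delete every JournalNode pod at once, then watch them restart.
# Namespace and label selector are assumptions for illustration only.
kubectl delete pods -n default -l app.kubernetes.io/component=journalnode
kubectl get pods -n default -l app.kubernetes.io/component=journalnode -w
```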
Possible solution
No response
Additional context
No response
Environment
No response
Would you like to work on fixing this bug?
None