fix: Change pod readiness check mechanism #249

Open
wants to merge 3 commits into
base: main


lukapetrovic-git (Contributor) commented Sep 16, 2024

Description

If needed, I can open an issue for this as well. The following happens:

The check that waits for all pods to be ready fails in some of my clusters because grep catches pods it is not supposed to, for example:

[image]

In the situation above, I have pods with init as part of their name, so the check never passes. I changed it to read the metadata of the pod itself and determine its phase: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase.
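
The false positive described above can be reproduced without a cluster. The sketch below is illustrative only: the pod names and the `init` pattern are assumptions (the original grep expression is not shown in this description), and the column-based filter merely stands in for the idea of inspecting pod state instead of the whole output line.

```shell
#!/usr/bin/env bash
# Hypothetical `kubectl get pods -A` style output; the pod names are invented.
sample_pods='NAMESPACE  NAME             READY  STATUS   RESTARTS  AGE
default    db-init-job-x7k  1/1    Running  0         5m
default    web-6c9f         1/1    Running  0         5m'

# Whole-line text match: counts the healthy pod because "init" is in its NAME.
text_matches=$(printf '%s\n' "$sample_pods" | grep -ci 'init')

# State-aware match: look only at the STATUS column (4th field), skip the header.
status_matches=$(printf '%s\n' "$sample_pods" \
  | awk 'NR > 1 && tolower($4) ~ /init/ { n++ } END { print n + 0 }')

echo "text_matches=$text_matches status_matches=$status_matches"
```

Against a live cluster, a comparable phase-based query might look like `kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded`; whether this PR uses exactly that form is not stated here.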

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Small minor change not affecting the Ansible Role code (GitHub Actions Workflow, Documentation etc.)

How Has This Been Tested?

Tested on Ubuntu 22.04, RKE2 v1.27.12+rke2r1 on a dev cluster and one production cluster where the problems were happening.

lukapetrovic-git marked this pull request as ready for review September 16, 2024 14:14
lukapetrovic-git changed the title from "Change pod readiness check mechanism" to "fix: Change pod readiness check mechanism" Sep 16, 2024
  args:
    executable: /bin/bash
- failed_when: "all_pods_ready.rc not in [ 0, 1 ]"
+ failed_when: "all_pods_ready.rc != 0"
Contributor Author


I am not sure why return code 1 was considered OK here, so I made this change. If there is something I'm not seeing, please comment, @MonolithProjects.
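
One possible explanation, offered as an assumption because the full task is not shown in this thread: if the registered command ends in a `grep`, an exit status of 1 only means that no line matched, which would be the healthy case for a filter that looks for problem pods. The sketch below shows the three kinds of `grep` exit status; the sample strings and the nonexistent path are made up for illustration.

```shell
#!/usr/bin/env bash
rc_match=0; rc_no_match=0; rc_error=0

# A line matches: grep exits 0.
printf 'Error\n' | grep -q 'Error' || rc_match=$?

# No line matches: grep exits 1, which is not a failure of the command itself.
printf 'Running\n' | grep -q 'Error' || rc_no_match=$?

# A real error (unreadable input): grep exits with a status greater than 1.
grep -q 'Error' /nonexistent/pods.txt 2>/dev/null || rc_error=$?

echo "match=$rc_match no_match=$rc_no_match error=$rc_error"
```

Under that assumption, a check that no longer pipes through `grep` has no reason to accept 1, so `rc != 0` would be the natural condition.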

lukapetrovic-git (Contributor Author) commented Sep 27, 2024

Another question I have regarding this task: why are pods running in kube-system exempt from the check (metadata.namespace!=kube-system)?
One example: when the RKE2 service is restarted, in my case the Cilium pods also get restarted. They run in the kube-system namespace and are crucial to the functioning of the cluster as a whole. Cheers!
