Enable to sort kube_control_plane including the first node #9866

Conversation

@HoKim98 (Contributor) commented Mar 8, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

The first Kubernetes control plane node plays a critical role in the overall Kubespray workflow. However, if the first node falls out of step with the others (reordered in the inventory, or out of service), the Kubernetes cluster configuration can fail catastrophically; for example, certificates can be lost through an accidental renewal of the etcd and Kubernetes SSL keys.

Keeping the first node's position stable is therefore very important, but human error can always break it. So I want to force the dynamic discovery logic to treat one of the still-functioning Kubernetes control plane nodes as the first node, so that node ordering cannot cause these issues in the first place.

Fortunately, I found that such attempts have already been made in a few places; this PR generalizes them. (#7989)

> BEFORE

  • groups['kube_control_plane'][0]
  • groups['kube_control_plane'] | first

> AFTER

  • first_kube_control_plane
    • If one or more control plane nodes are already running: choose the first of them
    • If no nodes are running (fresh install): choose the first entry of kube_control_plane (see the sketch below)
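
A minimal sketch of this selection logic, assuming a hypothetical running_control_planes list filled in by an earlier discovery step (the PR's actual variable and task names may differ):

- name: Select the first functioning control plane node
  set_fact:
    first_kube_control_plane: >-
      {{ (groups['kube_control_plane'] | select('in', running_control_planes) | list
          or groups['kube_control_plane']) | first }}
  vars:
    # running_control_planes is a hypothetical list of control plane hosts that
    # answered an earlier liveness probe; it is empty on a fresh install, so the
    # expression falls back to the plain inventory order.
    running_control_planes: "{{ discovered_running_nodes | default([]) }}"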

And if the control planes are changed via cluster.yml etc., the cluster-info ConfigMap is automatically updated to point at the first control plane node.
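
As a hedged sketch of what that update amounts to (updated_kubeconfig is a hypothetical variable holding the rewritten kubeconfig; the PR's actual tasks appear in the diffs below):

- name: Point the cluster-info ConfigMap at the first control plane
  command: >-
    {{ kubectl }} -n kube-public patch configmap cluster-info
    --type merge -p {{ {'data': {'kubeconfig': updated_kubeconfig}} | to_json | quote }}
  when: inventory_hostname == first_kube_control_plane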

> BUG FIXES

Which issue(s) this PR fixes:

Fixes #3471 (only for kube_control_plane)
Fixes #9863

Special notes for your reviewer:

This PR changes many usages of kube_control_plane | first.

Because so many scenarios are affected, extensive testing is needed; I cannot be sure these changes are complete yet, so I would like to review them with others.

Does this PR introduce a user-facing change?:

The order of kube_control_plane can now be changed freely.
The cluster-info ConfigMap is automatically updated to point at the first kube_control_plane.

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 8, 2023
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 8, 2023
@k8s-ci-robot (Contributor)

Hi @kerryeon. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 8, 2023
@HoKim98 HoKim98 force-pushed the raplace-first-kube-control-plane branch from c92bb0d to e822eec Compare March 8, 2023 08:57
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 8, 2023
@@ -158,7 +158,7 @@
     filename: "{{ kube_config_dir }}/kdd-crds.yml"
     state: "latest"
   when:
-    - inventory_hostname == groups['kube_control_plane'][0]
+    - inventory_hostname == first_kube_control_plane
Review comment (Member):

It is not appropriate to apply these yaml files on every node here, because there may be conflicts and errors when applying them concurrently, and it also wastes time.
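
A generic sketch of the pattern the reviewer is defending, reusing the {{ kubectl }} and {{ kube_config_dir }} variables seen elsewhere in this PR (not the PR's actual task): run cluster-scoped applies from exactly one host so concurrent applies cannot conflict.

- name: Apply kdd-crds.yml from a single control plane node
  command: "{{ kubectl }} apply -f {{ kube_config_dir }}/kdd-crds.yml"
  run_once: true
  delegate_to: "{{ groups['kube_control_plane'] | first }}"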

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2023
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 9, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 9, 2023
@@ -0,0 +1,83 @@
---
- name: Load ca.crt
  shell: set -o pipefail && cat "{{ kube_apiserver_client_cert }}" | base64 --wrap=0
Review comment (Contributor):

That could use slurp rather than shell. If you have unwanted wrapping, just remove it with a filter.

  changed_when: false
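
A minimal sketch of that slurp alternative (the register and fact names here are hypothetical):

- name: Load ca.crt
  slurp:
    src: "{{ kube_apiserver_client_cert }}"
  register: apiserver_client_cert

# ansible.builtin.slurp returns the file base64-encoded in .content as a
# single line, so no `base64 --wrap=0` (and no shell) is needed.
- name: Expose the certificate content as a fact
  set_fact:
    kube_apiserver_client_cert_b64: "{{ apiserver_client_cert.content }}"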

- name: Get current kubernetes cluster-info
  command: "{{ kubectl }} -n kube-public get configmap cluster-info -o jsonpath --template '{.data.kubeconfig}'"
Review comment (Contributor):
Same here, why not use k8s.core.k8s_info?

- kube-cluster-info
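
A sketch of that suggestion, assuming the kubernetes.core collection and the Python kubernetes client are available on the target host (the register name is hypothetical):

- name: Get current kubernetes cluster-info
  kubernetes.core.k8s_info:
    kind: ConfigMap
    namespace: kube-public
    name: cluster-info
  register: cluster_info_cm

# The kubeconfig payload is then cluster_info_cm.resources[0].data.kubeconfig,
# with no jsonpath parsing needed.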

- name: Update kubernetes cluster-info
  command:
Review comment (Contributor):
Same as above.
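
A sketch of the same substitution for the update task, again assuming kubernetes.core is available (updated_kubeconfig is a hypothetical variable):

- name: Update kubernetes cluster-info
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-info
        namespace: kube-public
      data:
        kubeconfig: "{{ updated_kubeconfig }}"  # hypothetical: the rewritten kubeconfig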

Comment on lines -551 to +567
-kube_apiserver_global_endpoint: |-
-  {% if loadbalancer_apiserver is defined -%}
-  https://{{ apiserver_loadbalancer_domain_name }}:{{ loadbalancer_apiserver.port|default(kube_apiserver_port) }}
-  {%- elif loadbalancer_apiserver_localhost and (loadbalancer_apiserver_port is not defined or loadbalancer_apiserver_port == kube_apiserver_port) -%}
-  https://localhost:{{ kube_apiserver_port }}
-  {%- else -%}
-  https://{{ first_kube_control_plane_address }}:{{ kube_apiserver_port }}
-  {%- endif %}
-kube_apiserver_endpoint: |-
-  {% if loadbalancer_apiserver is defined -%}
-  https://{{ apiserver_loadbalancer_domain_name }}:{{ loadbalancer_apiserver.port|default(kube_apiserver_port) }}
-  {%- elif not is_kube_master and loadbalancer_apiserver_localhost -%}
-  https://localhost:{{ loadbalancer_apiserver_port|default(kube_apiserver_port) }}
-  {%- elif is_kube_master -%}
-  https://{{ kube_apiserver_bind_address | regex_replace('0\.0\.0\.0','127.0.0.1') }}:{{ kube_apiserver_port }}
-  {%- else -%}
-  https://{{ first_kube_control_plane_address }}:{{ kube_apiserver_port }}
-  {%- endif %}
+# kube_apiserver_global_endpoint: |-
+#   {% if loadbalancer_apiserver is defined -%}
+#   https://{{ apiserver_loadbalancer_domain_name }}:{{ loadbalancer_apiserver.port|default(kube_apiserver_port) }}
+#   {%- elif loadbalancer_apiserver_localhost and (loadbalancer_apiserver_port is not defined or loadbalancer_apiserver_port == kube_apiserver_port) -%}
+#   https://localhost:{{ kube_apiserver_port }}
+#   {%- else -%}
+#   https://{{ first_kube_control_plane_address }}:{{ kube_apiserver_port }}
+#   {%- endif %}
+# kube_apiserver_endpoint: |-
+#   {% if loadbalancer_apiserver is defined -%}
+#   https://{{ apiserver_loadbalancer_domain_name }}:{{ loadbalancer_apiserver.port|default(kube_apiserver_port) }}
+#   {%- elif not is_kube_master and loadbalancer_apiserver_localhost -%}
+#   https://localhost:{{ loadbalancer_apiserver_port|default(kube_apiserver_port) }}
+#   {%- elif is_kube_master -%}
+#   https://{{ kube_apiserver_bind_address | regex_replace('0\.0\.0\.0','127.0.0.1') }}:{{ kube_apiserver_port }}
+#   {%- else -%}
+#   https://{{ first_kube_control_plane_address }}:{{ kube_apiserver_port }}
+#   {%- endif %}
Review comment (Contributor):
Why are you converting this to a fact? Since variables are lazy, that should not be necessary.
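
For context, a minimal self-contained illustration (a hypothetical playbook, not from the PR) of what "variables are lazy" means: a plain variable is re-templated on each reference, so set_fact is only needed when a value must be frozen at a specific point in the play.

- hosts: localhost
  gather_facts: false
  vars:
    # Re-templated on every reference, so each use runs `date +%s` again.
    now_lazy: "{{ lookup('pipe', 'date +%s') }}"
  tasks:
    - name: Freeze the value at this point in the play
      set_fact:
        now_frozen: "{{ now_lazy }}"
    - name: Both values are usable; only the fact is pinned
      debug:
        msg: "lazy={{ now_lazy }} frozen={{ now_frozen }}"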

Comment on lines +31 to +37
+- name: create fallback_ips_parsed
+  set_fact:
+    fallback_ips_parsed: "{{ hostvars.localhost.fallback_ips_base | from_yaml }}"
+
 - name: set fallback_ips
   set_fact:
-    fallback_ips: "{{ hostvars.localhost.fallback_ips_base | from_yaml }}"
+    fallback_ips: "{{ fallback_ips_parsed if fallback_ips_parsed is mapping else {} }}"
Review comment (Contributor):
It's not super clear why you need to do this, could you explain it?
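
One plausible reading, offered as an assumption rather than the author's confirmed intent: from_yaml returns None for empty input, and the mapping check guards against that.

- name: set fallback_ips defensively
  set_fact:
    fallback_ips: "{{ parsed if parsed is mapping else {} }}"
  vars:
    # '' | from_yaml yields None, which is not a mapping, so the guard falls
    # back to an empty dict instead of breaking later dict lookups.
    parsed: "{{ hostvars.localhost.fallback_ips_base | default('') | from_yaml }}"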

-when: inventory_hostname == groups['kube_control_plane'][0]
+when: inventory_hostname == groups['kube_control_plane'] | first
Review comment (Contributor):
Is that necessary to make the PR work? It does not change the semantics, right?

If not, I'd drop those; the PR is already big enough! ^^

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kerryeon
Once this PR has been reviewed and has the lgtm label, please assign cristicalin for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@VannTen (Contributor) commented Jan 19, 2024 via email

@k8s-ci-robot k8s-ci-robot reopened this Jan 19, 2024
@k8s-ci-robot (Contributor)

@VannTen: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 18, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
