
Replacing a master node

hzma edited this page May 23, 2022 · 4 revisions

Taking a central node offline

The following example removes the node gateway:

[root@master ~]# kubectl -n kube-system get pod -o wide | grep central
ovn-central-74b5f7b9c5-4wzkj           1/1     Running   0          25m   192.168.50.128   gateway   <none>           <none>
ovn-central-74b5f7b9c5-76xwg           1/1     Running   0          24m   192.168.50.112   slave     <none>           <none>
ovn-central-74b5f7b9c5-9knkl           1/1     Running   0          23m   192.168.50.134   master    <none>           <none>

1. Check the current node's ID in the NB cluster and record it:

[root@master ~]# kubectl -n kube-system get pods -o wide  | grep central
ovn-central-74b5f7b9c5-bztcl           1/1     Running   0               15m     192.168.50.134   master    <none>           <none>
ovn-central-74b5f7b9c5-gwml6           1/1     Running   0               3m57s   192.168.50.128   gateway   <none>           <none>
ovn-central-74b5f7b9c5-p9zfn           1/1     Running   0               15m     192.168.50.112   slave     <none>           <none>
[root@master ~]# kubectl -n kube-system exec -ti ovn-central-74b5f7b9c5-gwml6 -- bash
root@gateway:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
8332
Name: OVN_Northbound
Cluster ID: 276a (276a7bb3-51a9-473e-b2af-78bfe813489d)
Server ID: 8332 (8332545e-8d9e-484a-917b-5c46b866daa3)
Address: tcp:[192.168.50.128]:6643
Status: cluster member
Role: follower
Term: 41
Leader: 5eef
Vote: unknown

Election timer: 5000
Log: [46367, 65727]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->68d3 <-5eef <-68d3
Disconnections: 0
Servers:
    68d3 (68d3 at tcp:[192.168.50.134]:6643) last msg 243572 ms ago
    8332 (8332 at tcp:[192.168.50.128]:6643) (self)
    5eef (5eef at tcp:[192.168.50.112]:6643) last msg 179 ms ago

Record this node's ID, here 8332.

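The node's ID appears in the Servers list (the entry marked (self)) and also on the first line of the output, so the two can be used to cross-check each other.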
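
The cross-check can also be scripted. This is a minimal sketch that assumes the cluster/status output format shown above (a "Server ID:" line and a "Servers:" section in which the local entry is marked "(self)"); the function name ovn_self_id is hypothetical, not part of Kube-OVN or OVS.

```shell
# ovn_self_id (hypothetical helper): read `cluster/status` output on stdin and
# print the local server's short ID, but only if the "Server ID:" line and the
# "(self)" entry under "Servers:" agree. Exits with status 1 otherwise.
ovn_self_id() {
  awk '
    /^Server ID:/ { from_header = $3 }            # "Server ID: 8332 (8332545e-...)"
    /^Servers:/   { in_servers = 1; next }
    in_servers && /\(self\)/ { from_list = $1 }   # "8332 (8332 at tcp:[...]:6643) (self)"
    END {
      if (from_header == "" || from_header != from_list) exit 1
      print from_header
    }'
}
```

For the pod in this example, `kubectl -n kube-system exec ovn-central-74b5f7b9c5-gwml6 -- ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | ovn_self_id` would print 8332.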

2. Remove the node from the NB cluster

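For convenience, run the following command in the ovn-central pod on this node itself to kick the node out of the cluster: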

root@gateway:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound 8332
root@gateway:/kube-ovn#
root@gateway:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
unknown cluster
ovs-appctl: /var/run/ovn/ovnnb_db.ctl: server returned an error
root@gateway:/kube-ovn# exit

The status command now fails, since this server has left the cluster. Check the cluster status from another node instead:

[root@master ~]# kubectl -n kube-system get pods -o wide  | grep central
ovn-central-74b5f7b9c5-bztcl           1/1     Running   0               29m    192.168.50.134   master    <none>           <none>
ovn-central-74b5f7b9c5-gwml6           1/1     Running   0               17m    192.168.50.128   gateway   <none>           <none>
ovn-central-74b5f7b9c5-p9zfn           1/1     Running   0               29m    192.168.50.112   slave     <none>           <none>
[root@master ~]# kubectl -n kube-system exec -ti ovn-central-74b5f7b9c5-bztcl -- bash
root@master:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
68d3
Name: OVN_Northbound
Cluster ID: 276a (276a7bb3-51a9-473e-b2af-78bfe813489d)
Server ID: 68d3 (68d3bfbd-65b1-47fa-a35e-083f394a32e3)
Address: tcp:[192.168.50.134]:6643
Status: cluster member
Role: follower
Term: 41
Leader: 5eef
Vote: 5eef

Last Election started 1780560 ms ago, reason: timeout
Election timer: 5000
Log: [46367, 65728]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->5eef <-5eef
Disconnections: 2
Servers:
    68d3 (68d3 at tcp:[192.168.50.134]:6643) (self)
    5eef (5eef at tcp:[192.168.50.112]:6643) last msg 1404 ms ago
root@master:/kube-ovn#

The Servers list no longer contains 8332, which confirms that the node was kicked out of the cluster.
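
To make this check explicit, a small helper can test whether a given ID still appears under "Servers:". A sketch assuming the output format above; ovn_id_absent is a hypothetical name, and it must be fed the status output of a node that remains in the cluster (output without a Servers section, such as "unknown cluster", is treated as a failure rather than as success).

```shell
# ovn_id_absent (hypothetical helper): succeed only if the `cluster/status`
# output on stdin has a "Servers:" section and server ID $1 is not listed in it.
ovn_id_absent() {
  awk -v id="$1" '
    /^Servers:/ { in_servers = 1; next }
    in_servers && $1 == id { found = 1 }        # first field of each entry is the short ID
    END {
      if (!in_servers) exit 2                   # no Servers section: cannot tell
      exit (found ? 1 : 0)
    }'
}
```

Usage against a remaining member: `kubectl -n kube-system exec ovn-central-74b5f7b9c5-bztcl -- ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | ovn_id_absent 8332 && echo removed`.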

3. Kick the node out of the SB cluster in the same way and verify

[root@master ~]# kubectl -n kube-system exec -ti ovn-central-74b5f7b9c5-gwml6 -- bash
root@gateway:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
4049
Name: OVN_Southbound
Cluster ID: aa98 (aa985c1d-be8a-46e2-9a22-d3164c0042a7)
Server ID: 4049 (4049c11e-a07e-44ab-8c6b-549bc6172590)
Address: tcp:[192.168.50.128]:6644
Status: cluster member
Role: follower
Term: 44
Leader: d52b
Vote: unknown

Election timer: 5000
Log: [47459, 65157]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->7e31 <-d52b <-7e31
Disconnections: 0
Servers:
    d52b (d52b at tcp:[192.168.50.112]:6644) last msg 1406 ms ago
    4049 (4049 at tcp:[192.168.50.128]:6644) (self)
    7e31 (7e31 at tcp:[192.168.50.134]:6644) last msg 1314818 ms ago
root@gateway:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/kick OVN_Southbound 4049
root@gateway:/kube-ovn# exit
exit
[root@master ~]# kubectl -n kube-system exec -ti ovn-central-74b5f7b9c5-bztcl -- bash
root@master:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
7e31
Name: OVN_Southbound
Cluster ID: aa98 (aa985c1d-be8a-46e2-9a22-d3164c0042a7)
Server ID: 7e31 (7e31e71e-d6f3-4fa0-8a87-7aaa9d9d6b00)
Address: tcp:[192.168.50.134]:6644
Status: cluster member
Role: follower
Term: 44
Leader: d52b
Vote: d52b

Election timer: 5000
Log: [47459, 65158]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->d52b <-d52b
Disconnections: 2
Servers:
    d52b (d52b at tcp:[192.168.50.112]:6644) last msg 92 ms ago
    7e31 (7e31 at tcp:[192.168.50.134]:6644) (self)
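
Steps 2 and 3 run the same commands against the two databases, so they can also be run non-interactively through `kubectl exec ... --`. A minimal sketch: the function names are hypothetical, the socket paths and database names are the ones used above, and the server ID is taken from the "Server ID:" line of the leaving pod's own status output. Kicking removes the server from the cluster, so confirm the ID by hand before running it.

```shell
# ovn_db_target (hypothetical helper): map "nb"/"sb" to the control socket and
# database name used in the steps above.
ovn_db_target() {
  case "$1" in
    nb) echo "/var/run/ovn/ovnnb_db.ctl OVN_Northbound" ;;
    sb) echo "/var/run/ovn/ovnsb_db.ctl OVN_Southbound" ;;
    *)  echo "unknown database: $1" >&2; return 1 ;;
  esac
}

# ovn_kick_self (hypothetical helper): inside pod $1 (namespace kube-system),
# look up the local server ID for database $2 ("nb" or "sb") and kick it out.
ovn_kick_self() {
  pod=$1
  target=$(ovn_db_target "$2") || return 1
  set -- $target                      # $1 = control socket, $2 = database name
  id=$(kubectl -n kube-system exec "$pod" -- ovs-appctl -t "$1" cluster/status "$2" |
       awk '/^Server ID:/ { print $3; exit }')
  [ -n "$id" ] || { echo "no Server ID found in $pod" >&2; return 1; }
  echo "kicking $id from $2 via $pod"
  kubectl -n kube-system exec "$pod" -- ovs-appctl -t "$1" cluster/kick "$2" "$id"
}
```

With the pod from this example: `ovn_kick_self ovn-central-74b5f7b9c5-gwml6 nb`, then `ovn_kick_self ovn-central-74b5f7b9c5-gwml6 sb`.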

4. Delete the corresponding directories on this node

rm -rf /etc/origin/ovn/ /var/run/ovn/ /etc/ovn

Monitor whether the OVN components are affected. If they are, delete the affected pods so that they are restarted.
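
One way to do this monitoring is to list the pods on the affected node and flag any that are not fully ready. A sketch assuming the default `kubectl get pods` column layout for a single namespace (NAME, READY, STATUS, ...); not_ready_pods is a hypothetical helper name.

```shell
# not_ready_pods (hypothetical helper): read `kubectl get pods` output on stdin
# (header line first) and print the names of pods whose READY count is
# incomplete or whose STATUS is not Running.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                   # READY column, e.g. "1/1"
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}
```

For example, `kubectl -n kube-system get pods -o wide --field-selector spec.nodeName=gateway | not_ready_pods` lists candidates, and each one can then be removed with `kubectl -n kube-system delete pod` followed by its name.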

5. Remove the master label from the node and scale down ovn-central

kubectl label node gateway kube-ovn/role-
kubectl scale deployment -n kube-system ovn-central --replicas=2
kubectl set env deployment/ovn-central -n kube-system NODE_IPS="192.168.50.134,192.168.50.112"
kubectl rollout status deployment/ovn-central -n kube-system 
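
Instead of typing NODE_IPS by hand, the list can be derived from the nodes that still carry the kube-ovn/role=master label. A sketch, assuming each node reports its address as an InternalIP and that the label has already been removed from the departing node; master_node_ips is a hypothetical helper. The order of the result follows the API's node listing and may differ from a hand-written list, so compare it with the current value before applying it.

```shell
# master_node_ips (hypothetical helper): print the InternalIP addresses of all
# nodes labelled kube-ovn/role=master as one comma-separated string.
master_node_ips() {
  kubectl get nodes -l kube-ovn/role=master \
    -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}' |
    tr -s ' ' ','                     # jsonpath separates values with spaces
}
```

Usage: `kubectl set env deployment/ovn-central -n kube-system NODE_IPS="$(master_node_ips)"`, after checking the printed value of `master_node_ips`.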

Bringing a central node online

0. Confirm that the following directories on the node are empty (or do not exist):

/etc/origin/ovn/
/var/run/ovn/
/etc/ovn
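
This check can be run on the node's host with a small helper. A sketch; ovn_dirs_clean is a hypothetical name, and it treats hidden files as content.

```shell
# ovn_dirs_clean (hypothetical helper): succeed if every directory given as an
# argument is either absent or empty; report the first non-empty one otherwise.
ovn_dirs_clean() {
  for d in "$@"; do
    if [ -d "$d" ] && [ -n "$(ls -A "$d")" ]; then
      echo "not empty: $d" >&2
      return 1
    fi
  done
}
```

Usage: `ovn_dirs_clean /etc/origin/ovn /var/run/ovn /etc/ovn && echo clean`.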

1. From any ovn-central pod, verify that the cluster contains no entries for nodes that no longer exist

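Entries for nonexistent nodes can prevent a leader from being elected because there are not enough votes. Make sure every server listed in the cluster corresponds to an existing node.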

# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
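
A sketch of this comparison: list the servers under "Servers:" whose address is not among the IPs expected in the cluster (the current NODE_IPS value). ovn_stale_servers is a hypothetical helper; it assumes each server entry contains its address in brackets, as in the tcp:[IP]:port entries shown earlier.

```shell
# ovn_stale_servers (hypothetical helper): read `cluster/status` output on stdin
# and print "ID IP" for every server under "Servers:" whose IP is not in the
# comma-separated list $1 (for example the NODE_IPS value).
ovn_stale_servers() {
  awk -v ips="$1" '
    BEGIN { n = split(ips, list, ","); for (i = 1; i <= n; i++) want[list[i]] = 1 }
    /^Servers:/ { in_servers = 1; next }
    in_servers && match($0, /\[[0-9a-fA-F.:]+\]/) {
      ip = substr($0, RSTART + 1, RLENGTH - 2)   # strip the surrounding brackets
      if (!(ip in want)) print $1, ip
    }'
}
```

Piping either status command into `ovn_stale_servers 192.168.50.134,192.168.50.112` should print nothing when the membership is clean; any printed entry is a candidate for cluster/kick.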

2. Label the node and scale up ovn-central

kubectl label node NODE kube-ovn/role=master   # replace NODE with the name of the new node
kubectl scale deployment -n kube-system ovn-central --replicas=3
kubectl set env deployment/ovn-central -n kube-system NODE_IPS="192.168.50.134,192.168.50.112,192.168.50.85"
kubectl rollout status deployment/ovn-central -n kube-system 
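
After the rollout, the new member can be checked from any ovn-central pod: each database should report "Status: cluster member" and list as many servers as there are replicas (3 here). A sketch assuming the cluster/status format shown above; ovn_cluster_ready is a hypothetical helper.

```shell
# ovn_cluster_ready (hypothetical helper): read `cluster/status` output on stdin
# and succeed only if Status is "cluster member" and exactly $1 servers are
# listed under "Servers:".
ovn_cluster_ready() {
  awk -v want="$1" '
    /^Status:/  { sub(/^Status: */, ""); status = $0 }
    /^Servers:/ { in_servers = 1; next }
    in_servers && NF { n++ }                    # count non-empty server entries
    END { exit !(status == "cluster member" && n == want) }'
}
```

To wait for it, loop until the check passes, e.g. `until kubectl -n kube-system exec POD -- ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | ovn_cluster_ready 3; do sleep 5; done` with POD replaced by an ovn-central pod name, then repeat with the SB socket and OVN_Southbound.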