
fix wait for terminated pods or for minion nodes in the just-restarted cluster #11106

Closed
wants to merge 14 commits

Conversation

prezha
Contributor

@prezha prezha commented Apr 15, 2021

fixes #11101 and #10674

we shouldn't wait for terminated pods or for minion nodes in a just-restarted cluster, and, in general, we should only start waiting after the cni is applied

example logs are given here
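
for the terminated-pods part, the gist is a phase check along these lines (a rough sketch rather than the exact diff - names follow the existing pod_ready.go helpers):

// a pod that is neither Running nor Pending (eg it was terminated and replaced
// while the cluster restarted) can never become Ready again, so report
// ConditionUnknown and let the caller skip it instead of burning the whole timeout
if pod.Status.Phase != core.PodRunning && pod.Status.Phase != core.PodPending {
	return core.ConditionUnknown, fmt.Sprintf("pod %q in %q namespace has status phase %q (skipping!): %+v", pod.Name, pod.Namespace, pod.Status.Phase, pod.Status)
}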

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 15, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: prezha
To complete the pull request process, please assign afbjorklund after the PR has been reviewed.
You can assign the PR to them by writing /assign @afbjorklund in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@spowelljr
Member

@prezha I tried the PR and I'm still getting the same issue on my end.

% time ./out/minikube start --container-runtime=containerd --wait=all
😄  minikube v1.19.0 on Darwin 11.2.3
✨  Automatically selected the docker driver. Other choices: hyperkit, virtualbox, ssh
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
    > gcr.io/k8s-minikube/kicbase...: 357.67 MiB / 357.67 MiB  100.00% 7.03 MiB
🔥  Creating docker container (CPUs=2, Memory=4000MB) ...
📦  Preparing Kubernetes v1.20.2 on containerd 1.4.4 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
E0415 11:44:17.426813   49963 pod_ready.go:70] WaitExtra: waitPodCondition: timed out waiting for the condition
E0415 11:48:17.464925   49963 pod_ready.go:70] WaitExtra: waitPodCondition: timed out waiting for the condition

❌  Exiting due to GUEST_START: wait 6m0s for node: extra waiting: timed out waiting 6m0s for all system-critical and pods with labels [k8s-app=kube-dns component=etcd component=kube-apiserver component=kube-controller-manager k8s-app=kube-proxy component=kube-scheduler] to be "Ready"

😿  If the above advice does not help, please let us know: 
👉  https://github.com/kubernetes/minikube/issues/new/choose

./out/minikube start --container-runtime=containerd --wait=all  9.75s user 5.66s system 2% cpu 10:20.60 total

@prezha
Contributor Author

prezha commented Apr 15, 2021

@spowelljr thanks for trying it!

i can't replicate it locally - can i kindly ask you to append --alsologtostderr -v=9 to your command and attach the complete log output?

@spowelljr
Member

@prezha Output attached

output.txt

@spowelljr
Member

% kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                               READY   STATUS              RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-2j7sr            0/1     ContainerCreating   0          23m   <none>         minikube   <none>           <none>
kube-system   etcd-minikube                      1/1     Running             0          24m   192.168.49.2   minikube   <none>           <none>
kube-system   kindnet-qsvzh                      0/1     CrashLoopBackOff    9          23m   192.168.49.2   minikube   <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running             0          24m   192.168.49.2   minikube   <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running             0          24m   192.168.49.2   minikube   <none>           <none>
kube-system   kube-proxy-gw7n9                   1/1     Running             0          23m   192.168.49.2   minikube   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running             0          24m   192.168.49.2   minikube   <none>           <none>
kube-system   storage-provisioner                0/1     CrashLoopBackOff    9          23m   192.168.49.2   minikube   <none>           <none>

@prezha
Contributor Author

prezha commented Apr 15, 2021

hmmm, CrashLoopBackOff for kindnet and storage-provisioner - can you pls also share logs of those containers?

@spowelljr
Member

Attached

kindnet.txt
storage.txt

@spowelljr
Member

I don't think those logs really contain anything useful - my fault.

kindnet

I0415 19:30:57.478767       1 main.go:88] connected to apiserver: https://10.96.0.1:443
I0415 19:30:57.478843       1 main.go:93] hostIP = 192.168.49.2
podIP = 192.168.49.2
I0415 19:30:57.479081       1 main.go:102] setting mtu 1500 for CNI 
I0415 19:30:59.429102       1 main.go:139] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: connection refused
I0415 19:30:59.478245       1 main.go:139] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: connection refused
I0415 19:31:00.479404       1 main.go:139] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: connection refused
I0415 19:31:02.481861       1 main.go:139] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: connection refused
I0415 19:31:05.484144       1 main.go:139] Failed to get nodes, retrying after error: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: connection refused
panic: Reached maximum retries obtaining node list: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: connect: connection refused

goroutine 1 [running]:
main.main()
	/go/src/cmd/kindnetd/main.go:143 +0xc0a

Storage

I0415 19:29:01.340489       1 storage_provisioner.go:116] Initializing the minikube storage provisioner...
F0415 19:29:01.341986       1 main.go:39] error getting server version: Get "https://10.96.0.1:443/version?timeout=32s": dial tcp 10.96.0.1:443: connect: connection refused

@prezha
Contributor Author

prezha commented Apr 15, 2021

thanks, @spowelljr, interesting!
seems like a routing/firewall issue, but i'm not sure yet

do you mind if we ok-to-test this pr so it runs against other machines - that could give us some more data?

@spowelljr
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 15, 2021
@prezha
Contributor Author

prezha commented Apr 15, 2021

it looks like it's related to that specific combination of driver (docker) and runtime (containerd) - with the kvm + containerd combo it seems to be working:

❯ minikube start -p kvmcontainerdwait --driver=kvm --container-runtime=containerd --wait=true
😄  [kvmcontainerdwait] minikube v1.19.0 on Opensuse-Tumbleweed
✨  Using the kvm2 driver based on user configuration
👍  Starting control plane node kvmcontainerdwait in cluster kvmcontainerdwait
🔥  Creating kvm2 VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
📦  Preparing Kubernetes v1.20.2 on containerd 1.4.3 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: default-storageclass, storage-provisioner
❌  Problems detected in kubelet:
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.781879    2694 reflector.go:138] object-"kube-system"/"coredns": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "coredns" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.781925    2694 reflector.go:138] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "kube-proxy" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.782066    2694 reflector.go:138] object-"kube-system"/"kube-proxy-token-ss4lf": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "kube-proxy-token-ss4lf" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.782156    2694 reflector.go:138] object-"kube-system"/"coredns-token-p9bnw": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "coredns-token-p9bnw" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:17 kvmcontainerdwait kubelet[2694]: E0415 19:57:17.480072    2694 pod_workers.go:191] Error syncing pod 1c5a81fe-970d-401f-b3d8-9efb0b600850 ("coredns-74ff55c5b-9fkd4_kube-system(1c5a81fe-970d-401f-b3d8-9efb0b600850)"), skipping: failed to "StartContainer" for "coredns" with CreateContainerConfigError: "cannot find volume \"config-volume\" to mount into container \"coredns\""
❌  Problems detected in kubelet:
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.781879    2694 reflector.go:138] object-"kube-system"/"coredns": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "coredns" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.781925    2694 reflector.go:138] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "kube-proxy" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.782066    2694 reflector.go:138] object-"kube-system"/"kube-proxy-token-ss4lf": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "kube-proxy-token-ss4lf" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.782156    2694 reflector.go:138] object-"kube-system"/"coredns-token-p9bnw": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "coredns-token-p9bnw" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:17 kvmcontainerdwait kubelet[2694]: E0415 19:57:17.480072    2694 pod_workers.go:191] Error syncing pod 1c5a81fe-970d-401f-b3d8-9efb0b600850 ("coredns-74ff55c5b-9fkd4_kube-system(1c5a81fe-970d-401f-b3d8-9efb0b600850)"), skipping: failed to "StartContainer" for "coredns" with CreateContainerConfigError: "cannot find volume \"config-volume\" to mount into container \"coredns\""
❌  Problems detected in kubelet:
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.781879    2694 reflector.go:138] object-"kube-system"/"coredns": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "coredns" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.781925    2694 reflector.go:138] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "kube-proxy" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.782066    2694 reflector.go:138] object-"kube-system"/"kube-proxy-token-ss4lf": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "kube-proxy-token-ss4lf" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:14 kvmcontainerdwait kubelet[2694]: E0415 19:57:14.782156    2694 reflector.go:138] object-"kube-system"/"coredns-token-p9bnw": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "coredns-token-p9bnw" is forbidden: User "system:node:kvmcontainerdwait" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node 'kvmcontainerdwait' and this object
    Apr 15 19:57:17 kvmcontainerdwait kubelet[2694]: E0415 19:57:17.480072    2694 pod_workers.go:191] Error syncing pod 1c5a81fe-970d-401f-b3d8-9efb0b600850 ("coredns-74ff55c5b-9fkd4_kube-system(1c5a81fe-970d-401f-b3d8-9efb0b600850)"), skipping: failed to "StartContainer" for "coredns" with CreateContainerConfigError: "cannot find volume \"config-volume\" to mount into container \"coredns\""
🏄  Done! kubectl is now configured to use "kvmcontainerdwait" cluster and "default" namespace by default
❯ kubectl get pods -o wide --all-namespaces

NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE     IP              NODE                NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-pqmdf                     1/1     Running   0          5m35s   10.244.0.3      kvmcontainerdwait   <none>           <none>
kube-system   etcd-kvmcontainerdwait                      1/1     Running   0          5m39s   192.168.39.58   kvmcontainerdwait   <none>           <none>
kube-system   kube-apiserver-kvmcontainerdwait            1/1     Running   0          5m39s   192.168.39.58   kvmcontainerdwait   <none>           <none>
kube-system   kube-controller-manager-kvmcontainerdwait   1/1     Running   0          5m39s   192.168.39.58   kvmcontainerdwait   <none>           <none>
kube-system   kube-proxy-w6k6l                            1/1     Running   0          5m35s   192.168.39.58   kvmcontainerdwait   <none>           <none>
kube-system   kube-scheduler-kvmcontainerdwait            1/1     Running   0          5m39s   192.168.39.58   kvmcontainerdwait   <none>           <none>
kube-system   storage-provisioner                         1/1     Running   1          5m34s   192.168.39.58   kvmcontainerdwait   <none>           <none>

btw, i think the 'Problems detected in kubelet' warnings here are not relevant (they refer to coredns-74ff55c5b-9fkd4, which was replaced), and at times they are not shown at all - eg, a 2nd run gives:

❯ minikube start -p kvmcontainerdwait --driver=kvm --container-runtime=containerd --wait=true
😄  [kvmcontainerdwait] minikube v1.19.0 on Opensuse-Tumbleweed
✨  Using the kvm2 driver based on user configuration
👍  Starting control plane node kvmcontainerdwait in cluster kvmcontainerdwait
🔥  Creating kvm2 VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
📦  Preparing Kubernetes v1.20.2 on containerd 1.4.3 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "kvmcontainerdwait" cluster and "default" namespace by default
❯ kubectl get pods -o wide --all-namespaces

NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE     IP               NODE                NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-h9r5l                     1/1     Running   0          2m25s   10.244.0.2       kvmcontainerdwait   <none>           <none>
kube-system   etcd-kvmcontainerdwait                      1/1     Running   0          2m29s   192.168.39.122   kvmcontainerdwait   <none>           <none>
kube-system   kube-apiserver-kvmcontainerdwait            1/1     Running   0          2m29s   192.168.39.122   kvmcontainerdwait   <none>           <none>
kube-system   kube-controller-manager-kvmcontainerdwait   1/1     Running   0          2m29s   192.168.39.122   kvmcontainerdwait   <none>           <none>
kube-system   kube-proxy-jpf4r                            1/1     Running   0          2m25s   192.168.39.122   kvmcontainerdwait   <none>           <none>
kube-system   kube-scheduler-kvmcontainerdwait            1/1     Running   0          2m29s   192.168.39.122   kvmcontainerdwait   <none>           <none>
kube-system   storage-provisioner                         1/1     Running   0          2m22s   192.168.39.122   kvmcontainerdwait   <none>           <none>

i can't run the docker combo on my machine because it ends up with a mix of cgroup managers (cgroupfs and systemd) that i can't overcome atm

@medyagh
Member

medyagh commented Apr 15, 2021

it looks like it's related to that specific combination of driver (docker) and runtime (containerd) - with the kvm + containerd combo it seems to be working:

that could be something about kindnet - maybe try bumping the kindnet version and see if that fixes it?

and we also need to check the CNI folder used on the ISO; the vm drivers use the bridge cni

@minikube-pr-bot

kvm2 driver with docker runtime

+-------------------+----------+---------------------+
|      COMMAND      | MINIKUBE | MINIKUBE (PR 11106) |
+-------------------+----------+---------------------+
| minikube start    | 55.7s    | 54.4s               |
| ⚠️  enable ingress | 37.6s    | 43.7s ⚠️             |
+-------------------+----------+---------------------+

Times for minikube start: 56.1s 54.9s 51.9s 58.2s 57.5s
Times for minikube (PR 11106) start: 51.7s 52.6s 53.2s 56.8s 57.6s

Times for minikube (PR 11106) ingress: 43.4s 43.9s 44.1s 43.5s 43.5s
Times for minikube ingress: 43.4s 36.4s 35.5s 36.9s 35.6s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 23.3s    | 22.6s               |
| enable ingress | 33.8s    | 32.9s               |
+----------------+----------+---------------------+

Times for minikube start: 25.4s 23.0s 23.6s 22.3s 22.3s
Times for minikube (PR 11106) start: 23.3s 23.2s 22.5s 22.1s 22.1s

Times for minikube ingress: 31.0s 34.5s 34.0s 33.6s 36.1s
Times for minikube (PR 11106) ingress: 33.6s 33.1s 29.0s 32.6s 36.0s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 15, 2021
@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 52.0s    | 49.8s               |
| enable ingress | 34.4s    | 35.5s               |
+----------------+----------+---------------------+

Times for minikube start: 56.6s 50.8s 54.0s 51.9s 46.7s
Times for minikube (PR 11106) start: 48.0s 48.6s 48.8s 50.1s 53.4s

Times for minikube ingress: 33.8s 35.3s 34.4s 34.3s 34.3s
Times for minikube (PR 11106) ingress: 35.2s 37.7s 34.3s 34.2s 36.2s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 21.8s    | 21.7s               |
| enable ingress | 30.6s    | 30.1s               |
+----------------+----------+---------------------+

Times for minikube start: 22.6s 21.2s 22.6s 21.2s 21.6s
Times for minikube (PR 11106) start: 20.9s 22.2s 22.4s 20.7s 22.5s

Times for minikube ingress: 28.5s 28.0s 28.5s 36.5s 31.5s
Times for minikube (PR 11106) ingress: 29.0s 31.5s 30.0s 31.0s 29.0s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@prezha prezha force-pushed the fix-wait-for-terminated-pods branch 2 times, most recently from e6896f8 to b4ff382 Compare April 16, 2021 00:04
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 16, 2021
@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 48.0s    | 49.3s               |
| enable ingress | 39.5s    | 41.8s               |
+----------------+----------+---------------------+

Times for minikube start: 49.0s 47.4s 48.6s 47.4s 47.6s
Times for minikube (PR 11106) start: 50.9s 49.0s 47.0s 48.1s 51.6s

Times for minikube ingress: 34.8s 43.8s 36.3s 39.2s 43.4s
Times for minikube (PR 11106) ingress: 42.7s 44.4s 42.8s 43.2s 35.8s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 22.0s    | 21.7s               |
| enable ingress | 34.2s    | 36.3s               |
+----------------+----------+---------------------+

Times for minikube start: 23.8s 21.1s 21.8s 21.6s 21.4s
Times for minikube (PR 11106) start: 21.8s 22.0s 21.7s 21.6s 21.5s

Times for minikube ingress: 30.5s 32.0s 36.0s 34.5s 38.1s
Times for minikube (PR 11106) ingress: 34.0s 36.5s 36.0s 41.5s 33.5s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 49.5s    | 51.1s               |
| enable ingress | 42.4s    | 37.3s               |
+----------------+----------+---------------------+

Times for minikube ingress: 42.9s 43.4s 43.7s 37.8s 44.4s
Times for minikube (PR 11106) ingress: 35.3s 36.2s 43.8s 35.8s 35.3s

Times for minikube start: 52.6s 47.4s 48.3s 51.3s 47.8s
Times for minikube (PR 11106) start: 47.7s 49.3s 55.2s 50.2s 53.2s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 22.2s    | 21.7s               |
| enable ingress | 33.7s    | 35.9s               |
+----------------+----------+---------------------+

Times for minikube (PR 11106) start: 21.9s 20.9s 21.8s 22.0s 22.1s
Times for minikube start: 24.0s 22.1s 21.6s 21.7s 21.6s

Times for minikube ingress: 32.5s 34.5s 35.0s 34.0s 32.5s
Times for minikube (PR 11106) ingress: 38.5s 34.5s 38.0s 33.5s 35.0s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 50.2s    | 49.3s               |
| enable ingress | 34.6s    | 35.4s               |
+----------------+----------+---------------------+

Times for minikube start: 50.2s 47.5s 51.7s 55.0s 46.8s
Times for minikube (PR 11106) start: 51.2s 46.4s 49.5s 48.9s 50.5s

Times for minikube ingress: 35.3s 35.2s 34.3s 33.8s 34.2s
Times for minikube (PR 11106) ingress: 35.3s 35.8s 35.3s 34.8s 35.6s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 21.4s    | 21.2s               |
| enable ingress | 31.3s    | 31.7s               |
+----------------+----------+---------------------+

Times for minikube start: 22.0s 21.0s 21.1s 21.8s 20.9s
Times for minikube (PR 11106) start: 21.1s 20.6s 21.3s 21.2s 21.7s

Times for minikube ingress: 30.5s 29.0s 36.5s 29.5s 31.0s
Times for minikube (PR 11106) ingress: 30.5s 30.0s 37.0s 28.5s 32.5s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@prezha
Contributor Author

prezha commented Apr 23, 2021

i think i've split it all correctly and cleaned up this pr now...

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 24, 2021
@prezha prezha force-pushed the fix-wait-for-terminated-pods branch from f3573eb to 89c5ec2 Compare April 24, 2021 01:09
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 24, 2021
@prezha prezha force-pushed the fix-wait-for-terminated-pods branch from 3873b8a to 357b593 Compare April 24, 2021 01:18
@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 49.7s    | 49.7s               |
| enable ingress | 35.6s    | 36.7s               |
+----------------+----------+---------------------+

Times for minikube start: 50.1s 50.5s 46.6s 51.1s 49.9s
Times for minikube (PR 11106) start: 50.3s 49.6s 54.3s 47.6s 46.8s

Times for minikube ingress: 34.8s 36.7s 36.7s 35.4s 34.3s
Times for minikube (PR 11106) ingress: 35.7s 34.2s 35.4s 35.7s 42.7s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 21.7s    | 21.7s               |
| enable ingress | 28.3s    | 30.0s               |
+----------------+----------+---------------------+

Times for minikube ingress: 29.5s 29.0s 28.5s 27.0s 27.5s
Times for minikube (PR 11106) ingress: 30.5s 29.0s 29.0s 29.0s 32.5s

Times for minikube start: 21.3s 22.7s 21.3s 21.8s 21.6s
Times for minikube (PR 11106) start: 22.1s 21.8s 21.4s 22.0s 21.1s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 47.5s    | 51.6s               |
| enable ingress | 37.7s    | 39.5s               |
+----------------+----------+---------------------+

Times for minikube start: 49.3s 46.7s 47.5s 47.0s 46.9s
Times for minikube (PR 11106) start: 47.2s 51.5s 54.4s 54.6s 50.3s

Times for minikube ingress: 37.8s 34.3s 42.8s 36.2s 37.7s
Times for minikube (PR 11106) ingress: 37.4s 36.7s 34.8s 45.3s 43.3s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 22.0s    | 21.0s               |
| enable ingress | 35.2s    | 37.3s               |
+----------------+----------+---------------------+

Times for minikube start: 22.5s 22.5s 21.3s 21.7s 22.0s
Times for minikube (PR 11106) start: 21.2s 20.9s 20.9s 21.0s 21.1s

Times for minikube (PR 11106) ingress: 35.5s 38.0s 38.0s 40.5s 34.5s
Times for minikube ingress: 33.5s 33.0s 34.0s 34.5s 41.0s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 50.8s    | 49.2s               |
| enable ingress | 35.4s    | 35.8s               |
+----------------+----------+---------------------+

Times for minikube start: 49.3s 50.7s 50.0s 50.2s 53.9s
Times for minikube (PR 11106) start: 47.5s 47.4s 45.7s 54.6s 50.8s

Times for minikube ingress: 36.4s 33.8s 34.8s 37.7s 34.4s
Times for minikube (PR 11106) ingress: 35.8s 34.3s 35.8s 38.2s 34.7s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 11106) |
+----------------+----------+---------------------+
| minikube start | 21.6s    | 21.9s               |
| enable ingress | 30.3s    | 31.1s               |
+----------------+----------+---------------------+

Times for minikube start: 20.9s 21.3s 21.8s 20.9s 23.1s
Times for minikube (PR 11106) start: 21.8s 22.4s 21.3s 21.2s 22.6s

Times for minikube (PR 11106) ingress: 32.5s 29.0s 29.0s 33.5s 31.5s
Times for minikube ingress: 29.0s 27.5s 32.5s 29.0s 33.5s

docker driver with containerd runtime
error collecting results for docker driver: timing run 0 with minikube: timing cmd: [out/minikube addons enable ingress]: waiting for minikube: exit status 10

@medyagh
Member

medyagh commented Apr 25, 2021

@prezha in none of the PRs (including the broken-up PRs) does the containerd test finish ...

@medyagh
Member

medyagh commented Apr 25, 2021

@prezha here are the logs for the timed-out Linux containerd job
log1.txt

@prezha
Contributor Author

prezha commented Apr 25, 2021

@prezha here are the logs for the timed-out Linux containerd job
log1.txt

@medyagh yes, i think this is the correct 'wait' behaviour because critical pods didn't reach the 'running' phase or 'ready' condition - extracts from the log:

...
I0424 03:29:25.654598  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:25.964728  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=registry", current state: Pending: [<nil>]
I0424 03:29:25.973321  595690 kapi.go:96] waiting for pod "app.kubernetes.io/name=ingress-nginx", current state: Pending: [<nil>]
I0424 03:29:26.154318  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:26.464825  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=registry", current state: Pending: [<nil>]
I0424 03:29:26.473615  595690 kapi.go:96] waiting for pod "app.kubernetes.io/name=ingress-nginx", current state: Pending: [<nil>]
I0424 03:29:26.654390  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:26.964793  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=registry", current state: Pending: [<nil>]
I0424 03:29:26.968010  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=registry", current state: Pending: [<nil>]
I0424 03:29:26.968041  595690 kapi.go:108] duration metric: took 6m0.011791107s to wait for kubernetes.io/minikube-addons=registry ...
W0424 03:29:26.968440  595690 out.go:222] ! Enabling 'registry' returned an error: running callbacks: [waiting for kubernetes.io/minikube-addons=registry pods: timed out waiting for the condition]
! Enabling 'registry' returned an error: running callbacks: [waiting for kubernetes.io/minikube-addons=registry pods: timed out waiting for the condition]
I0424 03:29:26.973419  595690 kapi.go:96] waiting for pod "app.kubernetes.io/name=ingress-nginx", current state: Pending: [<nil>]
I0424 03:29:26.975863  595690 kapi.go:96] waiting for pod "app.kubernetes.io/name=ingress-nginx", current state: Pending: [<nil>]
I0424 03:29:26.975886  595690 kapi.go:108] duration metric: took 6m0.017436129s to wait for app.kubernetes.io/name=ingress-nginx ...
W0424 03:29:26.976003  595690 out.go:222] ! Enabling 'ingress' returned an error: running callbacks: [waiting for app.kubernetes.io/name=ingress-nginx pods: timed out waiting for the condition]
! Enabling 'ingress' returned an error: running callbacks: [waiting for app.kubernetes.io/name=ingress-nginx pods: timed out waiting for the condition]
I0424 03:29:27.155389  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:27.518936  595690 pod_ready.go:111] pod "coredns-74ff55c5b-99hj8" in "kube-system" namespace has status "Ready":"False"
I0424 03:29:27.653983  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:28.155172  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:28.654102  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:28.657907  595690 kapi.go:96] waiting for pod "kubernetes.io/minikube-addons=csi-hostpath-driver", current state: Pending: [<nil>]
I0424 03:29:28.657932  595690 kapi.go:108] duration metric: took 6m0.088042704s to wait for kubernetes.io/minikube-addons=csi-hostpath-driver ...
W0424 03:29:28.658044  595690 out.go:222] ! Enabling 'csi-hostpath-driver' returned an error: running callbacks: [waiting for kubernetes.io/minikube-addons=csi-hostpath-driver pods: timed out waiting for the condition]
! Enabling 'csi-hostpath-driver' returned an error: running callbacks: [waiting for kubernetes.io/minikube-addons=csi-hostpath-driver pods: timed out waiting for the condition]
I0424 03:29:28.660816  595690 out.go:157] * Enabled addons: storage-provisioner, metrics-server, helm-tiller, default-storageclass, olm, volumesnapshots
I0424 03:29:28.660846  595690 addons.go:330] enableAddons completed in 6m6.260315577s
I0424 03:29:30.017362  595690 pod_ready.go:111] pod "coredns-74ff55c5b-99hj8" in "kube-system" namespace has status "Ready":"False"
...
I0424 03:31:22.018448  595690 pod_ready.go:111] pod "coredns-74ff55c5b-99hj8" in "kube-system" namespace has status "Ready":"False"
I0424 03:31:22.520922  595690 pod_ready.go:91] duration metric: took 4m0.011620462s waiting for pod "coredns-74ff55c5b-99hj8" in "kube-system" namespace to be "Ready" ...
E0424 03:31:22.520958  595690 pod_ready.go:74] WaitExtra: waitPodCondition: timed out waiting for the condition
I0424 03:31:22.520977  595690 pod_ready.go:40] duration metric: took 8m0.064857624s for extra waiting for all system-critical and pods with labels [k8s-app=kube-dns component=etcd component=kube-apiserver component=kube-controller-manager k8s-app=kube-proxy component=kube-scheduler] to be "Ready" ...
I0424 03:31:22.524585  595690 out.go:157] 
W0424 03:31:22.524736  595690 out.go:222] X Exiting due to GUEST_START: wait 6m0s for node: extra waiting: timed out waiting 6m0s for all system-critical and pods with labels [k8s-app=kube-dns component=etcd component=kube-apiserver component=kube-controller-manager k8s-app=kube-proxy component=kube-scheduler] to be "Ready"
...

and

...
I0424 03:42:45.314419  647853 pod_ready.go:111] pod "coredns-74ff55c5b-6r68h" in "kube-system" namespace has status "Ready":"False"
I0424 03:42:47.814689  647853 pod_ready.go:111] pod "coredns-74ff55c5b-6r68h" in "kube-system" namespace has status "Ready":"False"
I0424 03:42:50.314465  647853 pod_ready.go:111] pod "coredns-74ff55c5b-6r68h" in "kube-system" namespace has status "Ready":"False"
I0424 03:42:51.318372  647853 pod_ready.go:91] duration metric: took 4m0.013478464s waiting for pod "coredns-74ff55c5b-6r68h" in "kube-system" namespace to be "Ready" ...
E0424 03:42:51.318399  647853 pod_ready.go:74] WaitExtra: waitPodCondition: timed out waiting for the condition
I0424 03:42:51.318429  647853 pod_ready.go:40] duration metric: took 8m0.046603643s for extra waiting for all system-critical and pods with labels [k8s-app=kube-dns component=etcd component=kube-apiserver component=kube-controller-manager k8s-app=kube-proxy component=kube-scheduler] to be "Ready" ...
I0424 03:42:51.322092  647853 out.go:157] 
W0424 03:42:51.322341  647853 out.go:222] X Exiting due to GUEST_START: wait 6m0s for node: extra waiting: timed out waiting 6m0s for all system-critical and pods with labels [k8s-app=kube-dns component=etcd component=kube-apiserver component=kube-controller-manager k8s-app=kube-proxy component=kube-scheduler] to be "Ready"
...
==> container status <==
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID
9698ef0f50e63       6e38f40d628db       8 minutes ago       Running             storage-provisioner       0                   e3f8f095a8e25
2339d1d2eb51c       2b60427ffa5fe       8 minutes ago       Running             kindnet-cni               0                   1c99cdfefd707
f92a24e28d248       43154ddb57a83       8 minutes ago       Running             kube-proxy                0                   014fabd874c92
3c8c7cba55b8b       a27166429d98e       8 minutes ago       Running             kube-controller-manager   0                   1e04e77555834
483fae65af354       0369cf4303ffd       8 minutes ago       Running             etcd                      0                   0526fce7b5e28
a31fd0926602d       ed2c44fbdd78b       8 minutes ago       Running             kube-scheduler            0                   41f6dea4363c0
b3f714388124f       a8c2fdb8bf76e       8 minutes ago       Running             kube-apiserver            0                   f70cbe99101b7
...
==> kubelet <==
-- Logs begin at Sat 2021-04-24 03:33:59 UTC, end at Sat 2021-04-24 03:42:54 UTC. --
Apr 24 03:40:33 functional-20210424033357-594870 kubelet[1121]: E0424 03:40:33.325693    1121 pod_workers.go:191] Error syncing pod 4267d95b-2639-49df-b30b-e01ca573db64 ("coredns-74ff55c5b-6r68h_kube-system(4267d95b-2639-49df-b30b-e01ca573db64)"), skipping: failed to "CreatePodSandbox" for "coredns-74ff55c5b-6r68h_kube-system(4267d95b-2639-49df-b30b-e01ca573db64)" with CreatePodSandboxError: "CreatePodSandbox for pod \"coredns-74ff55c5b-6r68h_kube-system(4267d95b-2639-49df-b30b-e01ca573db64)\" failed: rp"
Apr 24 03:40:57 functional-20210424033357-594870 kubelet[1121]: E0424 03:40:57.297487    1121 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to setup network for sandbox "a24b386b0a5e7c26f064cfc9a689201c96daafbaa904a6045ad4101fd8594bef": failed to set bridge addr: could not add IP address to "cni0": permission denied
...

now, the difference is that the (successful) 'none' run uses no cni while this (failed) 'docker + containerd' run uses kindnet - the (now separated) #11178 and #11185 should help in addressing the issue itself

so, in the context of 'wait', i think this is the expected and desired behaviour

@medyagh
Member

medyagh commented Apr 25, 2021

This change makes the Docker_Linux failure count high:

Docker_Linux — Jenkins: completed with 8 / 233 failures in 46.28 minute(s).

@@ -63,7 +66,14 @@ func WaitExtra(cs *kubernetes.Clientset, labels []string, timeout time.Duration)
}
if match || pod.Spec.PriorityClassName == "system-cluster-critical" || pod.Spec.PriorityClassName == "system-node-critical" {
if err := waitPodCondition(cs, pod.Name, pod.Namespace, core.PodReady, timeout); err != nil {
Member

how about, instead of waiting until it fails and then checking the error message to see whether it is a "Terminated" pod, we only range through pods which are not Terminated to begin with?

	if err != nil {
		return fmt.Errorf("error listing pods in %q namespace: %w", meta.NamespaceSystem, err)
	}

Member

or, as @ilya-zuyev noted, the for _, pod := range pods.Items { loop might have a condition we could check
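
eg, something along these lines (just an illustration, reusing the fields from the existing loop):

for _, pod := range pods.Items {
	// skip pods that have already terminated up front - they can never become Ready
	if pod.Status.Phase != core.PodRunning && pod.Status.Phase != core.PodPending {
		continue
	}
	// ... existing label / priority-class match and waitPodCondition call ...
}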

Contributor Author

i think something like that (ie, 'skipping!') was my original intention:

if pod.Status.Phase != core.PodRunning && pod.Status.Phase != core.PodPending {
return core.ConditionUnknown, fmt.Sprintf("pod %q in %q namespace has status phase %q (skipping!): %+v", pod.Name, pod.Namespace, pod.Status.Phase, pod.Status)
}

but it somehow got lost along the way (when i also added host checking)

i'll have a look...

Contributor Author

@medyagh @ilya-zuyev thanks for catching this!
it shouldn't have waited on non-running and non-pending pods because of:

if pod.Status.Phase != core.PodRunning && pod.Status.Phase != core.PodPending {
return core.ConditionUnknown, fmt.Sprintf("pod %q in %q namespace has status phase %q (skipping!): %+v", pod.Name, pod.Namespace, pod.Status.Phase, pod.Status)
}

and
if status == core.ConditionUnknown {
klog.Info(reason)
return false, fmt.Errorf(reason)
}

then
if err := wait.PollImmediate(kconst.APICallRetryInterval, kconst.DefaultControlPlaneTimeout, checkCondition); err != nil {
return fmt.Errorf("waitPodCondition: %w", err)
}

would exit immediately (due to the error returned from checkCondition()), but i've missed handling that specific case in:
terminatedPod := strings.HasSuffix(err.Error(), "not found")
if !unstartedMinion.MatchString(err.Error()) && !terminatedPod {

so i've corrected that in 6779c72 in #11209 to test it there first
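
putting those pieces together, the caller-side handling conceptually looks like this (a sketch - the error wrapping and the skip log are illustrative, not the exact code):

if err := waitPodCondition(cs, pod.Name, pod.Namespace, core.PodReady, timeout); err != nil {
	// tolerate pods that disappeared ("not found") and minion nodes that have not
	// re-registered yet after the restart; anything else fails the overall wait
	terminatedPod := strings.HasSuffix(err.Error(), "not found")
	if !unstartedMinion.MatchString(err.Error()) && !terminatedPod {
		return fmt.Errorf("WaitExtra: %w", err)
	}
	klog.Info(err)
}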

Contributor Author

@ilya-zuyev @medyagh actually, this was ok - no need to change anything here, as wait works as intended (it will skip if the pod is gone or if the just-restarted minion node is not yet ready; the detailed explanation above still holds)
i'll close this one in favour of #11209
/close

@k8s-ci-robot
Contributor

@prezha: Closed this PR.

In response to this:

@ilya-zuyev @medyagh actually, this was ok - no need to change anything here, as wait works as intended (it will skip if the pod is gone or if the just-restarted minion node is not yet ready; the detailed explanation above still holds)
i'll close this one in favour of #11209
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

minikube failing to start with containerd and --wait=all flag
6 participants