
Minikube start brings coredns pod online before CNI initializes, breaking DNS for 2+ nodes #11608

Open
cwilkers opened this issue Jun 8, 2021 · 15 comments
Labels
area/cni CNI support area/dns DNS issues co/multinode Issues related to multinode clusters kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@cwilkers

cwilkers commented Jun 8, 2021

Steps to reproduce the issue:

  1. minikube start --cni=flannel --nodes=2
  2. kubectl get po -A -o wide
  3. Observe IP of coredns-x-y pod
  4. Bring up pod on node minikube-m02 and attempt to query any DNS address.

In my case, I am trying to install the kubevirt addon, which creates a pod kubevirt-install-manager in the kube-system namespace, which usually schedules on the second node. This pod attempts to download deployment YAML from the kubevirt project, which fails when DNS is unreachable.

Full output of minikube logs command:
logs.txt

Full output of failed command:

kubectl get po -A -o wide

NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE     IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-gct25            1/1     Running   0          3m19s   10.88.0.2        minikube
kube-system   etcd-minikube                      1/1     Running   0          3m27s   192.168.39.118   minikube
kube-system   kube-apiserver-minikube            1/1     Running   0          3m27s   192.168.39.118   minikube
kube-system   kube-controller-manager-minikube   1/1     Running   0          3m27s   192.168.39.118   minikube
kube-system   kube-flannel-ds-amd64-96tgf        1/1     Running   0          2m50s   192.168.39.182   minikube-m02
kube-system   kube-flannel-ds-amd64-sqtbg        1/1     Running   0          3m18s   192.168.39.118   minikube
kube-system   kube-proxy-fcfnx                   1/1     Running   0          3m19s   192.168.39.118   minikube
kube-system   kube-proxy-mzwjj                   1/1     Running   0          2m50s   192.168.39.182   minikube-m02
kube-system   kube-scheduler-minikube            1/1     Running   0          3m27s   192.168.39.118   minikube
kube-system   kubevirt-install-manager           1/1     Running   0          26s     10.244.1.2       minikube-m02
kube-system   storage-provisioner                1/1     Running   1          3m32s   192.168.39.118   minikube

@spowelljr spowelljr added area/cni CNI support area/dns DNS issues co/multinode Issues related to multinode clusters kind/support Categorizes issue or PR as a support question. labels Jun 15, 2021
@sharifelgamal
Collaborator

@cwilkers we've done some work to fix networking for multinode clusters, can you try this again with the newest version of minikube and see if it still persists?

@cwilkers
Author

Unfortunately, I see the same behavior with v1.22.0.

$ minikube version
minikube version: v1.22.0
commit: a03fbcf166e6f74ef224d4a63be4277d017bb62e
$ minikube start --cni=flannel --nodes=2
😄  minikube v1.22.0 on Fedora 34
✨  Automatically selected the kvm2 driver. Other choices: podman, none, ssh
💾  Downloading driver docker-machine-driver-kvm2:
    > docker-machine-driver-kvm2....: 65 B / 65 B [----------] 100.00% ? p/s 0s
    > docker-machine-driver-kvm2: 11.47 MiB / 11.47 MiB  100.00% 343.09 MiB p/s
💿  Downloading VM boot image ...
    > minikube-v1.22.0.iso.sha256: 65 B / 65 B [-------------] 100.00% ? p/s 0s
    > minikube-v1.22.0.iso: 242.95 MiB / 242.95 MiB  100.00% 98.80 MiB p/s 2.7s
👍  Starting control plane node minikube in cluster minikube
💾  Downloading Kubernetes v1.21.2 preload ...
    > preloaded-images-k8s-v11-v1...: 502.14 MiB / 502.14 MiB  100.00% 84.82 Mi
🔥  Creating kvm2 VM (CPUs=2, Memory=2200MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.21.2 on Docker 20.10.6 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring Flannel (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass

👍  Starting node minikube-m02 in cluster minikube
🔥  Creating kvm2 VM (CPUs=2, Memory=2200MB, Disk=20000MB) ...
🌐  Found network options:
    ▪ NO_PROXY=192.168.39.50
🐳  Preparing Kubernetes v1.21.2 on Docker 20.10.6 ...
    ▪ env NO_PROXY=192.168.39.50
🔎  Verifying Kubernetes components...
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
$ kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   coredns-558bd4d5db-hvnqx           1/1     Running   0          60s   10.88.0.2        minikube       <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-4sbx9        1/1     Running   0          61s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-h7h58        1/1     Running   0          33s   192.168.39.236   minikube-m02   <none>           <none>
kube-system   kube-proxy-698zf                   1/1     Running   0          33s   192.168.39.236   minikube-m02   <none>           <none>
kube-system   kube-proxy-z2r66                   1/1     Running   0          61s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   storage-provisioner                1/1     Running   0          73s   192.168.39.50    minikube       <none>           <none>
$ minikube addons enable kubevirt
    ▪ Using image bitnami/kubectl:1.17
🌟  The 'kubevirt' addon is enabled
$ kubectl -n kube-system get po kubevirt-install-manager -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
kubevirt-install-manager   1/1     Running   0          5m10s   10.244.1.2   minikube-m02   <none>           <none>
$ kubectl -n kube-system logs kubevirt-install-manager

error: the path "/manifests/kubevirt-base.yaml" does not exist
error: the path "/manifests/kubevirt.yaml" does not exist
$ kubectl -n kube-system exec -ti kubevirt-install-manager bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
I have no name!@kubevirt-install-manager:/$ curl http://github.com/
curl: (6) Could not resolve host: github.com

@sharifelgamal sharifelgamal added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed kind/support Categorizes issue or PR as a support question. labels Sep 15, 2021
@sharifelgamal
Collaborator

Well, this still seems to be an error we're facing, so I'll mark it as a bug and investigate when I have time. Help is of course welcome as well.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 13, 2022
@sharifelgamal sharifelgamal added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Feb 9, 2022
@sharifelgamal
Collaborator

I'm not sure if this remains an issue, but I want to keep it open to investigate.

@cwilkers
Author

cwilkers commented Feb 9, 2022

I haven't verified it lately, but it is likely still an issue. I would like to attempt a fix, but haven't had time to start from scratch on the minikube build system.

@ShiroDN

ShiroDN commented Mar 25, 2022

Yes, looks like it's still an issue.

$ minikube version
minikube version: v1.25.2
commit: 362d5fdc0a3dbee389b3d3f1034e8023e72bd3a7-dirty
$ minikube start --kubernetes-version=v1.23.4 --nodes=3 --cni=flannel
$ kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE   IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   coredns-64897985d-mkch2            1/1     Running   0             74s   10.88.0.2        minikube       <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0             82s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0             88s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0             82s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-5gmcp        1/1     Running   0             74s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-cp6ql        1/1     Running   0             13s   192.168.39.66    minikube-m03   <none>           <none>
kube-system   kube-flannel-ds-amd64-nn5dl        1/1     Running   0             49s   192.168.39.177   minikube-m02   <none>           <none>
kube-system   kube-proxy-7wlfv                   1/1     Running   0             13s   192.168.39.66    minikube-m03   <none>           <none>
kube-system   kube-proxy-8ncpp                   1/1     Running   0             49s   192.168.39.177   minikube-m02   <none>           <none>
kube-system   kube-proxy-g45dm                   1/1     Running   0             74s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0             81s   192.168.39.41    minikube       <none>           <none>
kube-system   storage-provisioner                1/1     Running   1 (72s ago)   85s   192.168.39.41    minikube       <none>           <none>
$ kubectl run nginx --image=nginx
pod/nginx created
$ kubectl exec -ti nginx -- bash
root@nginx:/# curl google.com
curl: (6) Could not resolve host: google.com

$ kubectl delete pod -n kube-system coredns-64897985d-mkch2
pod "coredns-64897985d-mkch2" deleted
[tomas@yggdrasil:~]$ kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE     IP               NODE           NOMINATED NODE   READINESS GATES
default       nginx                              1/1     Running   0             8m47s   10.244.1.2       minikube-m02   <none>           <none>
kube-system   coredns-64897985d-p9h7j            1/1     Running   0             89s     10.244.1.3       minikube-m02   <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0             11m     192.168.39.41    minikube       <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0             11m     192.168.39.41    minikube       <none>           <none>
.
.
.
$ kubectl exec -ti nginx -- bash
root@nginx:/# curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

@spowelljr spowelljr added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Aug 3, 2022
@cwilkers
Author

cwilkers commented Aug 8, 2022

I just verified the behavior on 1.26.1, and am looking into the cause again to see if this is something I could help fix.

@afbjorklund
Collaborator

afbjorklund commented Aug 8, 2022

We need another workaround, since moving the CNI config got complicated with Kubernetes 1.24.

That is, if this is related to /etc/cni/net.d not being empty? The suggestion is to move the podman and cri-o configs out of that directory...

@afbjorklund
Collaborator

Basically, Kubernetes doesn't work unless you remove all other software's configuration from the (shared) /etc/cni/net.d directory.

That is especially true for flannel, which normally doesn't install anything on the host but instead bootstraps itself from containers.

@cwilkers
Author

cwilkers commented Aug 8, 2022

If we cannot reorder it, would it be acceptable to conditionally delete the coredns pod as the last step of applying the network fabric (CNI)?

@cwilkers
Author

@afbjorklund I might need a little help with this; I'm able to come up with Go code to check for the status.podIP in the coredns pod, but I haven't figured out the right place in the start code to insert this. Each place I try, the coredns pod either does not yet exist, or does not have an IP yet.
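
A minimal sketch of what such a check could look like from outside the cluster, using client-go. Assumptions: flannel's default pod CIDR 10.244.0.0/16 and the standard k8s-app=kube-dns label on the coredns pods; inside minikube itself both would come from the cluster config rather than being hard-coded.

package main

import (
	"context"
	"fmt"
	"net"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config; minikube itself would build the client from its own config.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Flannel's default pod CIDR; assumed here, not queried from the cluster.
	_, podCIDR, _ := net.ParseCIDR("10.244.0.0/16")

	pods, err := client.CoreV1().Pods("kube-system").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "k8s-app=kube-dns"})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		ip := net.ParseIP(p.Status.PodIP)
		switch {
		case ip == nil:
			fmt.Printf("%s: no pod IP assigned yet\n", p.Name)
		case !podCIDR.Contains(ip):
			fmt.Printf("%s: IP %s is outside %s, so it started before the CNI was ready\n",
				p.Name, ip, podCIDR)
		default:
			fmt.Printf("%s: IP %s looks fine\n", p.Name, ip)
		}
	}
}

If a check like this ran right after the CNI manifest is applied and the flannel DaemonSet is ready, deleting any mismatched coredns pod at that point would essentially automate the manual workaround shown above: the recreated pod comes up with a CNI-assigned address.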

@cwilkers
Author

Here's an alternate idea, but it would require changes to coredns:

We could propose adding code to coredns's startup that checks whether it got an IP from the pod CIDR, and exits after a timeout if its IP doesn't match. To avoid breaking Kubernetes writ large, we could put this functionality behind a feature gate or environment variable that minikube could make use of.
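
A rough illustration of that idea. Purely hypothetical: the POD_CIDR_EXPECTED environment variable and the exit-on-timeout behavior are not existing coredns features, just a sketch of what the proposal might look like.

package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// waitForPodCIDR blocks until some local interface has an address inside the
// expected pod CIDR, or gives up after the timeout. POD_CIDR_EXPECTED is a
// hypothetical gate that minikube would set on the coredns deployment.
func waitForPodCIDR(timeout time.Duration) error {
	cidr := os.Getenv("POD_CIDR_EXPECTED")
	if cidr == "" {
		return nil // gate not set: behave exactly as today
	}
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return err
	}
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		addrs, _ := net.InterfaceAddrs()
		for _, a := range addrs {
			if n, ok := a.(*net.IPNet); ok && ipnet.Contains(n.IP) {
				return nil // we got an address from the pod CIDR
			}
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("no address in %s after %s", cidr, timeout)
}

func main() {
	if err := waitForPodCIDR(2 * time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // fail fast so the mismatch is visible and the pod can be recreated
	}
	// ... normal coredns startup would continue here ...
}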

@pavgup

pavgup commented Dec 18, 2022

Just confirming this still appears to be broken, and noting for folks who run into it that the workaround suggested in the kubevirt docs (deleting coredns, then disabling and re-enabling the kubevirt addon) doesn't seem to work either.
