Currently pods can be scheduled to master nodes #50

Closed

swade1987 opened this issue Jan 17, 2018 · 13 comments

@swade1987
Collaborator

If the master node(s) have a NoSchedule taint, the current deployment manifest still allows pods to be scheduled on them, because it tolerates that taint.

Assumptions:

  • You have a Kubernetes cluster running

  • The kubelet on the master node(s) is running with register-schedulable=true set.

  • The master node(s) have the following taint applied:

    kubectl taint nodes master1 node-role.kubernetes.io/master="":NoSchedule
    
  • Cordon all other nodes (excluding master nodes) in the cluster:

    kubectl cordon <node name>
    

Result:

coredns-77b5855fb7-f6ng7                 1/1       Running   0          7s        172.16.137.65    master1
coredns-77b5855fb7-pxfjh                 1/1       Running   0          7s        172.16.137.65    master1
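
(For reference, a node-placement listing like the one above can be produced with a wide pod listing, assuming coredns runs in kube-system:)

kubectl get pods -n kube-system -o wide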

Solution:

Edit the deployment and remove the following:

- key: node-role.kubernetes.io/master
  effect: NoSchedule
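
To make the edit concrete: assuming the stock manifest (a Deployment named coredns in the kube-system namespace), it can be edited in place with:

kubectl -n kube-system edit deployment coredns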

Delete all current coredns pods using:

kubectl delete pods -n kube-system -l k8s-app=coredns

Describing one of the replacement pods, which remain Pending, shows:

Warning  FailedScheduling  4s (x6 over 19s)  default-scheduler  0/8 nodes are available: 1 PodToleratesNodeTaints, 7 NodeUnschedulable

If you uncordon the worker nodes, the pods start to be scheduled correctly.
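
For completeness, uncordoning is the mirror image of the cordon command used above:

kubectl uncordon <node name>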

@chrisohaver
Member

The master NoSchedule taint toleration was replicated from the kube-dns deployment manifest in kubeadm.
I don't know the original reason for it, but it may be because, when a cluster is first built, there is initially only a master and no worker nodes, and cluster DNS may be needed for some base operations before nodes are added.

@luxas, do you know the original reason for adding the master taint toleration to the cluster dns service in kubeadm?

@chrisohaver
Member

@bowei, are you familiar with the reasoning behind adding the master taint toleration to kube-dns?

@chrisohaver
Member

chrisohaver commented Jan 25, 2018

@swade1987, in lieu of input from kubeadm or kube-dns, what are the reasons you think coredns should not be able to run on master nodes?

@miekg
Member

miekg commented Jan 25, 2018 via email

@chrisohaver
Member

"almost certainly" is almost certainly an exaggeration... ;)

@chrisohaver
Member

It seems there is some inconsistency on this position within Kubernetes.

For example: kubernetes/kubernetes#54945 is a request to add the master taint toleration to kube-dns... which means it wasn't there before. Though I think the addon directory that the PR modifies is deprecated... so the kube-dns manifests could have been out of date...

@chrisohaver
Member

Yeah - I confirmed the kubernetes/cluster/addons directory is "legacy" per its readme, and "deprecated" per the Kubernetes addons web page.

Anyway, one argument for leaving the toleration in place: in a scenario where coredns cannot, for whatever reason, be scheduled to a worker node, running it on the master is better than not running it at all.

@swade1987
Collaborator Author

I always go with the separation-of-concerns mindset: leave master nodes to be master nodes, acting purely as the Kubernetes control plane.

@miekg
Member

miekg commented Jan 25, 2018 via email

@willvrny

Would it be better to have the default follow the preferred practice (separation of concerns, therefore no master toleration), and then document that people who want to schedule to master nodes can add the toleration back (sketched below)?
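
For illustration, the documented opt-in would just be re-adding the removed block under the pod template's tolerations in the deployment (a minimal sketch):

tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule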

@johnbelamaric
Member

I guess it comes down to whether you consider service discovery a critical service that is part of the control plane, or an add-on extra. Most things won't work without it, so a PreferNoSchedule makes sense to me.

That said, these manifests are not gospel and can be tweaked for specific deployments.
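
For what it's worth, a PreferNoSchedule policy would be expressed as a taint on the node side, e.g. (illustrative node name, mirroring the taint command in the original report):

kubectl taint nodes master1 node-role.kubernetes.io/master="":PreferNoSchedule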

@bowei

bowei commented Jan 25, 2018

The kube-dns manifests in the addons directory are current. It looks like the toleration is only in kubeadm.

Regarding where kube-dns runs, in general, the master node typically does not run things such as kube-proxy or kube-dns. In some deployments, the master node is not actually part of the normal cluster network. This means pods running on the master node will not be network reachable, hence services such as kube-dns won't work with pods scheduled on the master.

Of course for clusters where the master node(s) are not special, this can be tweaked.

@chrisohaver
Member

OK - I merged this. The kubeadm team can leave the taint toleration in if that's what they prefer.

One size isn't going to fit all. And I don't think we can even prescribe a "preferred" deployment manifest. There isn't a "typical" deployment. This deployment is just a suggestion. etc etc
