CrashLoopBackOff pods after 'apply k8s' #2077

Closed
dpaks opened this issue Dec 11, 2018 · 4 comments
Comments


dpaks commented Dec 11, 2018

Client Version: v1.13.0, GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"
Server Version: v1.13.0, GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"
OS: Ubuntu 18
ksonnet version: 0.13.1
jsonnet version: v0.11.2
client-go version: kubernetes-1.10.4
argo: v2.2.1
kubeflow: v0.3.4
Env: A single node cluster using kubeadm

After issuing ${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s, I noticed that a few of the pods are in the CrashLoopBackOff state.

kubeflow ambassador-9f48fcc6c-lfcxg 2/3 CrashLoopBackOff 39 3h21m
kubeflow ambassador-9f48fcc6c-lz7xn 2/3 CrashLoopBackOff 39 3h21m
kubeflow ambassador-9f48fcc6c-xd27x 2/3 CrashLoopBackOff 39 3h21m
kubeflow ml-pipeline-65dbcdc844-jmtjx 0/1 CrashLoopBackOff 30 3h20m
kubeflow ml-pipeline-persistenceagent-69bd5876df-nz9mg 0/1 CrashLoopBackOff 29 3h20m
kubeflow vizier-core-7ccdc5577-w92wk 0/1 CrashLoopBackOff 30 3h19m

There are no logs for them. When I described the ambassador pod, I got the following events:
Events:
  Type     Reason   Age                      From               Message
  Warning  BackOff  116s (x833 over 3h17m)   kubelet, ubuntu-3  Back-off restarting failed container
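
Even though kubectl logs shows nothing for the running container, the output of the previously crashed instance can sometimes still be retrieved. A minimal sketch, assuming the kubeflow namespace and the pod names listed above (CONTAINER is a placeholder for whichever container describe reports as failing):

# Events and per-container state (exit codes, restart counts)
kubectl -n kubeflow describe pod ambassador-9f48fcc6c-lfcxg

# Logs from the last crashed instance of the failing container;
# replace CONTAINER with the container name shown by describe
kubectl -n kubeflow logs ambassador-9f48fcc6c-lfcxg -c CONTAINER --previous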

Many of the solutions I found online suggest adding a command to the container spec in the Docker/Kubernetes YAML. Should I do as they suggest, or can I ignore this issue?


royxue commented Dec 12, 2018

@dpaks +1, same problem here.
ambassador is fine for me,
but the other three keep hitting CrashLoopBackOff.


dpaks commented Dec 12, 2018

> @dpaks +1, same problem here.
> ambassador is fine for me,
> but the other three keep hitting CrashLoopBackOff.

Can you give the issue a +1 so that this question catches the devs' attention?


royxue commented Dec 20, 2018

@dpaks
Finally I have some time to look into this issue; I think it comes down to different causes.
For vizier-core, the problem is probably a StorageClass / PV / PVC issue, which leaves the vizier DB stuck in Pending (a quick check is sketched below the log lines).
For ml-pipeline, hmm, I think the problem might be this:

W1220 10:06:12.815524       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W1220 10:06:12.816175       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
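
A rough way to confirm the storage-side theory for vizier-core (PVC_NAME below is a placeholder; use whatever kubectl actually lists in your cluster):

# Is the vizier DB pod actually stuck in Pending?
kubectl -n kubeflow get pods | grep vizier

# A PVC stuck in Pending usually means no matching PV or no default StorageClass
kubectl -n kubeflow get pvc
kubectl get pv
kubectl get storageclass

# Describe the Pending PVC to see the provisioner's error message
kubectl -n kubeflow describe pvc PVC_NAME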


dpaks commented Dec 20, 2018

@royxue Yes, you're right. This doesn't have anything to do with Kubeflow. In my case, the pods were not able to reach outside the subnet. After resolving that, things worked fine.
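
For anyone landing here later, a rough way to check whether pods can reach outside their subnet (the busybox image and the test endpoint are only examples):

# Throwaway pod with a shell
kubectl -n kubeflow run net-test --rm -it --image=busybox --restart=Never -- sh

# Inside the pod: does cluster DNS resolve?
nslookup kubernetes.default

# Can the pod reach an address outside the cluster/subnet?
wget -q -O /dev/null -T 5 http://www.google.com && echo reachable || echo blocked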

dpaks closed this as completed Dec 20, 2018