OOMKilled pod on master node #823

Closed
tsmetana opened this issue Dec 7, 2018 · 6 comments

@tsmetana
Member

tsmetana commented Dec 7, 2018

Version

$ openshift-install version
bin/openshift-install v0.5.0-master-36-gb4f5ceb6bfde8d3dc0e29f708e0494488ea37ee0
Terraform v0.11.8
$ ~/.terraform.d/plugins/terraform-provider-libvirt -version  # only needed if you're using libvirt
Compiled against library: libvirt 4.1.0
Using library: libvirt 4.1.0
Running hypervisor: QEMU 2.11.2
Running against daemon: 4.1.0

Platform (aws|libvirt|openstack):

libvirt

What happened?

After a successful installation, I took a look around the cluster:

[tsmetana@openlmi iscsi]$ oc get pods --all-namespaces
NAMESPACE                                                 NAME                                                              READY     STATUS      RESTARTS   AGE
...
openshift-kube-apiserver                                  installer-2-test1-master-0                                        0/1       OOMKilled   0          4h
...

What you expected to happen?

No OOMKilled pod.

How to reproduce it (as minimally and precisely as possible)?

I followed the README instructions for installing on libvirt.

Anything else we need to know?

The hypervisor machine doesn't look that memory-stressed:

[tsmetana@openlmi installer]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        6.5G        167M        1.0M        8.9G        8.8G
Swap:           15G         16M         15G

Neither does the master node itself:

[core@test1-master-0 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           3.9G        3.1G        139M         14M        647M        411M
Swap:            0B          0B          0B

I wonder whether the installation itself can get too memory-hungry at some point. This seems to have had no other visible consequence, but it surely looks strange.
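
To compare memory headroom on the two hosts, the `free -h` outputs above can be reduced to an available/total ratio. The following is a minimal Python sketch and not part of the original report: the helper names `parse_size` and `available_fraction` are hypothetical, and it assumes the column layout shown above, where `available` is the last field of the `Mem:` row.

```python
import re

# Multipliers for the single-letter suffixes printed by `free -h` (powers of 1024).
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a `free -h` value such as '3.9G' or '411M' to bytes."""
    # The optional trailing 'i' tolerates outputs that print 'Gi'/'Mi'.
    match = re.fullmatch(r"([0-9.]+)([BKMGT])i?", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def available_fraction(free_output):
    """Return available/total for the 'Mem:' row of `free -h` output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total = parse_size(fields[1])
            available = parse_size(fields[6])  # assumed: 'available' is column 7
            return available / total
    raise ValueError("no 'Mem:' row found")


# Outputs copied from the report above.
hypervisor = """\
              total        used        free      shared  buff/cache   available
Mem:            15G        6.5G        167M        1.0M        8.9G        8.8G
Swap:           15G         16M         15G
"""
master = """\
              total        used        free      shared  buff/cache   available
Mem:           3.9G        3.1G        139M         14M        647M        411M
Swap:            0B          0B          0B
"""

print(f"hypervisor: {available_fraction(hypervisor):.0%} available")
print(f"master:     {available_fraction(master):.0%} available")
```

Because `free -h` rounds its values, the ratios are only approximate. They are meant for a quick comparison between hosts, not for precise accounting.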

@cgwalters
Member

Related: #785

@wking
Member

wking commented Dec 10, 2018

Related: #785

It looks like he was already running with 4GiB on the masters, but yes, there is some discussion of possible consumers there. My impression is that on most runs, memory usage on the masters stays reasonably stable under 4GiB, but sometimes something causes a memory spike and things get OOM-killed. E.g. see this CI run. It hasn't happened while I've been watching, though.

@wking
Member

wking commented Dec 14, 2018

I haven't seen this in a while now, and it may have been fixed by some openshift/origin changes or similar. Can you still reproduce? If not, I think we should close this and we can re-open if someone hits it again.

@crawford
Contributor

crawford commented Jan 4, 2019

Closing due to inactivity.

@openshift-ci-robot
Contributor

@crawford: Closing this issue.

In response to this:

Closing due to inactivity.

@tsmetana
Member Author

tsmetana commented Jan 23, 2019

I installed yesterday (using d127242) and there were two OutOfmemory pods in the list. Moreover, leaving the cluster running overnight ends up like this:

NAMESPACE                                    NAME                                                      READY     STATUS              RESTARTS   AGE
kube-system                                  etcd-member-test-1-master-0                               1/1       Running             0          23h
openshift-apiserver-operator                 openshift-apiserver-operator-95589d7cb-75pfl              1/1       Running             2          23h
openshift-apiserver                          apiserver-2f2cg                                           1/1       Running             1          23h
openshift-cloud-credential-operator          cloud-credential-operator-79dfb95b8c-wdmwl                1/1       Running             0          23h
openshift-cluster-api                        cluster-autoscaler-operator-ff5944ff4-82rgn               1/1       Running             2          23h
openshift-cluster-api                        clusterapi-manager-controllers-7669887789-ds8x8           4/4       Running             0          23h
openshift-cluster-api                        machine-api-operator-846bd8bf4-hzwj9                      1/1       Running             0          23h
openshift-cluster-machine-approver           machine-approver-569d5bfc96-mkzlq                         1/1       Running             0          23h
openshift-cluster-network-operator           cluster-network-operator-z5sl5                            1/1       Running             0          23h
openshift-cluster-node-tuning-operator       cluster-node-tuning-operator-6dfbb96dc4-ztq5d             1/1       Running             4          23h
openshift-cluster-node-tuning-operator       tuned-hlkls                                               1/1       NodeLost            1          23h
openshift-cluster-node-tuning-operator       tuned-mt42k                                               1/1       Running             0          23h
openshift-cluster-samples-operator           cluster-samples-operator-54b898cf74-hfvjz                 1/1       Running             0          23h
openshift-cluster-storage-operator           cluster-storage-operator-6b6c9477c-h5g9p                  1/1       Running             0          23h
openshift-cluster-version                    cluster-version-operator-5668496d9d-bkwr8                 1/1       Running             0          23h
openshift-console                            console-operator-85bc646-ltkhr                            0/1       Pending             0          23h
openshift-console                            console-operator-85bc646-rbd7p                            1/1       Running             2          23h
openshift-console                            openshift-console-f9c5f6684-pb87t                         0/1       Pending             0          23h
openshift-console                            openshift-console-f9c5f6684-t8m4z                         0/1       Pending             0          23h
openshift-controller-manager-operator        openshift-controller-manager-operator-5d7974bc56-pts8s    1/1       Running             2          23h
openshift-controller-manager                 controller-manager-kfbmg                                  1/1       Running             81         23h
openshift-core-operators                     openshift-service-cert-signer-operator-66447dccb5-rkxsz   1/1       Running             2          23h
openshift-core-operators                     origin-cluster-osin-operator-6d8cd758b-g5n2f              1/1       Running             2          23h
openshift-core-operators                     origin-cluster-osin-operator2-7fbc4bb7cc-wlrs8            1/1       Running             1          23h
openshift-dns-operator                       dns-operator-6dc745cbb6-qsqtf                             1/1       Running             0          23h
openshift-dns                                dns-default-958w6                                         2/2       Running             0          23h
openshift-dns                                dns-default-jgsbf                                         2/2       NodeLost            2          23h
openshift-image-registry                     cluster-image-registry-operator-6cc9884789-n29ng          1/1       Running             0          23h
openshift-image-registry                     image-registry-656d556655-kn78z                           0/1       Unknown             0          23h
openshift-image-registry                     image-registry-749b845885-kxh9q                           1/1       Running             0          23h
openshift-image-registry                     node-ca-4fhwx                                             0/1       ImageInspectError   0          23h
openshift-image-registry                     node-ca-cbq7x                                             1/1       Running             0          23h
openshift-ingress-operator                   ingress-operator-59dbf5b586-7zw5b                         1/1       Running             0          23h
openshift-ingress                            router-default-8587b5f587-4hnk5                           0/1       Pending             0          8h
openshift-ingress                            router-default-8587b5f587-nl4rg                           1/1       Unknown             1          23h
openshift-kube-apiserver-operator            openshift-kube-apiserver-operator-687bbddf48-ndlfd        1/1       Running             3          23h
openshift-kube-apiserver                     installer-1-test-1-master-0                               0/1       Completed           0          23h
openshift-kube-apiserver                     installer-10-test-1-master-0                              0/1       OutOfmemory         0          7h
openshift-kube-apiserver                     installer-11-test-1-master-0                              0/1       OutOfmemory         0          3h
openshift-kube-apiserver                     installer-2-test-1-master-0                               0/1       Completed           0          23h
openshift-kube-apiserver                     installer-3-test-1-master-0                               0/1       OutOfmemory         0          21h
openshift-kube-apiserver                     installer-4-test-1-master-0                               0/1       OutOfmemory         0          19h
openshift-kube-apiserver                     installer-5-test-1-master-0                               0/1       OutOfmemory         0          17h
openshift-kube-apiserver                     installer-6-test-1-master-0                               0/1       OutOfmemory         0          15h
openshift-kube-apiserver                     installer-7-test-1-master-0                               0/1       OutOfmemory         0          13h
openshift-kube-apiserver                     installer-8-test-1-master-0                               0/1       OutOfmemory         0          11h
openshift-kube-apiserver                     installer-9-test-1-master-0                               0/1       OutOfmemory         0          9h
openshift-kube-apiserver                     openshift-kube-apiserver-test-1-master-0                  1/1       Running             1          23h
openshift-kube-controller-manager-operator   kube-controller-manager-operator-74db5fb946-jx9tc         1/1       Running             1          23h
openshift-kube-controller-manager            installer-1-test-1-master-0                               0/1       Completed           0          23h
openshift-kube-controller-manager            installer-2-test-1-master-0                               0/1       Completed           0          23h
openshift-kube-controller-manager            installer-3-test-1-master-0                               0/1       OutOfmemory         0          9h
openshift-kube-controller-manager            openshift-kube-controller-manager-test-1-master-0         1/1       Running             2          23h
openshift-kube-scheduler-operator            openshift-kube-scheduler-operator-7fdfdcfcbd-z64cf        1/1       Running             2          23h
openshift-kube-scheduler                     installer-1-test-1-master-0                               0/1       Completed           0          23h
openshift-kube-scheduler                     openshift-kube-scheduler-test-1-master-0                  1/1       Running             1          23h
openshift-machine-config-operator            machine-config-controller-7b54bd4f45-tgkgv                1/1       Running             0          23h
openshift-machine-config-operator            machine-config-daemon-79655                               1/1       NodeLost            2          23h
openshift-machine-config-operator            machine-config-daemon-p77rk                               1/1       Running             1          23h
openshift-machine-config-operator            machine-config-operator-7f8467988c-cs2rv                  1/1       Running             0          23h
openshift-machine-config-operator            machine-config-server-zb98t                               1/1       Running             0          23h
openshift-marketplace                        certified-operators-kbffk                                 1/1       Unknown             1          23h
openshift-marketplace                        community-operators-l7hff                                 1/1       Unknown             1          23h
openshift-marketplace                        marketplace-operator-6d54674fdf-2w2jk                     1/1       Running             0          23h
openshift-marketplace                        redhat-operators-4sr6x                                    1/1       Unknown             2          23h
openshift-monitoring                         alertmanager-main-2                                       0/3       Pending             0          22h
openshift-monitoring                         cluster-monitoring-operator-f8fbb7c85-mrlwk               1/1       Running             0          23h
openshift-monitoring                         grafana-5cc84fbccf-76c8d                                  2/2       Unknown             2          23h
openshift-monitoring                         prometheus-k8s-0                                          6/6       Unknown             7          23h
openshift-monitoring                         prometheus-k8s-1                                          0/6       Pending             0          22h
openshift-monitoring                         prometheus-operator-5ddd9d8bfb-ph4fj                      1/1       Unknown             1          23h
openshift-operator-lifecycle-manager         catalog-operator-7c55fbc87b-2tfzq                         1/1       Running             0          23h
openshift-operator-lifecycle-manager         olm-operator-7579f9d647-j5rt2                             1/1       Running             1          23h
openshift-operator-lifecycle-manager         olm-operators-p9dmq                                       1/1       Unknown             1          23h
openshift-operator-lifecycle-manager         packageserver-b68f97f5d-xrdjc                             1/1       Running             16         19h
openshift-sdn                                ovs-n2m9f                                                 1/1       NodeLost            1          23h
openshift-sdn                                ovs-spk8v                                                 1/1       Running             0          23h
openshift-sdn                                sdn-56hg4                                                 1/1       NodeLost            1          23h
openshift-sdn                                sdn-controller-56vkw                                      1/1       Running             2          23h
openshift-sdn                                sdn-lnmc8                                                 1/1       Running             0          23h
openshift-service-cert-signer                apiservice-cabundle-injector-7fbcc7999b-9vkhx             1/1       Running             1          23h
openshift-service-cert-signer                configmap-cabundle-injector-d4c5f99f-z8xmk                1/1       Running             2          23h
openshift-service-cert-signer                service-serving-cert-signer-54fc8fdd6b-nm8qk              1/1       Running             1          23h
[tsmetana@openlmi ~]$ oc get nodes
NAME                    STATUS     ROLES     AGE       VERSION
test-1-master-0         Ready      master    23h       v1.11.0+f273347e6a
test-1-worker-0-5whk2   NotReady   worker    23h       v1.11.0+f273347e6a

I'm not sure what all the installer-* pods do, but this looks very suspicious (note that the cluster is not running any real workload and just falls apart on its own...).

All I did was bin/openshift-install create cluster --dir /home/tsmetana/ose-install.
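
A listing this long is easier to read when tallied by STATUS. The following is a minimal Python sketch, not something used in the original report: `pod_statuses` and `unhealthy` are hypothetical helper names, and the code assumes the whitespace-separated column layout of the `oc get pods --all-namespaces` output shown above. The sample rows are copied from that listing with the column padding shortened.

```python
from collections import Counter


def pod_statuses(listing):
    """Yield (namespace, name, status) from `oc get pods --all-namespaces` text."""
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and elided "..." rows.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        yield namespace, name, status


def unhealthy(listing, healthy=("Running", "Completed")):
    """Return the pods whose STATUS is not in `healthy`."""
    return [pod for pod in pod_statuses(listing) if pod[2] not in healthy]


sample = """\
NAMESPACE                  NAME                                       READY   STATUS        RESTARTS   AGE
openshift-kube-apiserver   installer-1-test-1-master-0                0/1     Completed     0          23h
openshift-kube-apiserver   installer-10-test-1-master-0               0/1     OutOfmemory   0          7h
openshift-kube-apiserver   installer-11-test-1-master-0               0/1     OutOfmemory   0          3h
openshift-kube-apiserver   openshift-kube-apiserver-test-1-master-0   1/1     Running       1          23h
openshift-sdn              sdn-56hg4                                  1/1     NodeLost      1          23h
"""

# Tally pods per status, most common first.
counts = Counter(status for _, _, status in pod_statuses(sample))
for status, count in counts.most_common():
    print(f"{status:<12} {count}")

# List every pod that is neither Running nor Completed.
for namespace, name, status in unhealthy(sample):
    print(f"{namespace}/{name}: {status}")
```

To run it against a live cluster, the same text could be captured with `oc get pods --all-namespaces` and passed in as `listing`.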
