Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data/aws/vpc/master-elb: Set thresholds back to two #1629

Merged

Conversation

wking
Copy link
Member

@wking wking commented Apr 16, 2019

These were changes from two to three in 16dfbb3 (#594):

$ git show 16dfbb354120 | grep _threshold | sort | uniq
-    healthy_threshold   = 2
-  #   healthy_threshold   = 2
+    healthy_threshold   = 3
-    unhealthy_threshold = 2
-  #   unhealthy_threshold = 2
+    unhealthy_threshold = 3

@crawford doesn't remember intentionally making that change, and the lower thresholds will hopefully help mitigate some issues with continued connection attempts to API servers which are in the process of shutting down.

These were changes from two to three in 16dfbb3 (data/aws: use nlbs
instead of elbs, 2019-11-01, openshift#594):

  $ git show 16dfbb3 | grep _threshold | sort | uniq
  -    healthy_threshold   = 2
  -  #   healthy_threshold   = 2
  +    healthy_threshold   = 3
  -    unhealthy_threshold = 2
  -  #   unhealthy_threshold = 2
  +    unhealthy_threshold = 3

Alex doesn't remember intentionally making that change, and the lower
thresholds will hopefully help mitigate some issues with continued
connection attempts to API servers which are in the process of
shutting down.
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 16, 2019
@wking
Copy link
Member Author

wking commented Apr 16, 2019

/hold

We want to give this a few rounds in CI and see how it goes.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2019
@wking
Copy link
Member Author

wking commented Apr 16, 2019

Also in flight in this space: openshift/machine-config-operator#638

@wking
Copy link
Member Author

wking commented Apr 16, 2019

e2e-aws:

Get https://api.ci-op-37kfz7j8-1d3f3.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0: net/http: TLS handshake timeout
...
    Get https://api.ci-op-37kfz7j8-1d3f3.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse&resourceVersion=0: dial tcp 3.212.56.195:6443: connect: connection refused
...
Failing tests:

[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: hostPathSymlink] [Testpattern: Inline-volume (ext3)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s]

Hrm. I guess I'll kick this again just to start assembing some more statistics...

/retest

@abhinavdahiya
Copy link
Contributor

/test e2e-metal

@smarterclayton
Copy link
Contributor

I ran a number of local stress tests with 2 instead 3. I was seeing a better outcome with those (less failures) during with rolling upgrade.

@smarterclayton
Copy link
Contributor

Do we need to update any other ones (int / ext /??)

/test e2e-aws

@wking
Copy link
Member Author

wking commented Apr 18, 2019

Do we need to update any other ones (int / ext /??)

This is already getting the internal and external Kubernetes API balancers and the internal Ignition-config (machine-config server) load balancer.

@abhinavdahiya
Copy link
Contributor

e2e-aws failed to build upi-installer image...

could not wait for build: the build upi-installer failed after 6m42s with reason DockerBuildFailed: Docker build strategy has failed.

+ echo ' release'
+ grep -q libvirt
+ go build -ldflags ' -X github.com/openshift/installer/pk...' release' -o bin/openshift-install ./cmd/openshift-install
--> RUN GOBIN=$(pwd)/bin go get -u github.com/coreos/terraform-provider-matchbox
error: build error: no such image

maybe retry
/retest

@smarterclayton
Copy link
Contributor

/lgtm

Will test over weekend in the CI and see the outcome in combo with the change to the wait period for shutdown

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [smarterclayton,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2019
@wking
Copy link
Member Author

wking commented Apr 19, 2019

e2e-aws:

Flaky tests:

[sig-storage] PersistentVolumes NFS with Single PV - PVC pairs create a PVC and non-pre-bound PV: test write access [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[Conformance][Area:Networking][Feature:Router] The HAProxy router should enable openshift-monitoring to pull metrics [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should expose a health check on the metrics port [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should expose the profiling endpoints [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should respond with 503 to unrecognized hosts [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should serve routes that were created from an ingress [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should set Forwarded headers appropriately [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should support reencrypt to services backed by a serving certificate automatically [Suite:openshift/conformance/parallel/minimal]
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should not deadlock when a pod's predecessor fails [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications with PVCs [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] PVC Protection Verify "immediate" deletion of a PVC that is not in active use by a pod [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] PVC Protection Verify that PVC in active use by a pod is not removed immediately [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] PVC Protection Verify that scheduling of a pod that uses PVC that is being deleted fails and the pod becomes Unschedulable [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@openshift-ci-robot
Copy link
Contributor

@wking: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-metal da1d821 link /test e2e-metal

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-merge-robot added a commit that referenced this pull request Apr 19, 2019
data/aws/vpc/master-elb: Set thresholds back to two
@openshift-merge-robot openshift-merge-robot merged commit cea35e7 into openshift:master Apr 19, 2019
@openshift-merge-robot openshift-merge-robot merged commit da1d821 into openshift:master Apr 19, 2019
@wking wking deleted the nlb-threshold-back-to-two branch April 19, 2019 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants