
Watch gets dropped quite frequently for v1/services #352

Closed
Stono opened this issue Oct 1, 2018 · 9 comments

@Stono

Stono commented Oct 1, 2018

Hey,
Noticing some really odd behaviour when watching services.

As you can see in this log, I'm watching three APIs: nodes, ingresses, and services. The services watch gets dropped really frequently and requires reconnection. By "dropped" I mean the watcher.each do |notice| block returns and I have to reconnect the watch.

No errors or anything as far as I can see?

I, [2018-10-01T14:41:32.866122 #1]  INFO -- : [Queues::QueueFactory#queue_for] Using redis for queue
I, [2018-10-01T14:41:32.869182 #1]  INFO -- : [App#watch] Watching for 'nodes' on api '' version 'v1'
I, [2018-10-01T14:41:32.870100 #1]  INFO -- : [App#watch] /v1/nodes will start from offset: 41408258
I, [2018-10-01T14:41:32.870514 #1]  INFO -- : [App#watch] Watching for 'ingresses' on api '/apis/extensions' version 'v1beta1'
I, [2018-10-01T14:41:32.871624 #1]  INFO -- : [App#watch] /apis/extensions/v1beta1/ingresses will start from offset: 41407491
I, [2018-10-01T14:41:32.873113 #1]  INFO -- : [App#watch] Watching for 'services' on api '/apis/atcloud.io' version 'v1'
I, [2018-10-01T14:41:32.874617 #1]  INFO -- : [App#watch] /apis/atcloud.io/v1/services will start from offset: 41393730
W, [2018-10-01T14:43:10.771390 #1]  WARN -- : [App#watch] /apis/atcloud.io/v1/services stopped, will restart from 41393730
I, [2018-10-01T14:43:11.771714 #1]  INFO -- : [App#watch] /apis/atcloud.io/v1/services will start from offset: 41393730
I, [2018-10-01T14:44:04.085678 #1]  INFO -- : [Handlers::Slack#create_apis_atcloud_io_v1_services] shippr-simple
W, [2018-10-01T14:44:35.922408 #1]  WARN -- : [App#watch] /apis/atcloud.io/v1/services stopped, will restart from 41408969
I, [2018-10-01T14:44:36.922661 #1]  INFO -- : [App#watch] /apis/atcloud.io/v1/services will start from offset: 41408969
W, [2018-10-01T14:45:54.584440 #1]  WARN -- : [App#watch] /apis/atcloud.io/v1/services stopped, will restart from 41408969
I, [2018-10-01T14:45:55.584756 #1]  INFO -- : [App#watch] /apis/atcloud.io/v1/services will start from offset: 41408969
W, [2018-10-01T14:47:13.200599 #1]  WARN -- : [App#watch] /apis/atcloud.io/v1/services stopped, will restart from 41408969
I, [2018-10-01T14:47:14.200865 #1]  INFO -- : [App#watch] /apis/atcloud.io/v1/services will start from offset: 41408969
W, [2018-10-01T14:48:58.754999 #1]  WARN -- : [App#watch] /apis/atcloud.io/v1/services stopped, will restart from 41408969
I, [2018-10-01T14:48:59.755265 #1]  INFO -- : [App#watch] /apis/atcloud.io/v1/services will start from offset: 41408969
W, [2018-10-01T14:50:15.733208 #1]  WARN -- : [App#watch] /apis/atcloud.io/v1/services stopped, will restart from 41408969
I, [2018-10-01T14:50:16.733523 #1]  INFO -- : [App#watch] /apis/atcloud.io/v1/services will start from offset: 41408969

Any ideas?

@Stono
Author

Stono commented Oct 9, 2018

Poke... We see this with some resources but not others. It's bizarre. It seems to be consistent with CRDs and v1/services.

@Stono
Author

Stono commented Oct 9, 2018

I've noticed at https://github.com/Stono/kubeclient/blob/master/lib/kubeclient/watch_stream.rb#L25 that when this happens, the chunk is empty, so it's as if we're receiving a nil body (perhaps some sort of keep-alive?).

@cben
Collaborator

cben commented Oct 9, 2018

Sounds like a duplicate of #273 (see there and #275 for some discussion of how to reconnect from the last seen resourceVersion; this is not yet implemented in kubeclient itself). Your "stopped, will restart from offset NNNN" log suggests you already implemented that?

The empty chunk finding is extremely interesting!
But looking at the code, an empty chunk should just be a no-op; it should not terminate the watch. So I'm afraid it's not a bug where we choke on "", but rather the server really closing the socket (and the block may get one last empty chunk when that happens)?
From what I understand, Kubernetes disconnects watch sockets after some period of inactivity.
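For illustration, a minimal sketch of the line-buffering pattern around watch_stream.rb#L25 (a hypothetical helper, not kubeclient's actual code; it assumes the watch body arrives as newline-delimited JSON chunks), showing why an empty chunk on its own cannot terminate the watch:

```ruby
# Hypothetical sketch: chunks are appended to a buffer and complete
# newline-terminated lines are emitted. An empty chunk ("") leaves the
# buffer untouched, so it is a pure no-op for the line stream.
def feed_chunks(chunks)
  buffer = +''
  lines = []
  chunks.each do |chunk|
    buffer << chunk
    # Emit every complete line currently sitting in the buffer.
    while (idx = buffer.index("\n"))
      lines << buffer.slice!(0..idx).chomp
    end
  end
  lines
end
```

Under this model, the loop exiting must come from the underlying socket closing, not from the "" chunk itself.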

@cben cben added the bug label Oct 9, 2018
@Stono
Author

Stono commented Oct 9, 2018

Hey, thanks for this.
That's right, I have already implemented a loop that restarts from the last successful resourceVersion. It just feels like there should be some sort of keep-alive that we are missing, as it feels a bit rough!
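Such a restart-from-last-resourceVersion loop might be sketched like this (a hypothetical helper, not Stono's actual implementation; `start_watch` stands in for something like Kubeclient::Client#watch_entities and must return an object whose #each yields notices):

```ruby
# Hypothetical sketch: re-open the watch from the last seen
# resourceVersion whenever the server drops the connection.
def resilient_watch(initial_version, max_restarts:, start_watch:)
  version = initial_version
  (max_restarts + 1).times do
    # Reconnect from the last successfully processed resourceVersion.
    start_watch.call(version).each do |notice|
      version = notice[:resourceVersion]
      yield notice if block_given?
    end
    # #each returning (rather than raising) means the server closed the
    # watch; the next iteration reconnects from `version`.
  end
  version
end
```

In production code you would also need to handle 410 Gone responses (the stored resourceVersion being too old), which this sketch omits.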

@cben
Collaborator

cben commented Oct 9, 2018

Try adding logging for each received chunk, at least the timestamp and the size of the chunk. It will let you confirm or refute:

  1. Is there a roughly constant time between the last non-empty chunk and disconnection?
  2. Is there exactly one empty chunk just before the .each loop exits?
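The suggested diagnostic might be sketched as follows (a hypothetical wrapper; `each_chunk` stands in for the HTTP response body's chunk enumerator):

```ruby
require 'logger'

# Hypothetical diagnostic: log arrival time and byte size of every chunk,
# including empty ones, so the pattern just before a disconnect is visible.
def log_chunks(each_chunk, logger: Logger.new($stdout))
  sizes = []
  each_chunk.each do |chunk|
    # Logger prefixes each entry with a timestamp, which answers question 1;
    # the recorded sizes answer question 2.
    logger.info("chunk received: #{chunk.bytesize} bytes")
    sizes << chunk.bytesize
  end
  sizes
end
```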

@Stono
Author

Stono commented Oct 9, 2018

There is exactly one chunk of length 0 before the body.read.each exits.

@cben
Collaborator

cben commented Oct 9, 2018

Cool, then I think we can close this in favor of #273?

If you feel like working on improving this, see my comments on #275.
Perhaps a smaller step that would already be useful is enriching watch objects with an attribute storing the last seen resource_version.

@Stono
Author

Stono commented Oct 9, 2018

I'm not sure; it feels so odd... check these logs, which output all events:

You can see I set up watchers for:

  • /v1/nodes
  • /apis/extensions/v1beta1/ingresses
  • /apis/atcloud.io/v1/services
  • /apis/kubitzer.io/v1/secrets
  • /apis/kubitzer.io/v1/shared_secrets

You can see the CRDs are the ones that drop more frequently. It doesn't seem to have anything to do with the rate of activity, because /apis/extensions/v1beta1/ingresses also doesn't have any activity, yet doesn't drop.

W, [2018-10-09T22:07:03.545247 #37311]  WARN -- : [App#block in setup_watch_thread] /apis/kubitzer.io/v1/shared_secrets has no offset stored, will start at the latest offset 43258964 instead
W, [2018-10-09T22:07:03.545317 #37311]  WARN -- : [App#block in setup_watch_thread] Some events may have been missed!
I, [2018-10-09T22:07:03.545356 #37311]  INFO -- : [App#watch] /apis/kubitzer.io/v1/shared_secrets watch will start from offset: 43258964
W, [2018-10-09T22:07:03.547095 #37311]  WARN -- : [App#block in setup_watch_thread] /apis/kubitzer.io/v1/secrets has no offset stored, will start at the latest offset 43265612 instead
W, [2018-10-09T22:07:03.547139 #37311]  WARN -- : [App#block in setup_watch_thread] Some events may have been missed!
I, [2018-10-09T22:07:03.547164 #37311]  INFO -- : [App#watch] /apis/kubitzer.io/v1/secrets watch will start from offset: 43265612
W, [2018-10-09T22:07:03.642914 #37311]  WARN -- : [App#block in setup_watch_thread] /v1/nodes has no offset stored, will start at the latest offset 43331571 instead
W, [2018-10-09T22:07:03.642966 #37311]  WARN -- : [App#block in setup_watch_thread] Some events may have been missed!
I, [2018-10-09T22:07:03.642992 #37311]  INFO -- : [App#watch] /v1/nodes watch will start from offset: 43331571
W, [2018-10-09T22:07:03.812132 #37311]  WARN -- : [App#block in setup_watch_thread] /apis/atcloud.io/v1/services has no offset stored, will start at the latest offset 43312473 instead
W, [2018-10-09T22:07:03.812202 #37311]  WARN -- : [App#block in setup_watch_thread] Some events may have been missed!
I, [2018-10-09T22:07:03.812237 #37311]  INFO -- : [App#watch] /apis/atcloud.io/v1/services watch will start from offset: 43312473
W, [2018-10-09T22:07:03.899857 #37311]  WARN -- : [App#block in setup_watch_thread] /apis/extensions/v1beta1/ingresses has no offset stored, will start at the latest offset 43312598 instead
W, [2018-10-09T22:07:03.899924 #37311]  WARN -- : [App#block in setup_watch_thread] Some events may have been missed!
I, [2018-10-09T22:07:03.899952 #37311]  INFO -- : [App#watch] /apis/extensions/v1beta1/ingresses watch will start from offset: 43312598
D, [2018-10-09T22:07:06.474703 #37311] DEBUG -- : [App#add_to_queue] update (43331590) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:07:06.711010 #37311] DEBUG -- : [App#add_to_queue] update (43331591) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:07:08.301875 #37311] DEBUG -- : [App#add_to_queue] update (43331595) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:07:15.961709 #37311] DEBUG -- : [App#add_to_queue] update (43331614) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:07:16.672584 #37311] DEBUG -- : [App#add_to_queue] update (43331618) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:07:16.771613 #37311] DEBUG -- : [App#add_to_queue] update (43331619) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:07:17.128188 #37311] DEBUG -- : [App#add_to_queue] update (43331620) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:07:18.431425 #37311] DEBUG -- : [App#add_to_queue] update (43331623) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:07:26.823987 #37311] DEBUG -- : [App#add_to_queue] update (43331646) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:07:26.937033 #37311] DEBUG -- : [App#add_to_queue] update (43331647) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:07:28.408847 #37311] DEBUG -- : [App#add_to_queue] update (43331648) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:07:36.797184 #37311] DEBUG -- : [App#add_to_queue] update (43331672) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:07:36.894620 #37311] DEBUG -- : [App#add_to_queue] update (43331673) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:07:38.493136 #37311] DEBUG -- : [App#add_to_queue] update (43331674) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:07:46.865428 #37311] DEBUG -- : [App#add_to_queue] update (43331697) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:07:46.934801 #37311] DEBUG -- : [App#add_to_queue] update (43331698) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:07:48.473646 #37311] DEBUG -- : [App#add_to_queue] update (43331702) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:07:56.914548 #37311] DEBUG -- : [App#add_to_queue] update (43331724) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:07:56.963743 #37311] DEBUG -- : [App#add_to_queue] update (43331725) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:07:58.531128 #37311] DEBUG -- : [App#add_to_queue] update (43331726) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:08:00.568104 #37311] DEBUG -- : [App#add_to_queue] update (43331734) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:08:07.011952 #37311] DEBUG -- : [App#add_to_queue] update (43331750) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:08:07.112442 #37311] DEBUG -- : [App#add_to_queue] update (43331752) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:08:08.621882 #37311] DEBUG -- : [App#add_to_queue] update (43331753) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
W, [2018-10-09T22:08:09.039144 #37311]  WARN -- : [App#watch] /apis/kubitzer.io/v1/shared_secrets stopped, will restart from 43258964
I, [2018-10-09T22:08:10.042149 #37311]  INFO -- : [App#watch] /apis/kubitzer.io/v1/shared_secrets watch will start from offset: 43258964
D, [2018-10-09T22:08:16.945298 #37311] DEBUG -- : [App#add_to_queue] update (43331777) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:08:17.000401 #37311] DEBUG -- : [App#add_to_queue] update (43331778) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:08:17.105476 #37311] DEBUG -- : [App#add_to_queue] update (43331780) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:08:17.177432 #37311] DEBUG -- : [App#add_to_queue] update (43331781) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
D, [2018-10-09T22:08:18.712541 #37311] DEBUG -- : [App#add_to_queue] update (43331784) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz
W, [2018-10-09T22:08:22.359254 #37311]  WARN -- : [App#watch] /apis/kubitzer.io/v1/secrets stopped, will restart from 43265612
I, [2018-10-09T22:08:23.360279 #37311]  INFO -- : [App#watch] /apis/kubitzer.io/v1/secrets watch will start from offset: 43265612
W, [2018-10-09T22:08:23.389326 #37311]  WARN -- : [App#watch] /apis/atcloud.io/v1/services stopped, will restart from 43312473
I, [2018-10-09T22:08:24.390742 #37311]  INFO -- : [App#watch] /apis/atcloud.io/v1/services watch will start from offset: 43312473
D, [2018-10-09T22:08:26.850834 #37311] DEBUG -- : [App#add_to_queue] update (43331806) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-70fv
D, [2018-10-09T22:08:26.952799 #37311] DEBUG -- : [App#add_to_queue] update (43331807) /v1/nodes/gke-delivery-platform-cpu1ram2-330d7c37-cggp
D, [2018-10-09T22:08:28.591471 #37311] DEBUG -- : [App#add_to_queue] update (43331809) /v1/nodes/gke-delivery-platform-core-pool-a5aa4684-htrz

@cben
Collaborator

cben commented Oct 10, 2018

Yeah, somewhat strange.
Found this:

https://github.com/kubernetes/apiserver/blob/2603dd2bc0e2/pkg/server/config.go#L157-L159

// If specified, long running requests such as watch will be allocated a random timeout between this value, and
// twice this value. Note that it is up to the request handlers to ignore or honor this timeout. In seconds.
MinRequestTimeout int

This is controlled by the --min-request-timeout command-line flag; I'm not sure what the default is.
Anyway, it's a thing, and clients need to deal with it.

Closing; let's continue on #273.

@cben cben closed this as completed Oct 10, 2018