
Consistently Seeing Reflector Watch Errors on Controller Shutdown #2723

Open
jonathan-innis opened this issue Mar 22, 2024 · 10 comments
Labels: kind/bug, kind/support, lifecycle/rotten

Comments

@jonathan-innis (Member) commented Mar 22, 2024

During controller shutdown, we consistently see errors that look like

logger.go:146: 2024-03-22T21:35:31.707Z	INFO	cache/reflector.go:462	pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

This occurs extremely consistently during shutdown, and I wouldn't expect to see something that looks like an error coming through an INFO/WARN path. From looking at the reflector code, this "error" seems to be coming from this line. Is there a way to ensure that the runnable shutdown doesn't fire this error every time we shut down?

As an example, these "errors" are coming in our Karpenter E2E testing here: https://github.com/aws/karpenter-provider-aws/actions/runs/8396432349/job/22997824261

(Screenshot: Karpenter E2E test log from 2024-03-22 showing the reflector watch errors)
@troy0820 (Member)

/kind support

k8s-ci-robot added the kind/support label on Mar 22, 2024
@laihezhao

@jonathan-innis I'm running into the same problem:
(Screenshot of the error output)
Will it cause memory leaks?

@jonathan-innis (Member, Author)

If the controller is shutting down, I don't think it's going to cause memory leaks. From looking through the code, it just looks like spurious error logging from the reflector while all the context cancellations are happening, but I imagine there's a more graceful way to shut the reflector down so we don't see this.
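
Not a fix for the underlying reflector behaviour, but a minimal sketch of one way to quiet this at the logging layer, assuming the reflector messages come through klog (the setup below is illustrative, not the controller-runtime default):

package main

import (
	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Logger used by the manager and controllers.
	logger := zap.New(zap.UseDevMode(true))
	ctrl.SetLogger(logger)

	// client-go's reflector logs through klog; pointing klog at the same sink
	// shifted to V(1) means the "watch of ... ended with" messages only show
	// up when verbosity >= 1 is enabled.
	klog.SetLogger(logger.V(1))

	// ... construct and start the manager as usual ...
}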

@jonathan-innis (Member, Author)

@laihezhao Your error also looks quite different from mine. Yours appears to be caused by 500s occurring somewhere on the apiserver.

@jonathan-innis (Member, Author)

@troy0820 Got any thoughts here on how this can be improved? Ideally, we wouldn't be seeing errors for what appears to be a graceful shutdown for controller-runtime.

@troy0820 (Member) commented Apr 8, 2024

@jonathan-innis I'm going to investigate this, but it looks like it could be a bug, so I'll label the issue accordingly so we can triage it a little better.

/kind bug

k8s-ci-robot added the kind/bug label on Apr 8, 2024
@barchw commented May 10, 2024

I have seen this issue happen often in our controllers. It occurs when the Custom Resource Definition for a resource the controller previously accessed gets deleted (regardless of whether the CR is controlled by the controller or merely accessed). This seems like unnecessary behaviour; watching a resource that is already gone from the cluster serves no purpose, and it definitely should not produce logs that cannot otherwise be ignored.
For resources that we did not access often, we implemented a workaround of using unstructured objects to avoid setting up an unnecessary watch on them, but it would be nice to see this issue addressed.
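
For reference, a minimal sketch of that workaround (the GVK, names, and function below are illustrative): reading through *unstructured.Unstructured makes the cached client do a live lookup instead of starting an informer/watch for that type, as long as unstructured caching is left at its default (disabled).

package workaround

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// getWidget reads a custom resource without registering a watch for its type.
func getWidget(ctx context.Context, c client.Client) (*unstructured.Unstructured, error) {
	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(schema.GroupVersionKind{Group: "example.io", Version: "v1", Kind: "Widget"})
	// Unstructured reads bypass the cache by default, so no reflector is set up.
	err := c.Get(ctx, client.ObjectKey{Namespace: "default", Name: "my-widget"}, u)
	return u, err
}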

What could help is an option to disable the reflector watch.

@sbueringer (Member)

For resources that we did not access often, we implemented a workaround of using unstructured objects to avoid setting up an unnecessary watch on them, but it would be nice to see this issue addressed.

I think instead of this you can just do the following:

// Set in manager.Options when constructing the manager; objects listed in
// DisableFor are always read live from the API server, so no informer/watch
// is started for them.
Client: client.Options{
	Cache: &client.CacheOptions{
		DisableFor: []client.Object{
			&corev1.ConfigMap{},
			&corev1.Secret{},
		},
	},
},

The reason using Unstructured helps is that Unstructured is not cached by default:

// CacheOptions are options for creating a cache-backed client.
type CacheOptions struct {
	// Reader is a cache-backed reader that will be used to read objects from the cache.
	// +required
	Reader Reader
	// DisableFor is a list of objects that should never be read from the cache.
	// Objects configured here always result in a live lookup.
	DisableFor []Object
	// Unstructured is a flag that indicates whether the cache-backed client should
	// read unstructured objects or lists from the cache.
	// If false, unstructured objects will always result in a live lookup.
	Unstructured bool
}
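
Conversely, if you do want unstructured objects to go through the cache (accepting that a watch will then be started for those types), a minimal sketch of wiring that flag into manager construction (the newManager helper is illustrative):

package workaround

import (
	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// newManager builds a manager whose client also caches unstructured objects.
func newManager(cfg *rest.Config) (manager.Manager, error) {
	return ctrl.NewManager(cfg, ctrl.Options{
		Client: client.Options{
			Cache: &client.CacheOptions{
				// Opt unstructured objects into the cache; by default they
				// are always read with a live lookup.
				Unstructured: true,
			},
		},
	})
}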

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Aug 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Sep 7, 2024