Disk usage metrics for containerd #2785

ribbybibby · 2021-01-11T09:03:50Z

When switching from docker to containerd as my container runtime in Kubernetes, I noticed that container_fs_usage_bytes metrics were no longer being exported for my containers.

It looks like disk usage metrics aren't implemented for containerd, as noted by this comment: https://github.com/google/cadvisor/blob/v0.38.6/container/containerd/handler.go#L164-L165.

Disk usage is a pretty important metric to monitor, so I think, if possible, this should be added.



elcomtik · 2021-02-01T16:20:27Z

I have experienced the same issue.

It looks like disk usage metrics aren't implemented for containerd, as noted by this comment: https://github.com/google/cadvisor/blob/v0.38.6/container/containerd/handler.go#L164-L165.

There is some discussion of these metrics in containerd/containerd#678. I suppose that containerd could provide this information.

yyrdl · 2021-05-19T10:00:16Z

A PR was submitted: #2872.

jepio · 2022-01-24T13:14:44Z

PR #2872 was closed, in favor of #2956, which was merged and subsequently reverted in #2964. The result is that these metrics are not available.

Is someone working on an alternative approach?

sarbajitc · 2022-04-22T05:02:04Z

Is there any timeline for a fix of this issue?

baasumo · 2022-05-02T13:52:50Z

+1 on looking for any update or timeline regarding this issue - these metrics are pretty important for observability and workload behavior.

fernandesnikhil · 2022-05-02T14:55:39Z

Adding onto this ticket since we're blocked on switching to the containerd CRI without these metrics. We have alerting around ephemeral file system usage that would break if cAdvisor doesn't collect these from containerd.

snuggie12 · 2022-05-07T10:35:53Z

@bobbypage Do you have an update on this? Best I can follow is that there is a possibly-working version in the containerd-cri branch after #2966 was merged. However, it might be incomplete based on #2936 (comment)?

Alternatively it seems like work has gone into not using cadvisor for container stats and k8s 1.23 has an alpha feature-gate which uses the cri stats provider (PodAndContainerStatsFromCRI). Is the plan to put momentum into that instead? If so, do you know when it would go beta?

brandond · 2022-05-25T20:17:42Z

Enabling the PodAndContainerStatsFromCRI feature-gate does not seem to work either; at least with containerd 1.6.4 the stats are still missing.

brandond · 2022-08-04T00:21:15Z

It appears that this won't be addressed any time soon, as KEP-2371 moves most of the stats collection out of cadvisor into the CRI interface. Is there an interim solution for users that need these stats?

bobbypage · 2022-08-04T02:19:33Z

The workaround for now is to use the containerd-cri branch (https://github.com/google/cadvisor/tree/containerd-cri) which has a special patch to export containerd disk metrics. The following image can be used: gcr.io/cadvisor/cadvisor:v0.45.0-containerd-cri which is built from that branch and contains the patch.

brandond · 2022-08-04T04:25:22Z

Is that branch being actively maintained? Do you know if it still works normally with other runtimes?

bobbypage · 2022-08-04T06:50:02Z

> Is that branch being actively maintained?

Yes, it is maintained; we just pushed the latest v0.45.0 changes to this branch. We need this separate branch because getting the disk usage metrics on containerd requires importing the CRI API into cAdvisor. However, we can't import the CRI API into cAdvisor, because cAdvisor is imported by k8s and k8s itself includes the CRI API, which results in a circular dependency. So the workaround for now is to have this separate branch, which includes the CRI API (see #2872 (comment) for that discussion).

> Do you know if it still works normally with other runtimes?

Yes, it will work with other runtimes as well, but if containerd is not used there is no benefit to using it.

brandond · 2022-08-04T08:11:49Z

Ah hmm. So I take it the circular dependency prohibits this branch from being embedded in the kubelet, and there's no easy path towards doing so? Running a standalone deployment of cadvisor isn't particularly palatable, as asking our users to retool their monitoring stacks to make use of that would be a non-trivial amount of work. I'm honestly surprised that we got this far into the dockershim deprecation with cadvisor missing feature parity for one of the most popular replacement runtimes.

bobbypage · 2022-08-04T17:10:12Z

@brandond are you referring to the fact that most folks are using the existing /metrics/cadvisor endpoint on the kubelet? If that's the case, then yes, unfortunately we aren't able to bring this patch into the kubelet because of the circular dependency issue. The KEP https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2371-cri-pod-container-stats/README.md aims to solve this issue long term.

snuggie12 · 2022-08-08T04:45:54Z

Would it be possible to list a config somewhere that only gathers the disk metrics? GKE controls our normal cadvisor, so running a minimal "container disk metrics only" daemonset seems like a simple workaround.

yvespp · 2022-10-10T13:07:12Z

kubectl get --raw "/api/v1/nodes/(node-name)/proxy/stats/summary" (served by the kubelet) gives ephemeral-storage information for each pod, but sadly it's not available as a Prometheus metric...
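As a rough illustration of bridging that gap, the sketch below parses a summary document and prints Prometheus text exposition lines. It assumes the response carries a `pods` array with `podRef` and `ephemeral-storage.usedBytes` fields; the metric name `pod_ephemeral_storage_used_bytes`, the helper `toExposition`, and the sample JSON are hypothetical and only serve the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// summary mirrors only the fields of the kubelet summary response that
// this sketch needs; everything else in the document is ignored.
type summary struct {
	Pods []struct {
		PodRef struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"podRef"`
		EphemeralStorage *struct {
			UsedBytes *uint64 `json:"usedBytes"`
		} `json:"ephemeral-storage"`
	} `json:"pods"`
}

// toExposition converts a summary JSON document into Prometheus text
// exposition lines. The metric name is hypothetical, not an existing one.
func toExposition(raw []byte) (string, error) {
	var s summary
	if err := json.Unmarshal(raw, &s); err != nil {
		return "", err
	}
	var b strings.Builder
	for _, p := range s.Pods {
		// Skip pods for which no ephemeral-storage usage was reported.
		if p.EphemeralStorage == nil || p.EphemeralStorage.UsedBytes == nil {
			continue
		}
		fmt.Fprintf(&b, "pod_ephemeral_storage_used_bytes{namespace=%q,pod=%q} %d\n",
			p.PodRef.Namespace, p.PodRef.Name, *p.EphemeralStorage.UsedBytes)
	}
	return b.String(), nil
}

func main() {
	// Hand-written sample input, not captured from a real node.
	sample := []byte(`{"pods":[{"podRef":{"name":"web-0","namespace":"default"},"ephemeral-storage":{"usedBytes":4096}}]}`)
	out, err := toExposition(sample)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

In practice the input would come from the summary endpoint on each node rather than a literal, and an exporter would serve the lines over HTTP for Prometheus to scrape.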

george-angel · 2023-02-20T10:49:04Z

Refreshing my memory on this issue, I realised we didn't link to the exporter @ribbybibby wrote to address this: https://github.com/utilitywarehouse/kube-summary-exporter. We have been running it for nearly 2 years now.

vasireddy99 · 2023-02-21T18:02:20Z

Hi @bobbypage/@team, is this comment still valid? Should we expect a similar release tag for 47.1 as well, and for the rest of the releases until this is fixed through the enhancement/KEP? Can you please confirm whether you would recommend using the containerd-cri tags as the fix? It seems to be the only workaround available.
I could also see the implementation. Please correct me if I'm wrong.

sethAmazon · 2023-02-21T19:29:38Z

Does containerd-cri work? I added a replace directive to my go.mod and it still did not work. I see that the spec sets HasFilesystem to false: https://github.com/google/cadvisor/blob/containerd-cri/container/containerd/handler.go#L287-L289

markrity · 2023-05-04T16:40:09Z

Any updates on this? containerd is the default and recommended runtime for GKE, but there is still no support for kubernetes_filesystem_usage?

sidewinder12s · 2023-05-24T22:37:57Z

It appears that, at least on containerd://1.6.6 and the v0.45.0-containerd-cri tag, the `container_fs_*` metrics are also just wrong.

container_fs_usage_bytes at least seems to be reporting the root device free space for every pod on the node, as opposed to each container's/pod's usage. Does anyone have a reference deployment manifest to use for containerd + that containerd-cri tag?


dragosrosculete · 2023-08-10T12:07:44Z

Any hope for this to be implemented soon?


smileusd · 2023-11-28T03:38:50Z

Any updates?

mikkeloscar · 2023-11-28T08:09:30Z

I tried to rebase the containerd-cri branch (https://github.com/google/cadvisor/tree/containerd-cri) on v0.48.0 (and v0.47.1). In both cases the resource usage blows up. 🙁

I do see values for the metrics, but I didn't validate that they are correct; other comments report that they are not.

brandond · 2023-11-28T20:09:23Z

I doubt this is going to be fixed, given the work in progress to move stats into the CRI API, and use the CRI stats to replace the data currently served at the cadvisor metrics endpoint - as discussed above.

https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2371-cri-pod-container-stats/README.md#cadvisor-less-cri-full-container-and-pod-stats

> enhance the CRI API with enough metrics to be able to supplement the pod and container fields in the summary API directly from CRI.

> enhance the CRI implementations to broadcast the required metrics to fulfill the pod and container fields in the /metrics/cadvisor endpoint.

It looks like containerd's cgroupv2 manager does not currently support filesystem utilization stats; it only returns data for PIDs, CPU, memory, block IO, RDMA, and HugeTLB.
https://github.com/containerd/cgroups/blob/v3.0.2/cgroup2/stats/metrics.pb.go#L28-L34

tripitakav · 2024-01-25T09:03:55Z

Are there any updates?

ning1875 · 2024-02-21T06:23:00Z

The crictl tool can report container filesystem usage, e.g.:

crictl stats
CONTAINER           CPU %               MEM                 DISK                INODES
0674440a33dbd       0.00                1.438MB             102.4kB             24
2e2f101e7ce72       0.06                62.43MB             114.7kB             29
37ed67b1e33cf       1.58                346.1MB             110.6MB             41

The data shown in the DISK column comes from this code: crictl calls the CRI ListContainerStats API over gRPC, and the response has a field named WritableLayer that holds the container's filesystem usage.

type ContainerStats struct {
	// Information of the container.
	Attributes *ContainerAttributes `protobuf:"bytes,1,opt,name=attributes,proto3" json:"attributes,omitempty"`
	// CPU usage gathered from the container.
	Cpu *CpuUsage `protobuf:"bytes,2,opt,name=cpu,proto3" json:"cpu,omitempty"`
	// Memory usage gathered from the container.
	Memory *MemoryUsage `protobuf:"bytes,3,opt,name=memory,proto3" json:"memory,omitempty"`
	// Usage of the writable layer.
	WritableLayer *FilesystemUsage `protobuf:"bytes,4,opt,name=writable_layer,json=writableLayer,proto3" json:"writable_layer,omitempty"`
	// Swap usage gathered from the container.
	Swap                 *SwapUsage `protobuf:"bytes,5,opt,name=swap,proto3" json:"swap,omitempty"`
	XXX_NoUnkeyedLiteral struct{}   `json:"-"`
	XXX_sizecache        int32      `json:"-"`
}

So containerd is able to report container filesystem usage. Why doesn't cAdvisor call this ListContainerStats API?
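To make the WritableLayer point concrete, here is a minimal sketch of reading that field safely. It uses trimmed stand-in types rather than the real generated CRI package so that it runs on its own; the stand-in layout (a `UsedBytes` wrapper holding a `Value`) and the helper `writableLayerBytes` are assumptions made for illustration, not code from cAdvisor, crictl, or containerd.

```go
package main

import "fmt"

// Stand-in types: trimmed copies of the CRI messages, just enough for
// this sketch. The real generated types carry more fields and tags.
type UInt64Value struct{ Value uint64 }

type FilesystemUsage struct {
	UsedBytes  *UInt64Value
	InodesUsed *UInt64Value
}

type ContainerStats struct {
	WritableLayer *FilesystemUsage
}

// writableLayerBytes returns the writable-layer usage in bytes. The
// boolean is false when the runtime did not report the value, so a
// missing stat is not mistaken for zero usage.
func writableLayerBytes(s *ContainerStats) (uint64, bool) {
	if s == nil || s.WritableLayer == nil || s.WritableLayer.UsedBytes == nil {
		return 0, false
	}
	return s.WritableLayer.UsedBytes.Value, true
}

func main() {
	// Hand-built sample; a real caller would get this from ListContainerStats.
	stats := &ContainerStats{
		WritableLayer: &FilesystemUsage{UsedBytes: &UInt64Value{Value: 102400}},
	}
	if used, ok := writableLayerBytes(stats); ok {
		fmt.Printf("writable layer: %d bytes\n", used)
	}
}
```

The nil checks matter because every level of the message is a pointer, so a runtime that omits filesystem stats yields a nil field rather than a zero value.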

brandond · 2024-02-21T18:49:06Z

I believe that is gated on the PodAndContainerStatsFromCRI FeatureGate, which is still alpha?

Have you tried enabling it on your node?

george-angel · 2024-02-22T08:41:33Z

It looks like that breaks other things: kubernetes/kubernetes#111276

wolgod · 2024-04-29T07:00:31Z

+1. Is there any solution now?

changhyuni · 2024-07-15T05:25:56Z

What happened to the usage metric?
Where should I check?

robini · 2024-07-15T08:12:28Z

Is there any solution to find disk usage metrics for containerd via Prometheus?

bobheadxi mentioned this issue Jan 13, 2021

monitoring: cadvisor observables review sourcegraph/sourcegraph#17239

Merged

andrzej-stencel mentioned this issue Feb 24, 2021

container_fs_usage_bytes metric missing when using containerd runtime SumoLogic/sumologic-kubernetes-collection#1456

Closed

This was referenced Mar 23, 2021

[k8s] Pod metrics is gone when using containerd as runtime aws/amazon-cloudwatch-agent#188

Open

[k8s][containerd] container file system metrics is not supported by cadvisor for contianerd aws/amazon-cloudwatch-agent#192

Open

yyrdl mentioned this issue May 19, 2021

support collecting FsUsageMetrics for containerd #2872

Closed

voelzmo mentioned this issue Jun 16, 2021

Pods with Containerd containers not showing up on cAdvisor metrics #2666

Open

bcressey mentioned this issue Apr 29, 2022

cAdvisor metrics are missing metadata information bottlerocket-os/bottlerocket#867

Closed

brandond mentioned this issue Aug 4, 2022

Migrating off dockershim: document known issue with metrics kubernetes/website#30681

Closed

bobbypage mentioned this issue Sep 14, 2022

kubelet returns diffirent fs metrics for docker and containerd kubernetes/kubernetes#111777

Closed

saurabhvagrawal mentioned this issue Nov 28, 2022

[BUG] Important labels are missing in container file system usage metrics(from CAdvisor) Azure/AKS#3361

Open

brandond mentioned this issue Feb 15, 2023

How to get metrics like "kubelet_volume_stats_used_bytes" and "kubelet_volume_stats_available_bytes" to get PVC usage in k3s cluster? k3s-io/k3s#6962

Closed

sky333999 mentioned this issue Feb 21, 2023

Fix Containerd ContainerFS aws/amazon-cloudwatch-agent#689

Closed

vasireddy99 mentioned this issue Feb 23, 2023

[containerd-cri]Set HasFilesystem to true #3253

Open

zuzzas mentioned this issue Mar 10, 2023

cadvisor with containerd does not support disk metrics deckhouse/deckhouse#4055

Open

2 tasks

AliDatadog mentioned this issue Mar 21, 2023

[Kubelet] Support Ephemeral Percistent volume claim DataDog/integrations-core#14194

Merged

5 tasks

mhausenblas mentioned this issue Mar 27, 2023

Add note for missing containerFS metrics aws-otel/aws-otel.github.io#517

Merged

sidewinder12s mentioned this issue Jun 1, 2023

container_fs metrics reporting root device metrics with containerd #3315

Open

acumino added a commit to acumino/gardener that referenced this issue Aug 4, 2023

Remove Pod file system usage metrics

33d2381

ref - google/cadvisor#2785

acumino added a commit to acumino/gardener that referenced this issue Aug 4, 2023

Remove Pod file system usage metrics

af8d6d7

ref - google/cadvisor#2785

acumino added a commit to acumino/gardener that referenced this issue Aug 8, 2023

Remove Pod file system usage metrics

0b4de71

ref - google/cadvisor#2785

acumino added a commit to acumino/gardener that referenced this issue Aug 9, 2023

Remove Pod file system usage metrics

c540a5b

ref - google/cadvisor#2785

bboreham mentioned this issue Oct 16, 2023

Missing labels in kubelet's cAdvisor metrics kubernetes/kubernetes#89903

Closed
