
fix nil referencing issue in cadvisor during metric decoration #206

Merged: 1 commit, merged May 3, 2024


@movence commented Apr 25, 2024

Description:
cAdvisor throws an error while decorating metrics with metadata from the Kubernetes stores. The error caused the agent pods to go into a crash loop (CrashLoopBackOff), with the following stack trace in the agent logs:

goroutine 1076 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver/internal/cadvisor.(*Cadvisor).GetMetrics(0xc001a80180)
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver@v0.89.0/internal/cadvisor/cadvisor_linux.go:231 +0x461
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver.(*awsContainerInsightReceiver).collectData(0xc000b23d40, {0x4c3b158, 0xc000485ef0})
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver@v0.89.0/receiver.go:351 +0x83
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver.(*awsContainerInsightReceiver).Start.func1()
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver@v0.89.0/receiver.go:172 +0xe5
created by github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver.(*awsContainerInsightReceiver).Start in goroutine 1
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscontainerinsightreceiver@v0.89.0/receiver.go:158 +0x113f
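The failure mode can be reproduced in isolation. The following is a minimal Go sketch, assuming a hypothetical `Metric` type and `processAll` helper rather than the receiver's real types, of how a nil entry in a results slice becomes a nil pointer dereference panic when a loop dereferences each element without a check. `recover` is used only so the sketch can report the panic instead of crashing:

```go
package main

import "fmt"

// Metric is a hypothetical stand-in for a decorated cAdvisor metric;
// it is not the receiver's actual type.
type Metric struct {
	Name string
}

// Tags reads a field through the pointer receiver, so calling it on a
// nil *Metric panics with a nil pointer dereference.
func (m *Metric) Tags() string {
	return m.Name
}

// processAll mirrors an unguarded loop over decorated results and reports
// whether a panic occurred instead of letting it crash the process.
func processAll(results []*Metric) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	for _, m := range results {
		_ = m.Tags() // panics when m == nil
	}
	return false
}

func main() {
	results := []*Metric{{Name: "container_cpu"}, nil}
	fmt.Println("panicked:", processAll(results)) // panicked: true
}
```

In the real agent there is no `recover` around the loop, so the panic terminates the process and Kubernetes restarts the pod, which matches the crash loop described above.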

Testing:
Tested on a cluster that was exhibiting the issue.

Documentation:

@@ -228,6 +228,9 @@ func (c *Cadvisor) GetMetrics() []pmetric.Metrics {
	}

	for _, cadvisorMetric := range results {
		if cadvisorMetric == nil {

Why are we including nils in the result slice in the first place?

@movence (author) replied May 3, 2024

This happens when pod information is not found in the store cache (PodStore or ServiceStore) while decorating metrics with Kubernetes metadata or service information. It is not yet known why the corresponding pod information is missing for some metrics, but this change adds a safeguard for that edge case.
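A minimal sketch of the mechanism described in this reply, under stated assumptions: `RawMetric`, `DecoratedMetric`, `decorate`, and `collect` are hypothetical names, and the store cache is modeled as a plain map instead of the receiver's PodStore or ServiceStore. A store miss yields a nil result, and the loop skips it rather than dereferencing it:

```go
package main

import "fmt"

// RawMetric and DecoratedMetric are hypothetical types for illustration only.
type RawMetric struct {
	Name   string
	PodKey string
}

type DecoratedMetric struct {
	Name    string
	PodName string
}

// decorate looks up pod metadata in a hypothetical store cache and returns
// nil when the pod is missing, which is the edge case described above.
func decorate(m RawMetric, podStore map[string]string) *DecoratedMetric {
	podName, ok := podStore[m.PodKey]
	if !ok {
		return nil
	}
	return &DecoratedMetric{Name: m.Name, PodName: podName}
}

// collect decorates every raw metric, then skips nil results instead of
// dereferencing them.
func collect(raw []RawMetric, podStore map[string]string) []DecoratedMetric {
	results := make([]*DecoratedMetric, 0, len(raw))
	for _, m := range raw {
		results = append(results, decorate(m, podStore))
	}

	out := make([]DecoratedMetric, 0, len(results))
	for _, dm := range results {
		if dm == nil {
			continue // safeguard: pod metadata was not found in the store cache
		}
		out = append(out, *dm)
	}
	return out
}

func main() {
	podStore := map[string]string{"ns/uid-1": "web-0"}
	raw := []RawMetric{
		{Name: "pod_cpu", PodKey: "ns/uid-1"},
		{Name: "pod_cpu", PodKey: "ns/uid-missing"},
	}
	fmt.Println(len(collect(raw, podStore))) // 1
}
```

Skipping the nil entry drops only the metric whose pod could not be resolved; the rest of the batch is still emitted, which is why a guard in the loop is preferable to letting one cache miss crash the agent.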

@movence merged commit 5aaf26f into amazon-contributing:aws-cwa-dev on May 3, 2024
114 of 122 checks passed

3 participants