cpufreq collector fails when any core is offline #2577

Closed
kitzmiller opened this issue Jan 17, 2023 · 4 comments

Comments

@kitzmiller

Host operating system: output of uname -a

Linux myhost 6.1.5-060105-generic #202301121238 SMP PREEMPT_DYNAMIC Thu Jan 12 13:10:27 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

  node_exporter, version 1.5.0 (branch: HEAD, revision: 1b48970ffcf5630534fb00bb0687d73c66d1c959)
  build user:       root@6e7732a7b81b
  build date:       20221129-18:59:09
  go version:       go1.19.3
  platform:         linux/amd64

Same behavior on:

  node_exporter, version 1.3.1 (branch: debian/sid, revision: 1.3.1-1)
  build user:       team+pkg-go@tracker.debian.org
  build date:       20220114-23:26:34
  go version:       go1.17.3
  platform:         linux/amd64

node_exporter command line flags

none

node_exporter log output

Jan 17 07:35:31 myhost prometheus-node-exporter[1028]: ts=2023-01-17T12:35:31.485Z caller=collector.go:169 level=error msg="collector failed" name=cpufreq duration_seconds=0.001987722 err="read /sys/devices/system/cpu/cpu15/cpufreq/cpuinfo_max_freq: device or resource busy"

Are you running node_exporter in Docker?

no

What did you do that produced an error?

I disabled a CPU core with:

echo 0 > /sys/devices/system/cpu/cpu15/online

What did you expect to see?

I expected the metrics node_cpu_frequency_max_hertz, node_cpu_frequency_min_hertz, node_cpu_scaling_frequency_hertz, node_cpu_scaling_frequency_max_hertz, node_cpu_scaling_frequency_min_hertz for the remaining online cores to still be available.

What did you see instead?

The metrics above are missing whenever any core is disabled. Re-enabling the core brings them back. The error above is logged to /var/log/syslog every minute.

@taherkk

taherkk commented Mar 3, 2023

Hi @discordianfish,

I'd like to contribute a fix for this issue.

I have added a check in system_cpu.go of the "github.com/prometheus/procfs/sysfs" package that verifies each CPU (except cpu0) is online before its frequency files are read.

This has solved the issue. I am new to contributing to open source.
Can someone guide me through the next steps?

@discordianfish
Member

Maybe #2605 fixed this issue for you as well? Try the current version from master.

@taherkk

taherkk commented Mar 8, 2023

No, it didn't. That change fixed a bug in the collector, whereas this one is in procfs.
I did find a better way to detect offline CPUs, though: reading "/sys/devices/system/cpu/offline".

@taherkk

taherkk commented Mar 31, 2023

We can close this issue; it should be resolved by prometheus/procfs#497.
