100% CPU load on >= linux-4.19 when /proc/acpi/ibm/thermal is used #60

Closed
kitsunyan opened this issue Dec 2, 2018 · 16 comments

@kitsunyan

Thinkfan stopped working after the 4.18 → 4.19 kernel upgrade. It loads the CPU to 100% because of an infinite loop.

I'm using a simple config without any significant modifications, so thinkfan reads temperatures from /proc/acpi/ibm/thermal. For some reason, the !f.eof() check doesn't work, so it tries to read temperatures forever, reading zeros once EOF is reached. I changed the condition as follows and it worked just fine:

--- a/src/drivers.cpp
+++ b/src/drivers.cpp
@@ -374,3 +374,3 @@
 	int tmp;
-	while (!f.eof()) {
+	while (!f.eof() && f.tellg() >= 0) {
 		f >> tmp;

Thinkfan 1.0.1, Parabola GNU/Linux-libre, Lenovo ThinkPad X200.

@vmatare
Owner

vmatare commented Jan 2, 2019

Can someone with this problem please post the output of:
cat /proc/acpi/ibm/thermal
I'd like to know more about what's going on here.

@vmatare vmatare self-assigned this Jan 2, 2019
@vmatare vmatare added this to the 1.0.2 milestone Jan 2, 2019
@kitsunyan
Author

The output is the same as it was before the kernel update:
temperatures: 45 31 -128 41 26 -128 24 -128 31 34 -128 -128 -128 -128 -128 -128

@vmatare
Owner

vmatare commented Jan 27, 2019

Huh. That is very strange. I'd really like to know what's causing the input to fail there. Can you show me the hex dump with:
od -tx1 /proc/acpi/ibm/thermal

@kitsunyan
Author

kitsunyan commented Jan 27, 2019

0000000 74 65 6d 70 65 72 61 74 75 72 65 73 3a 09 33 38
0000020 20 32 36 20 2d 31 32 38 20 33 34 20 32 33 20 2d
0000040 31 32 38 20 32 31 20 2d 31 32 38 20 32 36 20 32
0000060 38 20 2d 31 32 38 20 2d 31 32 38 20 2d 31 32 38
0000100 20 2d 31 32 38 20 2d 31 32 38 20 2d 31 32 38 0a
0000120

@woosh1

woosh1 commented Feb 20, 2019

For me, thinkfan shows 100% CPU load (occupying one CPU core only) on a similar configuration (Thinkpad x201, kernel 4.20.10-arch1-1-ARCH), with the latest version of thinkfan from AUR.
Could it be related?
I’ve been using thinkfan for years and have never observed this behavior until now.
Changing the line as suggested above did not resolve the issue.

@woosh1

woosh1 commented Feb 20, 2019

Can someone with this problem please post the output of:
cat /proc/acpi/ibm/thermal
I'd like to know more about what's going on here.

Output without thinkfan running:
temperatures: 43 0 0 0 0 0 0 0

And with thinkfan running:
temperatures: 62 0 0 0 0 0 0 0
and rising (I don’t want to wait longer).

The configuration is an unaltered /usr/share/doc/thinkfan/examples/thinkfan.conf.simple copied to /etc/thinkfan.conf.

@helirexi

helirexi commented Feb 20, 2019

I faced the same problem: thinkfan ate 100% of one core and the temperature rose to 70 °C. Just downgrading to version 0.9.3 "solved" this issue.
My Arch Linux is fully up to date. I have a Lenovo ThinkPad T61, if that helps.
My logs:


-- Reboot --
Feb 20 21:53:57 t61 systemd[1]: Starting simple and lightweight fan control program...
Feb 20 21:53:57 t61 thinkfan[771]: Daemon PID: 773
Feb 20 21:53:57 t61 systemd[1]: thinkfan.service: Can't open PID file /run/thinkfan.pid (yet?) after start: No such file or directory
Feb 20 21:53:57 t61 systemd[1]: Started simple and lightweight fan control program.
Feb 20 21:53:57 t61 thinkfan[773]: WARNING: Sensor /proc/acpi/ibm/thermal has 16 temperatures, but you have 10 correction values for it.
Feb 20 21:54:57 t61 systemd[1]: Stopping simple and lightweight fan control program...
Feb 20 21:56:27 t61 systemd[1]: thinkfan.service: State 'stop-sigterm' timed out. Killing.
Feb 20 21:56:27 t61 systemd[1]: thinkfan.service: Killing process 773 (thinkfan) with signal SIGKILL.
Feb 20 21:56:27 t61 systemd[1]: thinkfan.service: Main process exited, code=killed, status=9/KILL
Feb 20 21:56:27 t61 systemd[1]: thinkfan.service: Failed with result 'timeout'.
Feb 20 21:56:27 t61 systemd[1]: Stopped simple and lightweight fan control program.
Feb 20 22:43:35 t61 systemd[1]: /usr/lib/systemd/system/thinkfan.service:8: PIDFile= references path below legacy directory /var/run/, updating /var/run/thinkfan.pid → /run/thinkfan.pid; please update the unit file accordingly.
Feb 20 22:43:50 t61 systemd[1]: /usr/lib/systemd/system/thinkfan.service:8: PIDFile= references path below legacy directory /var/run/, updating /var/run/thinkfan.pid → /run/thinkfan.pid; please update the unit file accordingly.
Feb 20 22:44:00 t61 thinkfan[8552]: thinkfan 0.9.1 starting...
Feb 20 22:44:00 t61 systemd[1]: Starting simple and lightweight fan control program...
Feb 20 22:44:00 t61 thinkfan[8552]: Daemon PID: 8553
Feb 20 22:44:00 t61 systemd[1]: Started simple and lightweight fan control program.
Feb 20 22:44:15 t61 thinkfan[8553]: 
                                    Caught deadly signal.
Feb 20 22:44:15 t61 systemd[1]: Stopping simple and lightweight fan control program...
Feb 20 22:44:15 t61 thinkfan[8553]: Cleaning up and resetting fan control.
Feb 20 22:44:15 t61 systemd[1]: thinkfan.service: Succeeded.
Feb 20 22:44:15 t61 systemd[1]: Stopped simple and lightweight fan control program.
-- Reboot --
Feb 20 22:45:04 t61 systemd[1]: Starting simple and lightweight fan control program...
Feb 20 22:45:04 t61 thinkfan[771]: thinkfan 0.9.1 starting...
Feb 20 22:45:04 t61 thinkfan[771]: Daemon PID: 773
Feb 20 22:45:04 t61 systemd[1]: Started simple and lightweight fan control program.
Feb 20 22:49:52 t61 systemd[1]: /usr/lib/systemd/system/thinkfan.service:8: PIDFile= references path below legacy directory /var/run/, updating /var/run/thinkfan.pid → /run/thinkfan.pid; please update the unit file accordingly.
Feb 20 22:52:41 t61 systemd[1]: Stopping simple and lightweight fan control program...
Feb 20 22:52:41 t61 systemd[1]: thinkfan.service: Succeeded.
Feb 20 22:52:41 t61 systemd[1]: Stopped simple and lightweight fan control program.
-- Reboot --
Feb 20 22:53:30 t61 systemd[1]: Starting simple and lightweight fan control program...
Feb 20 22:53:30 t61 thinkfan[775]: thinkfan 0.9.1 starting...
Feb 20 22:53:30 t61 thinkfan[775]: Daemon PID: 777
Feb 20 22:53:30 t61 systemd[1]: Started simple and lightweight fan control program.



Can someone with this problem please post the output of:
cat /proc/acpi/ibm/thermal
I'd like to know more about what's going on here.

Yep, here it is
(with the old version, sorry):

$ cat /proc/acpi/ibm/thermal 
temperatures:	48 41 37 52 34 -128 33 -128 41 44 49 -128 -128 -128 -128 -128

uname -a
Linux t61 4.20.10-arch1-1-ARCH #1 SMP PREEMPT Fri Feb 15 17:49:06 UTC 2019 x86_64 GNU/Linux

Thanks, I've been using thinkfan for many, many years, hope you fix this soon ))

UPD.

$ od -tx1 /proc/acpi/ibm/thermal
0000000 74 65 6d 70 65 72 61 74 75 72 65 73 3a 09 34 39
0000020 20 34 31 20 33 37 20 35 32 20 33 34 20 2d 31 32
0000040 38 20 33 33 20 2d 31 32 38 20 34 31 20 34 34 20
0000060 34 38 20 2d 31 32 38 20 2d 31 32 38 20 2d 31 32
0000100 38 20 2d 31 32 38 20 2d 31 32 38 0a
0000114

@vmatare
Owner

vmatare commented Feb 21, 2019

OK, so I dug out an old system that still has /proc/acpi/ibm/thermal and reproduced the issue. Looks nasty & strange. Unfortunately I don't have much time right now, so it might take a few more days, sorry :-/

@vmatare vmatare removed the dangerous label Feb 21, 2019
@vmatare
Owner

vmatare commented Feb 21, 2019

Fortunately thinkfan freezes before any fan speed is set, so at least it won't overheat your CPU ;-)

@vmatare vmatare changed the title 100% CPU load on Linux 4.19 100% CPU load on Linux >= 4.19 when /proc/acpi/ibm/thermal is used Feb 21, 2019
@vmatare vmatare changed the title 100% CPU load on Linux >= 4.19 when /proc/acpi/ibm/thermal is used 100% CPU load on >= linux-4.19 when /proc/acpi/ibm/thermal is used Feb 21, 2019
@vmatare
Owner

vmatare commented Feb 21, 2019

Ah well, I was looking at a portion of the code that already dealt with that newline char that suddenly popped up in /proc/acpi/ibm/thermal after linux 4.19, so I got confused. Turns out that in read_temps() I had simply forgotten to check for f.fail(), i.e. whether we successfully read a number from it. Of course, with that newline, we stop being able to read numbers, but f.eof() isn't actually true, yet. So yeah. Please test with the updated master and let me know whether the issue is fixed for you ;-)

@kubriel

kubriel commented Feb 22, 2019

not fixed :(

@woosh1

woosh1 commented Feb 22, 2019

The Archlinux AUR package “thinkfan” uses “thinkfan-1.0.1.tar.gz”: there the error is not fixed.
The Archlinux AUR package “thinkfan-git” uses “git+https://github.com/vmatare/thinkfan.git”: there, it is fixed on my system (ThinkPad X201).

@vmatare
Owner

vmatare commented Feb 22, 2019

Yeah, I'm pretty sure it should be fixed in the master now. @kubriel and possibly others: If thinkfan still freezes for you, please do a debug build:

mkdir build-dbg && cd build-dbg
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j`nproc`

Run the debug binary in a terminal:

./thinkfan -n

And then go to another terminal to send a SIGSEGV to thinkfan and trigger a crash:

pkill -segv thinkfan

Thinkfan should print a backtrace as it crashes. Please post that here.

@superherointj

On my ThinkPad X220 running Arch Linux, the AUR package “thinkfan” is buggy, while “thinkfan-git” (AUR) is working/fixed. Thanks!

@bssb

bssb commented Feb 27, 2019

@superherointj I'm the maintainer of the thinkfan AUR package. I'm just waiting for an official version containing the fix to be released, and I will update the package accordingly. thinkfan-git is working because it pulls the latest changes from the master branch.

@vmatare
Owner

vmatare commented Feb 27, 2019

OK great, then I'll do a bugfix release.

@vmatare vmatare closed this as completed Mar 2, 2019

7 participants