Add code measuring CPU frequency #125

Open
Bulat-Ziganshin opened this issue May 7, 2020 · 5 comments

@Bulat-Ziganshin

Bulat-Ziganshin commented May 7, 2020

I just wrote a little snippet that measures the actual frequency of the CPU core running it: https://encode.su/threads/3389-Code-snippet-to-compute-CPU-frequency

Please consider using it to correctly compute the number of CPU cycles spent by hash functions, instead of relying on RDTSC, whose fake "cycles" were discussed here a few years ago.

@rurban
Owner

rurban commented May 8, 2020

Nice. But we already have better time measurements than gettimeofday.

And on Linux you can just ask the kernel. The frequency changes constantly, by the way.
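A minimal sketch of asking the kernel on Linux, assuming the `cpu MHz` field that x86 kernels print in `/proc/cpuinfo` (other architectures may omit it; where a cpufreq driver is loaded, `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq`, in kHz, is an alternative). The helper name `proc_cpuinfo_mhz` is hypothetical. It returns one snapshot for the first listed core, so given how constantly the frequency changes, it says little about the core that later runs the benchmark.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the first "cpu MHz" value found in
   /proc/cpuinfo, or -1.0 if the file or the field is unavailable. */
static double proc_cpuinfo_mhz(void) {
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1.0;                      /* not Linux, or /proc not mounted */

    char line[512];
    double mhz = -1.0;
    while (fgets(line, sizeof line, f) != NULL) {
        /* On x86 kernels the line looks like "cpu MHz : <float>". */
        if (strncmp(line, "cpu MHz", 7) == 0) {
            const char *colon = strchr(line, ':');
            if (colon != NULL && sscanf(colon + 1, "%lf", &mhz) == 1)
                break;                    /* stop at the first core */
        }
    }
    fclose(f);
    return mhz;
}
```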

@erthink
Contributor

erthink commented May 8, 2020

As I wrote earlier, the best code I know of for measuring time down to individual clock cycles seems to be the one inside the t1ha benchmark.

It supports x86, arm64, ppc64, s390x, e2k, ia64, etc., as well as perf_event, emscripten_get_now(), mach_absolute_time(), QueryPerformanceCounter(), read_wall_time(), clock_gettime(), gethrtime() and gettimeofday() (i.e. more than google-benchmark).
For instance, see the logs on Travis-CI.

I was planning to extract this code into a separate "mera" library, but I haven't had time for it yet.
Therefore, reusing this code is not as convenient as it could be.
However, it is worth mentioning in this context.


PPC64:

Preparing to benchmarking...
 - running on CPU#10
 - use MFSPR(268) as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 6 cycles, 0.166667 iteration/cycle

ARM64:

Preparing to benchmarking...
 - running on CPU#30
 - use CNTVCT_EL0 as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 0.2 tick, 5 iterations/tick

s390x:

Preparing to benchmarking...
 - running on CPU#3
 - use STCKE as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 6 cycles, 0.166667 iteration/cycle

AMD64:

Preparing to benchmarking...
 - perf_event_open(): No such file or directory
 - running on CPU#0
 - use RDTSCP as clock source for benchmarking
 - assume it cheap and floating (RESULTS MAY VARY AND BE USELESS)
 - measure granularity and overhead: 38 cycles, 0.0263158 iteration/cycle
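The "granularity and overhead" lines above come from t1ha's own calibration code, which is not reproduced here; the "iteration/cycle" figure appears to be simply the reciprocal of the cycle count (1/6 ≈ 0.166667, 1/38 ≈ 0.0263158, and 0.2 tick gives 5 iterations/tick). As a rough, independent sketch of the same idea for the portable `clock_gettime()` source, one can spin until the clock value changes and keep the smallest step observed. The helper names `mono_ns` and `clock_step_ns` are hypothetical, and a POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* CLOCK_MONOTONIC reading in nanoseconds. */
static uint64_t mono_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: spin until the clock value changes and keep the
   smallest step seen over `samples` trials. The result is roughly the
   larger of the clock's granularity and the cost of one read. */
static uint64_t clock_step_ns(int samples) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < samples; i++) {
        uint64_t a = mono_ns();
        uint64_t b;
        do {
            b = mono_ns();
        } while (b == a);                 /* wait for the next distinct tick */
        if (b - a < best)
            best = b - a;
    }
    return best;
}
```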

@Bulat-Ziganshin
Author

Bulat-Ziganshin commented May 8, 2020

It seems that you are both talking about measuring time intervals, while the code I provided measures the effective CPU frequency, using any of the above-mentioned ways to measure the time interval.

My point is that using RDTSC to count CPU cycles has been broken for about 10 years, because it counts cycles of a fixed base frequency (such as 2 GHz in the reports posted in the encode.su thread) rather than actual core cycles. So instead I wrote a small piece of code that we know takes a fixed number of CPU cycles to execute; by measuring the time spent on it, we can easily compute the frequency. Moreover, the method works on almost any superscalar CPU.
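A minimal sketch of this idea (not the code from the encode.su thread), assuming GCC or Clang and a core on which a dependent register-register integer add has a latency of one cycle. Because each add needs the previous result, even a wide superscalar core cannot overlap them, so N adds should take about N cycles; dividing by the wall-clock time gives the effective frequency. The names `dependent_add_chain` and `estimate_cpu_hz` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* One dependent add. The empty asm with a "+r" constraint makes GCC/Clang
   assume x was modified, so consecutive adds cannot be merged or
   vectorized; the "memory" clobber helps keep the chain from being moved
   across the clock reads. */
#define DEP_ADD(x, s) do { (x) += (s); __asm__ volatile("" : "+r"(x) : : "memory"); } while (0)

/* Hypothetical helper: 8 dependent adds per iteration, i.e. roughly
   8 * iters cycles if add latency is one cycle. */
static uint64_t dependent_add_chain(uint64_t iters) {
    uint64_t x = 0, step = 1;
    __asm__ volatile("" : "+r"(step));    /* hide step's value from the optimizer */
    for (uint64_t i = 0; i < iters; i++) {
        DEP_ADD(x, step); DEP_ADD(x, step); DEP_ADD(x, step); DEP_ADD(x, step);
        DEP_ADD(x, step); DEP_ADD(x, step); DEP_ADD(x, step); DEP_ADD(x, step);
    }
    return x;                             /* returned so the chain is not dead code */
}

/* Hypothetical helper: effective frequency, in Hz, of the core running it. */
static double estimate_cpu_hz(uint64_t iters) {
    volatile uint64_t sink = dependent_add_chain(iters / 4);  /* warm-up so the clock can ramp */
    double t0 = now_seconds();
    sink = dependent_add_chain(iters);
    double t1 = now_seconds();
    (void)sink;
    return 8.0 * (double)iters / (t1 - t0);
}
```

Multiplying a hashing run's measured time by an estimate taken on the same pinned core just before or after it then gives an approximate cycle count. On an in-order or non-superscalar core the loop overhead is no longer hidden, so the estimate would come out low.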

Using this approach, we can finally report correctly how many CPU cycles each hashing operation takes.

@rurban
Owner

rurban commented May 8, 2020

Yes, I know these loop-counting tricks from gamers, who use them to calculate the frame rate. It's a rather stable way to do it. I'll check whether RDTSC with CPUID is better or worse.
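A minimal sketch of the RDTSC-with-CPUID variant, assuming GCC or Clang inline assembly on x86 (the helper name `serialized_rdtsc` is hypothetical, and it returns 0 on other targets). CPUID is a serializing instruction, so issuing it right before RDTSC keeps earlier instructions from still executing when the counter is read. It does not change what RDTSC counts: on CPUs with an invariant TSC the result is still fixed-rate reference ticks, not core cycles.

```c
#include <stdint.h>

/* Hypothetical helper: serialize with CPUID, then read the time-stamp
   counter. Returns 0 on non-x86 targets. */
static uint64_t serialized_rdtsc(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a = 0, b, c = 0, d;      /* leaf 0, subleaf 0 */
    /* CPUID waits for all earlier instructions to complete. */
    __asm__ volatile("cpuid" : "+a"(a), "=b"(b), "+c"(c), "=d"(d) : : "memory");
    (void)b; (void)d;
    unsigned int lo, hi;
    /* RDTSC returns the 64-bit counter split across EDX:EAX. */
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}
```

Timing a region is then a matter of subtracting two readings; turning that tick delta into core cycles still requires the ratio of the actual core frequency to the TSC frequency.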

But "better" would be reading the freq from the kernel via proc.

rurban added a commit that referenced this issue Oct 1, 2020
not hardcoded to 3 GHz.
Some code is based on GH #125, but this result is not really good.
On linux I found an easy way.
rurban added further commits with the same message that referenced this issue on Oct 1, 2020 (two more), Nov 26, 2020, Nov 28, 2020, Jan 21, 2021, Nov 19, 2021, Jan 27, 2022, and Apr 2, 2022.
@YellowOnion

> But "better" would be reading the freq from the kernel via proc.

Frequency switching on a modern core usually happens on a microsecond timescale. AMD's Precision Boost is pretty crazy: my CPU will be anywhere between 4.5 and 5.1 GHz with single-core boost, constantly changing due to power demand etc. I kinda doubt you can get accurate readings through anything that isn't atomic with the execution of the code.

Real-world time is also important, especially since AVX-512 on older Intel CPUs will clock a system down below "base" (Zen 4 doesn't have this penalty). Counting only cycles can hide some of that performance penalty, because a user might think 30 cycles at 2 GHz is better than 40 cycles at 3 GHz.
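Working through that example: wall-clock time is the cycle count divided by the clock frequency, so the run with more cycles is actually the faster one.

```math
t = \frac{N_{\text{cycles}}}{f}, \qquad
\frac{30}{2\ \text{GHz}} = 15\ \text{ns}, \qquad
\frac{40}{3\ \text{GHz}} \approx 13.3\ \text{ns}
```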

There are also other things to consider: I'm pretty sure some AVX units can take upwards of 200 cycles just to power on, which might not be measured here if the unit is already warmed up.

https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/


4 participants