[BUG] Memory Data Volume formula wrong for AMD Zen2 #510

Open
reynozeros opened this issue Jan 19, 2023 · 4 comments

For our AMD Zen2 processor (EPYC 7702) the formula to calculate the memory data volume seems to be wrong by a factor of 2.

We have a 2-socket system, i.e. 8 NUMA domains in total.

The formula reads:

Memory data volume [GBytes] 1.0E-09*(DFC0+DFC1)*(4.0/(num_numadomains/num_sockets))*64.0
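
For reference, this is what that factor evaluates to for the usual NPS settings of a 2-socket system (a minimal Python sketch; the variable names are taken from the formula above, and the NPS mapping is my reading of it):

    # Correction factor from the Zen2 MEM group, assuming num_numadomains is the
    # total number of NUMA domains and num_sockets the number of CPU sockets.
    def correction_factor(num_numadomains, num_sockets):
        nps = num_numadomains / num_sockets  # NUMA domains per socket (NPS mode)
        return 4.0 / nps                     # scale up to 4 memory controllers per socket

    print(correction_factor(2, 2))  # NPS1 -> 4.0
    print(correction_factor(4, 2))  # NPS2 -> 2.0
    print(correction_factor(8, 2))  # NPS4 -> 1.0 (our system), measurements suggest 2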

likwid-bench with stream_mem_avx (using one workgroup per NUMA domain, each big enough to not fit in L3) gives:

Data volume (Byte):	8388574445568
MByte/s:		310271.24

While likwid-perfctr -m -g MEM on the benchmark gives:

|              Metric              |     Sum     |    Min    |    Max     |    Avg    |
| Memory bandwidth [MBytes/s] STAT | 154613.7198 |         0 | 19331.0285 | 1207.9197 |
| Memory data volume [GBytes] STAT |   4180.1044 |         0 |   522.6304 |   32.6571 |

For my own benchmarks I just hardcoded the 4.0/(num_numadomains/num_sockets) factor to 2 (see the sketch below), which gives results that coincide with likwid-bench. But I would still like it to work out of the box for my colleagues if they install likwid themselves; they might not be familiar enough with likwid to notice the discrepancy.
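
For reference, what I hardcoded are just the two metric lines in a private copy of the MEM group file (a sketch; I assume a per-user copy such as $HOME/.likwid/groups/zen2/MEM.txt, as described in the LIKWID documentation on custom performance groups, with the rest of the file unchanged):

    Memory bandwidth [MBytes/s] 1.0E-06*(DFC0+DFC1)*2.0*64.0/time
    Memory data volume [GBytes] 1.0E-09*(DFC0+DFC1)*2.0*64.0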

Where do these num_numadomains and num_sockets variables come from? Could this depend on the actual processor within the Zen2 family?

reynozeros added the bug label Jan 19, 2023
@reynozeros (Author) commented Jan 20, 2023

Today I reinstalled likwid 5.2.2 with Spack to see what happens with an out-of-the-box installation.
Now I get the following error messages:

Not all formula entries can be substituted with measured values
Current formula: 1.0E-06*(448273900+443027200)*(4.0/(num_numadomains/num_sockets))*64.0/7.68972
Not all formula entries can be substituted with measured values
Current formula: 1.0E-09*(448273900+443027200)*(4.0/(num_numadomains/num_sockets))*64.0

It seems there are no values for num_numadomains and num_sockets.

Interestingly, it seems to work with some applications, but not with the one I want to benchmark.

@TomTheBear (Member)

According to the commit history, the num_numadomains and num_sockets entries have not been touched in the last 2 years. My guess would be that an old library was in LD_LIBRARY_PATH.

@reynozeros (Author) commented Mar 2, 2023

Even so, sometimes those values are initialized (I suspect it has something to do with how I start my application, something with the environment or so), and when they are, the values are wrong. Our system has 8 memory NUMA domains (each with 16 cores and 4 L3 caches) across 2 sockets (each with 64 cores). This would lead to (4.0/(num_numadomains/num_sockets)) = 1. Measurements with likwid-bench showed that this factor should be 2.

@TomTheBear (Member)

The factor you see depends on the number of NUMA domains per CPU socket that are actually in use. With a single NUMA domain, the numbers are valid [1]. When using 2 NUMA domains [2], you need a factor of two; with 3 NUMA domains [3], a factor of 3; and so on [4] (the ratio check after the measurements below makes this explicit). But this is not something LIKWID can determine for arbitrary applications.

AMD mentions in the docs that memory measurements are only supported in NPS1 mode. The "correction factor" was added by me based on observations and is not confirmed by AMD.
The group help mentions:

The metric formulas contain a correction factor of (4.0/(num_numadomains/num_sockets)) to return the value for all 4 memory controllers in NPS1 mode per socket. This is probably a work-around. Requested info from AMD but no answer.

CMD=likwid-perfctr -c M0:1@M1:1@M2:1@M3:1 -g MEM likwid-bench -t stream_mem_avx (no MarkerAPI, ballpark is enough!)

  • [1]
    $CMD -W M0:8GB:6:1:2
    MByte/s:		21242.98
    Memory bandwidth [MBytes/s] STAT | 19566.1670
    
  • [2]
    $CMD -W M0:8GB:6:1:2 -W M1:8GB:6:1:2
    MByte/s:		42388.51
    Memory bandwidth [MBytes/s] STAT | 19621.8265
    
  • [3]
    $CMD -W M0:8GB:6:1:2 -W M1:8GB:6:1:2 -W M2:8GB:6:1:2
    MByte/s:		63479.13
    Memory bandwidth [MBytes/s] STAT | 19613.0676
    
  • [4]
    $CMD -W M0:8GB:6:1:2 -W M1:8GB:6:1:2 -W M2:8GB:6:1:2 -W M3:8GB:6:1:2
    MByte/s:		84467.79
    Memory bandwidth [MBytes/s] STAT | 19346.0861
    
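Dividing the likwid-bench numbers by the corresponding MEM group STAT values makes the trend explicit (a quick sketch using the figures above):

    # Ratio of likwid-bench bandwidth to the MEM group STAT value per run;
    # it grows roughly linearly with the number of NUMA domains in use.
    bench = [21242.98, 42388.51, 63479.13, 84467.79]  # MByte/s from likwid-bench
    stat = [19566.1670, 19621.8265, 19613.0676, 19346.0861]
    for ndomains, (b, s) in enumerate(zip(bench, stat), start=1):
        print(ndomains, round(b / s, 2))  # -> 1.09, 2.16, 3.24, 4.37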

With Zen1, the memory controllers were per NUMA domain, so LIKWID could attribute the traffic better. With Zen2, AMD switched to socket-local units, and this seems to be the problem now. Also, the more recent generations provide memory measurements only in NPS1 mode.
