TMA: Ports_Utilized_0 is overestimated for FE-bound tests #422

Open
aayasin opened this issue Mar 20, 2022 · 4 comments
@aayasin
Collaborator

aayasin commented Mar 20, 2022

The Ports_Utilized_0 node accounts for cases where zero ports are utilized while the bottleneck is Backend_Bound.Core_Bound.
For front-end-starved code patterns this metric overcounts, since there are no uops to execute in the first place.
This bug was re-introduced with the fix for Pause-loop in TMA 4.2.
Below is a reproducer: a kernel for DSB misses, built and profiled using perf-tools.

$ ./do.py build profile -a dsb-jmp -g "jumpy-seq -a3 -n30 -i 'add %rax,%rcx' JMP" -ki 120e6 -pm 20 -m '+Core_Bound*/6' -v2
building kernel: dsb-jmp ..
/usr/bin/python ./kernels/gen-kernel.py jumpy-seq -a3 -n30 -i 'add %rax,%rcx' JMP > ./kernels/dsb-jmp.c
gcc -O2 -g -o ./kernels/dsb-jmp ./kernels/dsb-jmp.c 2>&1
topdown 2-levels ..
./pmu-tools/toplev.py --no-desc -vl2 --nodes '+CoreIPC,+Instructions,+CORE_CLKS,+CPU_Utilization,+Time,+MUX,+Core_Bound*/6' \
-- taskset 0x4 ./kernels/dsb-jmp 120000000 2>&1 | tee ... 
# 4.3-full-perf on 11th Gen Intel(R) Core(TM) i7-11700B @ 3.20GHz [tgl/icelake]
FE             Frontend_Bound                                                                                % Slots                         72.2    [13.8%]
RET            Retiring                                                                                      % Slots                         27.5  < [18.0%]
Info.Core      CoreIPC                                                                                         Core_Metric                    1.40   [13.8%]
Info.Inst_Mix  Instructions                                                                                    Count              7,408,337,763      [13.8%]
FE             Frontend_Bound.Fetch_Latency                                                                  % Slots                         45.5    [13.8%]<==
FE             Frontend_Bound.Fetch_Bandwidth                                                                % Slots                         26.6  < [13.8%]
RET            Retiring.Light_Operations                                                                     % Slots                         27.4  < [18.0%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                                    % Clocks                        94.4  < [18.0%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                                   % Clocks                        33.1  < [36.1%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_1                                   % Clocks                        60.1  < [32.1%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_2                                   % Clocks                         4.2  < [32.1%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization               % Core_Execution                17.8  < [ 9.0%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_0        % Core_Clocks                   12.7  < [ 9.0%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_1        % Core_Clocks                   16.9  < [ 9.0%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_5        % Core_Clocks                   18.7  < [ 9.0%]
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_3m.ALU_Op_Utilization.Port_6        % Core_Clocks                   22.8  < [ 9.0%]
Info.Thread    IPC                                                                                             Metric                         1.40   [13.8%]
Info.System    CPU_Utilization                                                                                 Metric                         1.00   [13.8%]
Info.System    Time                                                                                            Seconds                        1.11  
Info.Core      CORE_CLKS                                                                                       Count              5,294,650,840      [13.8%]
MUX                                                                                                          %                                9.02  
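
As a rough sanity check on the numbers above, the level-1 slot budget can be compared with what Ports_Utilized_0 reports. This is only a sketch: the values are copied by hand from the output, the variable names are illustrative, and it compares a % Slots figure with a % Clocks figure, so it shows the scale of the mismatch rather than an exact error.

```python
# Values copied from the toplev output above (TGL, TMA 4.3).
frontend_bound = 72.2    # % Slots
retiring = 27.5          # % Slots
ports_utilized_0 = 33.1  # % Clocks

# The four level-1 TMA categories sum to 100% of slots, so this is the most
# that Backend_Bound and Bad_Speculation together can account for.
backend_plus_badspec = round(100.0 - frontend_bound - retiring, 1)

print(f"Slots left for Backend_Bound + Bad_Speculation: {backend_plus_badspec}%")
print(f"Ports_Utilized_0 as reported: {ports_utilized_0}% of clocks")
```

With well under 1% of slots left for the backend, a third of all clocks attributed to Ports_Utilized_0 cannot be explained by a Core_Bound bottleneck.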
@andikleen
Owner

Okay, I assume you will fix that?

@aayasin
Collaborator Author

aayasin commented Mar 24, 2022

Yep. Please tag it with 'TMA', as I cannot do it myself.

@aayasin
Collaborator Author

aayasin commented May 12, 2022

@andikleen reminder on this one

aayasin added the TMA label May 12, 2022
@aayasin
Collaborator Author

aayasin commented Oct 28, 2022

I tested this with TMA 4.4 on TGL and the issue is still there. I'll try again once 4.5 is released.
