TMA: Pause-loop is not classified at all levels #339

Closed
aayasin opened this issue Oct 12, 2020 · 3 comments

aayasin (Collaborator) commented Oct 12, 2020

A rather simple pause-loop kernel is correctly classified as Core Bound at levels 1 and 2, and likewise at levels 5 and 6, but not at the mid-levels 3 and 4.
I am documenting this here and will address it in the TMA 4.2 release.
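
In the toplev output below, a node's TMA level can be read from its name: each `.`-separated component is one level deeper. The following Python 3 sketch uses a few rows copied from that output, with the spacing compacted. It tabulates the value at each level along the Backend_Bound path, so the mid-level values that collapse to single digits stand out. The `level_values` helper and the row regex are hypothetical illustrations, not part of toplev.

```python
import re

# Rows copied (spacing compacted) from the CFL toplev output in this comment.
# Columns: area, dotted node name, unit, value, optional marker ("<" or "<==").
TOPLEV_ROWS = """\
BE        Backend_Bound                                  % Slots    98.9
BE/Core   Backend_Bound.Core_Bound                       % Slots    98.8   <==
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization     % Clocks    9.9 <
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0   % Clocks   8.5 <
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation   % Clocks  98.9 <
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause   % Clocks  91.3 <
"""

# area, dotted node name, unit ("% Slots" or "% Clocks"), numeric value
ROW_RE = re.compile(r"^(\S+)\s+(\S+)\s+(% \w+)\s+([\d.]+)")


def level_values(text, root="Backend_Bound"):
    """Hypothetical helper: map TMA level (1 = top) to (leaf name, value)
    for nodes under `root`, taking the level from the number of dotted parts."""
    levels = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue
        _area, node, _unit, value = m.groups()
        if node.split(".")[0] != root:
            continue
        levels[node.count(".") + 1] = (node.rsplit(".", 1)[-1], float(value))
    return levels


for level, (name, value) in sorted(level_values(TOPLEV_ROWS).items()):
    print(f"level {level}: {name:<22} {value:5.1f}")
```

Deriving the level from the name, rather than from row order, keeps the sketch working even when toplev prints other metrics between the TMA rows.
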
Here is a reproducer with perf-tools.

P.S. @andikleen: This is a CFL (Coffee Lake, 8th-gen Core) machine. Could the first line of the toplev output reflect that instead of [skl]?

$ ./kernels/gen-kernel.py -i pause > ./kernels/pause3x.c
$ gcc -g -O2 -o ./kernels/pause3x ./kernels/pause3x.c

$ ./pmu-tools/toplev.py --no-desc --no-perf --nodes '+CoreIPC,+UPI,+Time,+MUX' -vl6 -- ./kernels/pause3x 10000000  2>&1 | egrep -v ' [10]\.. '
# 4.11-full-perf on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [skl]
BE             Backend_Bound                                                                                 % Slots                      98.9
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                      98.8   <==
FE             Frontend_Bound.Fetch_Latency.MS_Switches                                                      % Clocks                      2.6 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                                    % Clocks                      9.9 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                                   % Clocks                      8.5 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation             % Clocks                     98.9 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause  % Clocks                     91.3 <
Info.Thread    UPI                                                                                             Metric                      2.9
RET            Retiring.Light_Operations.Other                                                               % Uops                      100.0 <
MUX                                                                                                          %                             2.2
Using level 6.
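
The trailing `egrep -v ' [10]\.. '` in the command above hides rows whose value is 0.x or 1.x. It does so only when that value is followed by a space, such as the one before the ` <` marker. A near-zero value at the very end of a row is kept. The Python 3 sketch below applies the same pattern with `re.search`. The sample rows are made up for illustration and are not taken from this run.

```python
import re

# Same pattern as the egrep filter: a space, a 0 or 1, a literal dot,
# any single character, then a space.
SMALL_VALUE = re.compile(r" [10]\.. ")


def drop_small_rows(lines):
    """Keep only the lines that `egrep -v ' [10]\\.. '` would keep."""
    return [line for line in lines if not SMALL_VALUE.search(line)]


# Hypothetical rows, for illustration only.
sample = [
    "FE   Frontend_Bound.Fetch_Latency.ITLB_Misses   % Clocks   0.4 <",
    "BE   Backend_Bound                              % Slots   98.9",
    "FE   Frontend_Bound.Fetch_Bandwidth             % Slots    1.3",
    "MUX                                             %          2.2",
]
for line in drop_small_rows(sample):
    print(line)
```

The third row survives because nothing follows `1.3`, which shows that the filter depends on the trailing marker.
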

$ lscpu
Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Byte Order:           Little Endian
CPU(s):               12
On-line CPU(s) list:  0-5
Off-line CPU(s) list: 6-11
Thread(s) per core:   1
Core(s) per socket:   6
Socket(s):            1
NUMA node(s):         1
Vendor ID:            GenuineIntel
CPU family:           6
Model:                158
Model name:           Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Stepping:             10
CPU MHz:              3701.500
CPU max MHz:          3700.0000
CPU min MHz:          800.0000
BogoMIPS:             7399.70
Virtualization:       VT-x
L1d cache:            32K
L1i cache:            32K
L2 cache:             256K
L3 cache:             12288K
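
toplev tagged this i7-8700K as `[skl]`, while the Xeon 8260L run later in this thread is tagged `[clx/skylake]`. The lscpu fields that tell these parts apart are `CPU family`, `Model` and `Stepping`. Family 6, model 158 (0x9E) covers both Kaby Lake and Coffee Lake, and the stepping separates them. Family 6, model 85 (0x55) is shared by Skylake-SP and Cascade Lake in the same way. The Python 3 sketch below assumes a hand-written, deliberately incomplete table that covers only the two CPUs in this thread. It is not toplev's actual detection logic.

```python
def parse_lscpu(text):
    """Turn 'Key:   value' lines of lscpu output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def codename(family, model, stepping):
    """Hypothetical, incomplete table for the two CPUs in this thread;
    any other family/model/stepping returns None."""
    if family != 6:
        return None
    if model == 158:  # 0x9E: Kaby Lake / Coffee Lake
        if stepping == 9:
            return "kbl"
        if stepping >= 10:
            return "cfl"
    if model == 85:  # 0x55: Skylake-SP / Cascade Lake
        if stepping <= 4:
            return "skx"
        if stepping <= 7:
            return "clx"
    return None


# Fields copied from the lscpu output above.
LSCPU = """\
CPU family:           6
Model:                158
Model name:           Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Stepping:             10
"""

f = parse_lscpu(LSCPU)
label = codename(int(f["CPU family"]), int(f["Model"]), int(f["Stepping"]))
print(label)  # prints "cfl"
```
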

andikleen (Owner) commented

I fixed the reporting for CFL/KBL (but nothing else atm)

aayasin (Collaborator, Author) commented Mar 18, 2021

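With the TMA 4.2 release, I verified that this issue is resolved on a CLX (Cascade Lake) system: the auto-drilldown below now flags the Core_Bound path at every level, down to Slow_Pause.
This issue can be closed.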

[labuser@ssp-wpclx-cdi276 perf-tools]$ python2.7 ./do.py build profile -g "-i PAUSE -n 3" -a pause3x -ki 1e8 --profile-mask 0x40 -v1 --pmu-tools '/bin/python2.7 ./pmu-tools/'
building kernel: pause3x ..
./kernels/gen-kernel.py -i PAUSE -n 3 > ./kernels/pause3x.c 2>&1
gcc -g -O2 -o ./kernels/pause3x ./kernels/pause3x.c 2>&1
topdown auto-drilldown ..
/bin/python2.7 ./pmu-tools//toplev.py --no-desc  --drilldown --nodes '+CoreIPC,+Instructions,+CORE_CLKS,+CPU_Utilization,+Time,+MUX,+IpTB,+L2MPKI' -V pause3x-1e8.toplev--drilldown-perf.csv --metric-group +Summary,+HPC -- taskset 0x4 ./kernels/pause3x 100000000  2>&1 | tee pause3x-1e8.toplev--drilldown.log | egrep -v "^(Run toplev|Adding|Using)"
2 events not supported
# 4.19-full-perf on Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz [clx/skylake]
BE             Backend_Bound    % Slots                   95.9  <==
Info.Core      CoreIPC            CoreMetric               0.1
Info.Inst_Mix  Instructions       Count          621,945,279.0
Info.Thread    IPC                Metric                   0.1
Info.System    CPU_Utilization    Metric                   1.0
Info.System    Time               Seconds                  3.3
Info.Thread    IpTB               Metric                   6.1
Info.Core      CORE_CLKS          Count       12,281,209,061.0
Info.Memory    L2MPKI             Metric                   0.1
MUX                             %                          9.2
Rerunning workload
BE             Backend_Bound             % Slots                   95.9
Info.Core      CoreIPC                     CoreMetric               0.0
Info.Inst_Mix  Instructions                Count          609,719,766.0
BE/Core        Backend_Bound.Core_Bound  % Slots                   95.9  <==
Info.Thread    IpTB                        Metric                   6.0
Info.Core      CORE_CLKS                   Count       12,241,658,267.0
Info.Memory    L2MPKI                      Metric                   0.0
Info.System    CPU_Utilization             Metric                   1.0
Info.System    Time                        Seconds                  3.3
MUX                                      %                         18.3
Rerunning workload
BE             Backend_Bound                               % Slots                   95.9
Info.Core      CoreIPC                                       CoreMetric               0.1
Info.Inst_Mix  Instructions                                  Count          611,825,162.0
BE/Core        Backend_Bound.Core_Bound                    % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization  % Clocks                  95.2  <==
Info.Thread    IpTB                                          Metric                   6.1
Info.Core      CORE_CLKS                                     Count       12,197,285,099.0
Info.Memory    L2MPKI                                        Metric                   0.1
Info.System    CPU_Utilization                               Metric                   1.0
Info.System    Time                                          Seconds                  3.3
MUX                                                        %                         12.2
Rerunning workload
BE             Backend_Bound                                                % Slots                   95.9
Info.Core      CoreIPC                                                        CoreMetric               0.1
Info.Inst_Mix  Instructions                                                   Count          615,664,622.0
BE/Core        Backend_Bound.Core_Bound                                     % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                   % Clocks                  95.1
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0  % Clocks                  91.0  <==
Info.Thread    IpTB                                                           Metric                   6.1
Info.Core      CORE_CLKS                                                      Count       12,247,617,237.0
Info.Memory    L2MPKI                                                         Metric                   0.1
Info.System    CPU_Utilization                                                Metric                   1.0
Info.System    Time                                                           Seconds                  3.3
MUX                                                                         %                         12.2
Rerunning workload
BE             Backend_Bound                                                                      % Slots                   95.9
Info.Core      CoreIPC                                                                              CoreMetric               0.1
Info.Inst_Mix  Instructions                                                                         Count          613,116,500.0
BE/Core        Backend_Bound.Core_Bound                                                           % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                         % Clocks                  95.5
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                        % Clocks                  91.0
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation  % Clocks                  95.9  <==
Info.Thread    IpTB                                                                                 Metric                   6.1
Info.Core      CORE_CLKS                                                                            Count       12,216,356,878.0
Info.Memory    L2MPKI                                                                               Metric                   0.1
Info.System    CPU_Utilization                                                                      Metric                   1.0
Info.System    Time                                                                                 Seconds                  3.3
MUX                                                                                               %                         12.2
Rerunning workload
BE             Backend_Bound                                                                                 % Slots                   95.9
Info.Core      CoreIPC                                                                                         CoreMetric               0.1
Info.Inst_Mix  Instructions                                                                                    Count          616,309,280.0
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                   95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                                    % Clocks                  94.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                                   % Clocks                  91.0
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation             % Clocks                  95.9
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause  % Clocks                  97.7  <==
Info.Thread    IpTB                                                                                            Metric                   6.1
Info.Core      CORE_CLKS                                                                                       Count       12,256,754,613.0
Info.Memory    L2MPKI                                                                                          Metric                   0.1
Info.System    CPU_Utilization                                                                                 Metric                   1.0
Info.System    Time                                                                                            Seconds                  3.3
MUX
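
In the auto-drilldown log above, each `Rerunning workload` pass adds one TMA level. The node toplev flags as the bottleneck in that pass carries a `<==` marker. The Python 3 sketch below collects those markers into a drilldown path and checks that each flagged node extends the previous one by exactly one level. The `bottleneck_path` helper is hypothetical. The log excerpt is abbreviated from the CLX output above, keeping only the flagged rows, one MUX row and the pass separators, with the spacing compacted.

```python
import re

# Abbreviated excerpt of the CLX auto-drilldown log above.
LOG = """\
BE             Backend_Bound    % Slots                   95.9  <==
MUX                             %                          9.2
Rerunning workload
BE/Core        Backend_Bound.Core_Bound  % Slots                   95.9  <==
Rerunning workload
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization  % Clocks                  95.2  <==
Rerunning workload
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0  % Clocks  91.0  <==
Rerunning workload
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation  % Clocks  95.9  <==
Rerunning workload
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause  % Clocks  97.7  <==
"""

# area, node name, ..., value, then the "<==" bottleneck marker at line end
MARKED = re.compile(r"^\S+\s+(\S+)\s+.*\s([\d.]+)\s+<==\s*$")


def bottleneck_path(log):
    """Return (node, value) for every row flagged with '<==', in log order."""
    path = []
    for line in log.splitlines():
        m = MARKED.match(line)
        if m:
            path.append((m.group(1), float(m.group(2))))
    return path


path = bottleneck_path(LOG)
for node, value in path:
    print(f"level {node.count('.') + 1}: {node.rsplit('.', 1)[-1]} ({value})")

# Each flagged node should be a direct child of the previously flagged one.
consistent = all(b.startswith(a + ".") for (a, _), (b, _) in zip(path, path[1:]))
print("consistent drilldown:", consistent)
```

On the pre-fix CFL output, the same check would have to rely on the values rather than on the `<==` markers, because toplev flagged only Core_Bound there.
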

andikleen (Owner) commented

Fixed
