[Question] memory counters are not reported when accessing msr registers via likwid-accessD #412

Closed
tamiko opened this issue Jul 3, 2021 · 4 comments

tamiko commented Jul 3, 2021

Memory counters are not reported when accessing msr registers via likwid-accessD

I have set up the following:

  • enabled suid root (chmod u+s) on likwid-accessD
  • set the cap_sys_rawio=ep capability for likwid-accessD
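
The two setup steps above correspond roughly to the commands below. This is only a sketch: the install path /usr/sbin/likwid-accessD is an assumption (it is not stated in this report), and the script merely prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Assumed install path; adjust to wherever likwid-accessD lives on your system.
ACCESSD="${ACCESSD:-/usr/sbin/likwid-accessD}"
# Capability set as described in the report; override via the CAPS variable.
CAPS="${CAPS:-cap_sys_rawio=ep}"

# Print the privileged commands instead of executing them (dry run).
print_setup_commands() {
    printf 'chmod u+s %s\n' "$ACCESSD"
    printf 'setcap %s %s\n' "$CAPS" "$ACCESSD"
}

print_setup_commands
```

Setting CAPS in the environment before running the script changes the capability string that is printed; the binary must be owned by root for the suid bit to have the intended effect.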

I do see some performance counters being reported correctly. However, when I run as a regular user I get the following output (shortened):

$ likwid-perfctr -verbose 1 -g MEM_DP -C 0 hostname

DEBUG - [access_client_startDaemon:197] Successfully opened socket /tmp/likwid-1636476 to daemon for CPU 0
Executing: hostname
DEBUG - [perfmon_addEventSet:2130] Currently 1 groups of 2 active
DEBUG - [perfgroup_readGroup:871] Reading group MEM_DP from /usr/share/likwid/perfgroups/skylakeX/MEM_DP.txt
DEBUG - [perfmon_addEventSet:2309] Added event INSTR_RETIRED_ANY for counter FIXC0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CPU_CLK_UNHALTED_CORE for counter FIXC1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CPU_CLK_UNHALTED_REF for counter FIXC2 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event PWR_PKG_ENERGY for counter PWR0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event PWR_DRAM_ENERGY for counter PWR3 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE for counter PMC0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_SCALAR_DOUBLE for counter PMC1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE for counter PMC2 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE for counter PMC3 to group 0
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX0C0
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX0C1
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX1C0
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX1C1
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX2C0
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX2C1
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX3C0
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX3C1
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX4C0
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX4C1
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX5C0
DEBUG - [perfmon_addEventSet:2240] Cannot access counter register MBOX5C1
Warning: Counter PMC3 cannot be used if Restricted Transactional Memory feature is enabled and
         bit 0 of register TSX_FORCE_ABORT is 0. As workaround enable
         allow_tsx_force_abort in /sys/devices/cpu/

+-----------------------------------+--------------+
|               Metric              |  HWThread 0  |
+-----------------------------------+--------------+
|        Runtime (RDTSC) [s]        |       8.2000 |
|        Runtime unhalted [s]       |       0.0002 |
|            Clock [MHz]            |    3075.5075 |
|                CPI                |       1.6783 |
|             Energy [J]            |     142.0850 |
|             Power [W]             |      17.3275 |
|          Energy DRAM [J]          |       3.8873 |
|           Power DRAM [W]          |       0.4741 |
|            DP [MFLOP/s]           | 3.170741e-06 |
|          AVX DP [MFLOP/s]         |            0 |
|          Packed [MUOPS/s]         |            0 |
|          Scalar [MUOPS/s]         | 3.170741e-06 |
|  Memory read bandwidth [MBytes/s] |            0 |
|  Memory read data volume [GBytes] |            0 |
| Memory write bandwidth [MBytes/s] |            0 |
| Memory write data volume [GBytes] |            0 |
|    Memory bandwidth [MBytes/s]    |            0 |
|    Memory data volume [GBytes]    |            0 |
|       Operational intensity       |      inf     |
+-----------------------------------+--------------+

Running the same command as root user gives:

$ likwid-perfctr -verbose 1 -g MEM_DP -C 0 hostname

DEBUG - [access_client_startDaemon:197] Successfully opened socket /tmp/likwid-1637065 to daemon for CPU 0
Executing: hostname
DEBUG - [perfmon_addEventSet:2130] Currently 1 groups of 2 active
DEBUG - [perfgroup_readGroup:871] Reading group MEM_DP from /usr/share/likwid/perfgroups/skylakeX/MEM_DP.txt
DEBUG - [perfmon_addEventSet:2309] Added event INSTR_RETIRED_ANY for counter FIXC0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CPU_CLK_UNHALTED_CORE for counter FIXC1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CPU_CLK_UNHALTED_REF for counter FIXC2 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event PWR_PKG_ENERGY for counter PWR0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event PWR_DRAM_ENERGY for counter PWR3 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE for counter PMC0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_SCALAR_DOUBLE for counter PMC1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE for counter PMC2 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE for counter PMC3 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_RD for counter MBOX0C0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_WR for counter MBOX0C1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_RD for counter MBOX1C0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_WR for counter MBOX1C1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_RD for counter MBOX2C0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_WR for counter MBOX2C1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_RD for counter MBOX3C0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_WR for counter MBOX3C1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_RD for counter MBOX4C0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_WR for counter MBOX4C1 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_RD for counter MBOX5C0 to group 0
DEBUG - [perfmon_addEventSet:2309] Added event CAS_COUNT_WR for counter MBOX5C1 to group 0
Warning: Counter PMC3 cannot be used if Restricted Transactional Memory feature is enabled and
         bit 0 of register TSX_FORCE_ABORT is 0. As workaround enable
         allow_tsx_force_abort in /sys/devices/cpu/

+-----------------------------------+------------+
|               Metric              | HWThread 0 |
+-----------------------------------+------------+
|        Runtime (RDTSC) [s]        |     0.0006 |
|        Runtime unhalted [s]       |     0.0002 |
|            Clock [MHz]            |  3495.8568 |
|                CPI                |     1.6064 |
|             Energy [J]            |     0.0459 |
|             Power [W]             |    72.9954 |
|          Energy DRAM [J]          |     0.0067 |
|           Power DRAM [W]          |    10.6090 |
|            DP [MFLOP/s]           |     0.0159 |
|          AVX DP [MFLOP/s]         |          0 |
|          Packed [MUOPS/s]         |          0 |
|          Scalar [MUOPS/s]         |     0.0159 |
|  Memory read bandwidth [MBytes/s] |   129.3669 |
|  Memory read data volume [GBytes] |     0.0001 |
| Memory write bandwidth [MBytes/s] |   112.9797 |
| Memory write data volume [GBytes] |     0.0001 |
|    Memory bandwidth [MBytes/s]    |   242.3466 |
|    Memory data volume [GBytes]    |     0.0002 |
|       Operational intensity       |     0.0001 |
+-----------------------------------+------------+

This is likwid version 5.1.0 (commit: 233ab943543480cd46058b34616c174198ba0459) running on Debian Bullseye (Linux kernel version 5.10.40-1).

What am I missing?

Update: when I additionally grant the CAP_SYS_ADMIN capability to likwid-accessD then everything works as expected.

@tamiko tamiko added the bug label Jul 3, 2021
@TomTheBear
Member

Hi, thanks for the detailed report. It is new to me that the access daemon requires CAP_SYS_ADMIN in addition to the suid-root bit. It might be a security restriction in more recent kernels. I know of one for MSR access, but in your case it is the PCI access that fails. I have to investigate that.

Do I understand your update correctly that user and root runs work now with CAP_SYS_ADMIN for likwid-accessD?

@tamiko
Author

tamiko commented Jul 5, 2021

> Do I understand your update correctly that user and root runs work now with CAP_SYS_ADMIN for likwid-accessD?

Yes. User (non-root) access works provided that likwid-accessD is suid-root and has both capabilities cap_sys_admin,cap_sys_rawio=ep.

If any of that is changed (i.e., removing suid-root but granting file permissions, or removing the cap_sys_admin capability), PCI access fails.
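
To confirm which of these conditions a given binary currently meets, a small check along the following lines may help. It is a sketch under assumptions: the function name check_accessd is made up for illustration, it only tests the setuid bit (not that the owner is root), and it relies on getcap from the libcap tools being installed to list file capabilities.

```shell
#!/bin/sh
# Hypothetical helper: report the setuid bit and file capabilities of a binary.
check_accessd() {
    bin="$1"
    if [ ! -e "$bin" ]; then
        echo "missing: $bin"
        return 2
    fi
    # [ -u FILE ] is true when the set-user-ID bit is set on FILE.
    if [ -u "$bin" ]; then
        echo "setuid bit: set"
    else
        echo "setuid bit: not set"
    fi
    # getcap prints nothing when no file capabilities are attached.
    if command -v getcap >/dev/null 2>&1; then
        caps=$(getcap "$bin" 2>/dev/null)
        echo "capabilities: ${caps:-none}"
    else
        echo "capabilities: unknown (getcap not installed)"
    fi
}
```

Call it with the path of the installed daemon, for example check_accessd /path/to/likwid-accessD, and compare the output against the combination reported to work above (suid-root plus cap_sys_admin,cap_sys_rawio=ep).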

@TomTheBear
Copy link
Member

Thanks for the clarification. I will update the documentation.

@TomTheBear
Copy link
Member

64aad2f

I also added it to some wiki pages.
