README: update
Signed-off-by: Chen Pei <cp0613@linux.alibaba.com>
cp0613 committed May 9, 2024
1 parent c8fce70 commit d798dbe
Showing 1 changed file with 109 additions and 14 deletions.
123 changes: 109 additions & 14 deletions README.adoc
@@ -206,35 +206,64 @@ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Document

vmlinux: The Linux kernel ELF file

= Appendix A
= Appendix A - PMU in DTS

Additional DTS examples(pmu, serial, bootargs with initrd):
For details on configuring the PMU, refer to link:https://github.com/riscv-software-src/opensbi/blob/master/docs/pmu_support.md[OpenSBI SBI PMU extension].

The following is an example PMU configuration for Xuantie C-series CPUs. It may need to be adjusted according to the datasheet of the actual hardware.
....
pmu {
compatible = "riscv,pmu";
riscv,event-to-mhpmevent =
/* For Cycle sampling */
/* <0x00001 0x00000000 0x000000??>, */
/* For Instruction sampling */
/* <0x00002 0x00000000 0x000000??>, */
/* PMU_HW_CACHE_REFERENCES -> ll_cache_read_access */
<0x00003 0x00000000 0x00000010>,
/* PMU_HW_CACHE_MISSES -> ll_cache_read_miss */
<0x00004 0x00000000 0x00000011>,
/* PMU_HW_BRANCH_INSTRUCTIONS -> inst_branch */
<0x00005 0x00000000 0x00000007>,
/* PMU_HW_BRANCH_MISSES -> inst_branch_mispredict */
<0x00006 0x00000000 0x00000006>,
/* PMU_HW_STALLED_CYCLES_FRONTEND -> ifu_stalled_cycle */
<0x00008 0x00000000 0x00000027>,
/* PMU_HW_STALLED_CYCLES_BACKEND -> idu_stalled_cycle */
<0x00009 0x00000000 0x00000028>,
/* L1D_READ_ACCESS -> l1_dcache_read_access */
<0x10000 0x00000000 0x0000000c>,
/* L1D_READ_MISS -> l1_dcache_read_miss */
<0x10001 0x00000000 0x0000000d>,
/* L1D_WRITE_ACCESS -> l1_dcache_write_access */
<0x10002 0x00000000 0x0000000e>,
/* L1D_WRITE_MISS -> l1_dcache_write_miss */
<0x10003 0x00000000 0x0000000f>,
/* L1I_READ_ACCESS -> l1_icache_access */
<0x10008 0x00000000 0x00000001>,
/* L1I_READ_MISS -> l1_icache_miss */
<0x10009 0x00000000 0x00000002>,
/* LL_READ_ACCESS -> ll_cache_read_access */
<0x10010 0x00000000 0x00000010>,
/* LL_READ_MISS -> ll_cache_read_miss */
<0x10011 0x00000000 0x00000011>,
/* LL_WRITE_ACCESS -> ll_cache_write_access */
<0x10012 0x00000000 0x00000012>,
/* LL_WRITE_MISS -> ll_cache_write_miss */
<0x10013 0x00000000 0x00000013>,
/* DTLB_READ_MISS -> dtlb_miss */
<0x10019 0x00000000 0x00000004>,
/* ITLB_READ_MISS -> itlb_miss */
<0x10021 0x00000000 0x00000003>,
/* BPU_READ_ACCESS -> branch_direction_prediction */
<0x10030 0x00000000 0x0000001c>,
/* BPU_READ_MISS -> branch_direction_misprediction */
<0x10031 0x00000000 0x0000001b>;
riscv,event-to-mhpmcounters =
/* For Cycle sampling */
/* <0x00001 0x00001 0xfffffff8>, */
/* For Instruction sampling */
/* <0x00002 0x00002 0xfffffff8>, */
<0x00003 0x00003 0xfffffff8>,
<0x00004 0x00004 0xfffffff8>,
<0x00005 0x00005 0xfffffff8>,
@@ -301,7 +330,85 @@ pmu {
<0x00000000 0x00000029 0xffffffff 0xffffffff 0xfffffff8>,
<0x00000000 0x0000002a 0xffffffff 0xffffffff 0xfffffff8>;
};
....
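
The cells in the tables above can be decoded with a small script. This is a minimal sketch, assuming the layout described in the OpenSBI PMU documentation and the SBI PMU extension: the first cell of a `riscv,event-to-mhpmevent` row is an `event_idx` with the event type in bits [19:16] and the event code in bits [15:0] (for cache events the code splits into `cache_id`, `op_id` and `result_id`), and the third cell of a `riscv,event-to-mhpmcounters` row is a bitmap of usable MHPMCOUNTERx. The function names `decode_event_idx` and `counters_in_bitmap` are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Decode an SBI PMU event_idx (first cell of a riscv,event-to-mhpmevent row).
# Assumed layout: bits [19:16] = type, bits [15:0] = code.
# For type 1 (hardware cache events) the code is split into
# cache_id [15:3], op_id [2:1] and result_id [0].
decode_event_idx() {
    idx=$(( $1 ))
    type=$(( (idx >> 16) & 0xf ))
    code=$(( idx & 0xffff ))
    if [ "$type" -eq 1 ]; then
        echo "type=$type cache_id=$(( code >> 3 )) op_id=$(( (code >> 1) & 3 )) result_id=$(( code & 1 ))"
    else
        echo "type=$type code=$code"
    fi
}

# List the counter numbers selected by a counter bitmap
# (third cell of a riscv,event-to-mhpmcounters row).
counters_in_bitmap() {
    map=$(( $1 ))
    n=0
    out=""
    while [ "$n" -lt 32 ]; do
        if [ $(( (map >> n) & 1 )) -eq 1 ]; then
            out="$out$n "
        fi
        n=$(( n + 1 ))
    done
    echo "${out% }"
}

decode_event_idx 0x10019      # the DTLB_READ_MISS row above
counters_in_bitmap 0xfffffff8 # the bitmap used in the rows above
```

With the bitmap `0xfffffff8`, bits 3 to 31 are set, so every listed event may be counted on any of mhpmcounter3 through mhpmcounter31.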

The Sscofpmf extension does not provide overflow interrupts for the fixed cycle and instret counters, so Cycle or Instruction sampling (i.e. `perf record`) is not supported on them. In the future, custom event mapping can be implemented to enable the following configuration.
....
riscv,event-to-mhpmevent =
/* For Cycle sampling */
/* <0x00001 0x00000000 0x000000??>, */
/* For Instruction sampling */
/* <0x00002 0x00000000 0x000000??>, */
riscv,event-to-mhpmcounters =
/* For Cycle sampling */
/* <0x00001 0x00001 0xfffffff8>, */
/* For Instruction sampling */
/* <0x00002 0x00002 0xfffffff8>, */
....

For example, using `perf stat` & `perf record`:
....
# perf stat -ddd ls
Performance counter stats for 'ls':
82.18 msec task-clock # 0.793 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
62 page-faults # 754.476 /sec
3955327 cycles # 0.048 GHz (10.68%)
2134683 instructions # 0.54 insn per cycle (20.41%)
206750 branches # 2.516 M/sec (30.12%)
25300 branch-misses # 12.24% of all branches (39.86%)
414412 L1-dcache-loads # 5.043 M/sec (46.44%)
13633 L1-dcache-load-misses # 3.29% of all L1-dcache accesses (51.32%)
0 LLC-loads # 0.000 /sec (56.19%)
0 LLC-load-misses (61.05%)
2276497 L1-icache-loads # 27.703 M/sec (65.92%)
39158 L1-icache-load-misses # 1.72% of all L1-icache accesses (70.78%)
<not counted> dTLB-loads (0.00%)
0 dTLB-load-misses (4.85%)
<not counted> iTLB-loads (0.00%)
4267 iTLB-load-misses (4.85%)
<not counted> L1-dcache-prefetches (0.00%)
<not counted> L1-dcache-prefetch-misses (0.00%)
0.103628040 seconds time elapsed
0.008110000 seconds user
0.105442000 seconds sys
....
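
The derived figures in the `perf stat` output (insn per cycle, miss percentages) are plain ratios of the raw counts. As a minimal sketch, the hypothetical helper `ratio` below recomputes two of them from the numbers shown above; it assumes only a POSIX shell and `awk`.

```shell
#!/bin/sh
# ratio NUM DEN [SCALE]: print NUM / DEN * SCALE with two decimals.
# SCALE defaults to 1; pass 100 to get a percentage.
ratio() {
    awk -v n="$1" -v d="$2" -v s="${3:-1}" \
        'BEGIN { if (d == 0) { print "n/a"; exit } printf "%.2f\n", n / d * s }'
}

ratio 2134683 3955327     # instructions / cycles
ratio 25300 206750 100    # branch-misses / branches, in percent
```

The percentages in the right-hand column of the output, such as `(10.68%)`, are a different quantity: they show how much of the run time each event was actually counted while perf multiplexed the events across the available counters.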

....
# echo 1000 > /proc/sys/kernel/perf_event_max_sample_rate
# perf record -g ls
perf.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.006 MB perf.data (9 samples) ]
....

= Appendix B - How to compile perf

Buildroot can be used to build a rootfs that includes the perf tool.
....
# git clone https://github.com/buildroot/buildroot.git
# cd buildroot/
# make qemu_riscv64_virt_defconfig
# make menuconfig
....

Enable the following PACKAGE config in menuconfig.
....
BR2_PACKAGE_LINUX_TOOLS=y
BR2_PACKAGE_LINUX_TOOLS_PERF=y
BR2_PACKAGE_ELFUTILS=y
....
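
After saving the configuration, it can be useful to confirm that the options were actually written out. The following is a minimal sketch, assuming Buildroot stored its configuration in `.config` at the top of the tree; `check_perf_config` is a hypothetical helper name.

```shell
#!/bin/sh
# check_perf_config FILE: succeed only if every option below is set to 'y' in FILE.
check_perf_config() {
    for opt in BR2_PACKAGE_LINUX_TOOLS BR2_PACKAGE_LINUX_TOOLS_PERF BR2_PACKAGE_ELFUTILS; do
        if ! grep -q "^${opt}=y\$" "$1"; then
            echo "missing: ${opt}=y"
            return 1
        fi
    done
    echo "perf options enabled"
}

# Example, run from the buildroot/ directory:
#   check_perf_config .config && make
```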

= Appendix C - Additional DTS

Additional DTS examples (serial, bootargs with initrd):
....
serial@1900d000 {
compatible = "snps,dw-apb-uart";
reg = <0x0 0x1900d000 0x0 0x400>;
@@ -322,15 +429,3 @@ chosen {
....

In the 'serial' node, set 'reg', 'interrupts' and 'clock-frequency' to match the actual hardware. In the 'chosen' node, set 'linux,initrd-start' and 'linux,initrd-end' to match where the initrd is actually loaded.
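
The value of 'linux,initrd-end' follows from the load address and the size of the initrd image. This is a minimal sketch with a hypothetical load address of `0x84000000` and a hypothetical image name `rootfs.cpio.gz`; it assumes 'linux,initrd-end' is the address just past the last byte of the image. `initrd_end` is a helper name introduced here for illustration.

```shell
#!/bin/sh
# initrd_end START SIZE: print START + SIZE as a hex address.
initrd_end() {
    printf '0x%x\n' $(( $1 + $2 ))
}

# Hypothetical example: a 1 MiB image loaded at 0x84000000.
initrd_end 0x84000000 0x100000

# With a real image, the size can be taken from the file:
#   initrd_end 0x84000000 "$(wc -c < rootfs.cpio.gz)"
```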

= Appendix B

How to use buildroot to compile rootfs with perf tool:

Enable the following PACKAGE config in defconfig.

....
BR2_PACKAGE_LINUX_TOOLS=y
BR2_PACKAGE_LINUX_TOOLS_PERF=y
BR2_PACKAGE_ELFUTILS=y
....
