turn on tlb prefetch and evict #26

Closed
wants to merge 2 commits into from
113 changes: 99 additions & 14 deletions README.adoc
@@ -206,35 +206,60 @@ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Document

vmlinux: The Linux kernel ELF file

= Appendix A
= Appendix A - PMU in DTS

Additional DTS examples(pmu, serial, bootargs with initrd):
For details on PMU configuration, refer to the link:https://github.com/riscv-software-src/opensbi/blob/master/docs/pmu_support.md[OpenSBI SBI PMU extension] documentation.

The following is an example PMU configuration for Xuantie C-series CPUs; adjust it to match the datasheet of the actual chip.
....
pmu {
compatible = "riscv,pmu";
riscv,event-to-mhpmevent =
/* PMU_HW_CACHE_REFERENCES -> ll_cache_read_access */
<0x00003 0x00000000 0x00000010>,
/* PMU_HW_CACHE_MISSES -> ll_cache_read_miss */
<0x00004 0x00000000 0x00000011>,
/* PMU_HW_BRANCH_INSTRUCTIONS -> inst_branch */
<0x00005 0x00000000 0x00000007>,
/* PMU_HW_BRANCH_MISSES -> inst_branch_mispredict */
<0x00006 0x00000000 0x00000006>,
/* PMU_HW_STALLED_CYCLES_FRONTEND -> ifu_stalled_cycle */
<0x00008 0x00000000 0x00000027>,
/* PMU_HW_STALLED_CYCLES_BACKEND -> idu_stalled_cycle */
<0x00009 0x00000000 0x00000028>,
/* L1D_READ_ACCESS -> l1_dcache_read_access */
<0x10000 0x00000000 0x0000000c>,
/* L1D_READ_MISS -> l1_dcache_read_miss */
<0x10001 0x00000000 0x0000000d>,
/* L1D_WRITE_ACCESS -> l1_dcache_write_access */
<0x10002 0x00000000 0x0000000e>,
/* L1D_WRITE_MISS -> l1_dcache_write_miss */
<0x10003 0x00000000 0x0000000f>,
/* L1I_READ_ACCESS -> l1_icache_access */
<0x10008 0x00000000 0x00000001>,
/* L1I_READ_MISS -> l1_icache_miss */
<0x10009 0x00000000 0x00000002>,
/* LL_READ_ACCESS -> ll_cache_read_access */
<0x10010 0x00000000 0x00000010>,
/* LL_READ_MISS -> ll_cache_read_miss */
<0x10011 0x00000000 0x00000011>,
/* LL_WRITE_ACCESS -> ll_cache_write_access */
<0x10012 0x00000000 0x00000012>,
/* LL_WRITE_MISS -> ll_cache_write_miss */
<0x10013 0x00000000 0x00000013>,
/* DTLB_READ_MISS -> dtlb_miss */
<0x10019 0x00000000 0x00000004>,
/* ITLB_READ_MISS -> itlb_miss */
<0x10021 0x00000000 0x00000003>,
/* BPU_READ_ACCESS -> branch_direction_prediction */
<0x10030 0x00000000 0x0000001c>,
/* BPU_READ_MISS -> branch_direction_misprediction */
<0x10031 0x00000000 0x0000001b>;
riscv,event-to-mhpmcounters =
/* Cycle */
/* <0x00001 0x00001 0xfffffff9>, */
/* Instruction */
/* <0x00002 0x00002 0xfffffffc>, */
<0x00003 0x00003 0xfffffff8>,
<0x00004 0x00004 0xfffffff8>,
<0x00005 0x00005 0xfffffff8>,
@@ -301,7 +326,79 @@ pmu {
<0x00000000 0x00000029 0xffffffff 0xffffffff 0xfffffff8>,
<0x00000000 0x0000002a 0xffffffff 0xffffffff 0xfffffff8>;
};
....
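The first cell of each `riscv,event-to-mhpmevent` entry is an SBI PMU event index. The sketch below splits such an index into its fields, assuming the layout described in the SBI PMU extension: the event type in bits [19:16], the event code in bits [15:0], and, for hardware cache events, the code packed as `cache_id << 3 | op_id << 1 | result_id`. The names `struct pmu_event` and `decode_event_idx` are hypothetical and are not part of OpenSBI or this repository; check the field layout against the SBI specification before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the fields of one SBI PMU event index. */
struct pmu_event {
    unsigned type;      /* bits [19:16]: 0 = general hardware event, 1 = cache event */
    unsigned code;      /* bits [15:0] */
    unsigned cache_id;  /* the three fields below are meaningful for type 1 only */
    unsigned op_id;
    unsigned result_id;
};

/* Split an event index into type/code and the cache sub-fields
 * (assumed packing: cache_id << 3 | op_id << 1 | result_id). */
static struct pmu_event decode_event_idx(uint32_t event_idx)
{
    struct pmu_event ev;

    ev.type = (event_idx >> 16) & 0xf;
    ev.code = event_idx & 0xffff;
    ev.cache_id = ev.code >> 3;
    ev.op_id = (ev.code >> 1) & 0x3;
    ev.result_id = ev.code & 0x1;
    return ev;
}
```

For instance, `decode_event_idx(0x10019)` yields type 1, cache_id 3, op_id 0, result_id 1, which lines up with the `DTLB_READ_MISS` comment in the example above.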

If you want to use `perf record` for Cycle or Instruction sampling, you need to add the following two lines to `riscv,event-to-mhpmcounters`.
Adding them makes `perf stat` counting inaccurate, so they are not enabled by default.
....
/* Cycle */
<0x00001 0x00001 0xfffffff9>,
/* Instruction */
<0x00002 0x00002 0xfffffffc>,
....
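To read the third cell of these entries, the sketch below treats it as a bitmap of usable counters, assuming (as in the OpenSBI PMU documentation) that bit n stands for counter n, with bit 0 the cycle counter and bit 2 the instruction counter. The helper names `counter_allowed` and `count_counters` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if counter `idx` is set in the bitmap, 0 otherwise. */
static int counter_allowed(uint32_t bitmap, unsigned idx)
{
    return idx < 32 && ((bitmap >> idx) & 1u);
}

/* Count how many counters the bitmap allows. */
static unsigned count_counters(uint32_t bitmap)
{
    unsigned n = 0;

    for (; bitmap; bitmap >>= 1)
        n += bitmap & 1u;
    return n;
}
```

Under that assumption, `0xfffffff9` admits counter 0 and counters 3 to 31, `0xfffffffc` admits counters 2 to 31, and `0xfffffff8` admits only counters 3 to 31.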

For example, using `perf stat` & `perf record`:
....
# perf stat -ddd ls

Performance counter stats for 'ls':

82.18 msec task-clock # 0.793 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
62 page-faults # 754.476 /sec
3955327 cycles # 0.048 GHz (10.68%)
2134683 instructions # 0.54 insn per cycle (20.41%)
206750 branches # 2.516 M/sec (30.12%)
25300 branch-misses # 12.24% of all branches (39.86%)
414412 L1-dcache-loads # 5.043 M/sec (46.44%)
13633 L1-dcache-load-misses # 3.29% of all L1-dcache accesses (51.32%)
0 LLC-loads # 0.000 /sec (56.19%)
0 LLC-load-misses (61.05%)
2276497 L1-icache-loads # 27.703 M/sec (65.92%)
39158 L1-icache-load-misses # 1.72% of all L1-icache accesses (70.78%)
<not counted> dTLB-loads (0.00%)
0 dTLB-load-misses (4.85%)
<not counted> iTLB-loads (0.00%)
4267 iTLB-load-misses (4.85%)
<not counted> L1-dcache-prefetches (0.00%)
<not counted> L1-dcache-prefetch-misses (0.00%)

0.103628040 seconds time elapsed

0.008110000 seconds user
0.105442000 seconds sys
....
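The derived columns in the `perf stat` output can be reproduced from the raw counts. The following is a minimal sketch with hypothetical helper names (`insn_per_cycle`, `miss_percent`); it only illustrates the arithmetic and is not how perf itself is implemented.

```c
#include <assert.h>

/* Instructions retired per cycle; 0 when no cycles were counted. */
static double insn_per_cycle(unsigned long long instructions,
                             unsigned long long cycles)
{
    return cycles ? (double)instructions / (double)cycles : 0.0;
}

/* Misses as a percentage of accesses; 0 when there were no accesses. */
static double miss_percent(unsigned long long misses,
                           unsigned long long accesses)
{
    return accesses ? 100.0 * (double)misses / (double)accesses : 0.0;
}
```

With the counts above, `insn_per_cycle(2134683, 3955327)` is about 0.54 and `miss_percent(25300, 206750)` is about 12.24, matching the `insn per cycle` and `of all branches` columns.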

....
# echo 1000 > /proc/sys/kernel/perf_event_max_sample_rate
# perf record -g ls
perf.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.006 MB perf.data (9 samples) ]
....

= Appendix B - How to compile perf

Buildroot can be used to build a rootfs that includes the perf tool.
....
# git clone https://github.com/buildroot/buildroot.git
# cd buildroot/
# make qemu_riscv64_virt_defconfig
# make menuconfig
....

Enable the following package options in menuconfig.
....
BR2_PACKAGE_LINUX_TOOLS=y
BR2_PACKAGE_LINUX_TOOLS_PERF=y
BR2_PACKAGE_ELFUTILS=y
....

= Appendix C - Additional DTS

Additional DTS examples (serial, bootargs with initrd):
....
serial@1900d000 {
compatible = "snps,dw-apb-uart";
reg = <0x0 0x1900d000 0x0 0x400>;
@@ -322,15 +419,3 @@ chosen {
....

In the 'serial' node, set 'reg', 'interrupts', and 'clock-frequency' to match the actual hardware. In the 'chosen' node, set 'linux,initrd-start' and 'linux,initrd-end' to match where the initrd is actually loaded.

= Appendix B

How to use buildroot to compile rootfs with perf tool:

Enable the following PACKAGE config in defconfig.

....
BR2_PACKAGE_LINUX_TOOLS=y
BR2_PACKAGE_LINUX_TOOLS_PERF=y
BR2_PACKAGE_ELFUTILS=y
....
9 changes: 8 additions & 1 deletion feature.c
@@ -119,12 +119,19 @@ void setup_features(void)
csr_write(CSR_MXSTATUS, 0x438000);
csr_write(CSR_MHINT, 0x21aa10c);
csr_write(CSR_MHCR, 0x10011ff);
} else if (cpu_ver >= 0x1003 && cpu_ver <= 0xffff) { //1.0.3~
} else if (cpu_ver >= 0x1003 && cpu_ver <= 0x1009) { //1.0.3~1.0.9
csr_write(CSR_MSMPR, 0x1);
csr_write(CSR_MCCR2, 0xa042000a);
csr_write(CSR_MXSTATUS, 0x438000);
csr_write(CSR_MHINT, 0x1aa10c);
csr_write(CSR_MHCR, 0x10011ff);
} else if (cpu_ver >= 0x100a && cpu_ver <= 0xffff) { //1.0.10~
csr_write(CSR_MSMPR, 0x1);
csr_write(CSR_MCCR2, 0xa042000a);
csr_write(CSR_MXSTATUS, 0x438000);
csr_write(CSR_MHINT, 0x21aa10c);
csr_write(CSR_MHCR, 0x10011ff);
csr_write(CSR_MHINT4, 0x10000000);
} else {
while(1);
}
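The version split in this hunk can be checked on a host machine with a small sketch that returns the `CSR_MHINT` value for a given `cpu_ver`. It covers only the two ranges visible in this hunk; the helper name `mhint_for_cpu_ver` is hypothetical, and the meaning of the individual bits is not stated in the diff and is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the two version ranges shown in this hunk.
 * Returns 0 and stores the MHINT value, or -1 for versions that are
 * handled by branches not shown here. */
static int mhint_for_cpu_ver(uint32_t cpu_ver, uint32_t *mhint)
{
    if (cpu_ver >= 0x1003 && cpu_ver <= 0x1009) {   /* 1.0.3~1.0.9 */
        *mhint = 0x1aa10c;
        return 0;
    }
    if (cpu_ver >= 0x100a && cpu_ver <= 0xffff) {   /* 1.0.10~ */
        *mhint = 0x21aa10c;
        return 0;
    }
    return -1;
}
```

The two values differ in a single bit: `0x21aa10c ^ 0x1aa10c` is `0x2000000` (bit 25). The 1.0.10 and later branch additionally writes `0x10000000` to `CSR_MHINT4`.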