-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variation in counter values #773
Comments
Hi @mahdi259 interesting problem! |
Hi @NikLeberg, |
I checked There is one asm code that calls start_time. |
Hey @mahdi259 Which counters are you using to benchmark your code? The standard CPU counters Could show the code for your |
Hi
By the way, does |
Ah I see. These functions use the CPU-internal cycle counter Do you need the cycle counter for anything else than benchmarking? E.g. for some delays? I would recommend to use a HPM here so there is no risk that this specific counter might be overridden by some other software part. This is what I would do: // enable ALL counters (including HPMs)
neorv32_cpu_csr_write(CSR_MCOUNTINHIBIT, 0);
// configure HPM3 to count clock ticks
neorv32_cpu_csr_write(CSR_MHPMEVENT3, 1 << HPMCNT_EVENT_CY);
// clear HPM3 (low word only), could be put into a 'start_time' function
neorv32_cpu_csr_write(CSR_MHPMCOUNTER3, 0):
...your application code you want to profile...
// read current HPM3 value (low word only), could be put into a `get_time` function
uint32_t elapsed_cycles = neorv32_cpu_csr_read(CSR_MHPMCOUNTER3); If your application codes runs for a "long" time you might also need to use the high-word of HPM3. |
Unfortunately the problem persists. Just try the following code snippet by commenting/uncommenting the specified printf:
|
I've tested both version. The one without the extra printf:
And the one with the extra printf:
In both cases the HPM reading is identical. I am using a simple You could check the caches with this piece of code: if (NEORV32_SYSINFO->SOC & (1 << SYSINFO_SOC_ICACHE)) { neorv32_uart0_printf("ICACHE enabled\n"); }
else { neorv32_uart0_printf("ICACHE disabled\n"); }
if (NEORV32_SYSINFO->SOC & (1 << SYSINFO_SOC_DCACHE)) { neorv32_uart0_printf("DCACHE enabled\n"); }
else { neorv32_uart0_printf("DCACHE disabled\n"); } |
You are right. Seems that the variation occurs when compiling code with MARCH=rv32imc. MARCH=rv32imc:
MARCH=rv32i:
|
I have re-run my simulations also using a The generated assembly code is identical - at least the loop that is used for profiling. The general outcome of the code is also correct. So there is no "actual" bug in the core (phew 😅). However, this is a really interesting scenario. I've looked through some waveforms and finally found the reason for this strange behavior: The CPU's font-end keeps fetching new instructions independently of the actual instruction execution. As soon as a branch instruction is executed (like at the end of your loop) the front-end is reset to start fetching instruction from the branch destination. The code version running for 517 the instruction fetch was quite fast and the entire instruction prefetch buffer is full. The front-ent is waiting do for some free space in that buffer before it can continue fetching. At this point the end-of-loop branch kicks in. Unfortunately, the front-end is not reset (yet) because it first waits for free space in the prefetch buffer. Long story short, this is the problem: neorv32/rtl/core/neorv32_cpu_control.vhd Lines 379 to 385 in 5281589
In this state the front-end reset should also be evaluated to allow a faster restart. So it should be something like this: when IF_REQUEST => -- request next 32-bit-aligned instruction word
-- ------------------------------------------------------------
if (ipb.free = "11") then -- wait for free IPB space
fetch_engine.state <= IF_PENDING;
elsif (fetch_engine.restart = '1') or (fetch_engine.reset = '1') then -- restart request due to branch
fetch_engine.state <= IF_RESTART;
elsif (execute_engine.state = SLEEP) then -- halt request (sleep)?
fetch_engine.state <= IF_PARKED;
end if; I will do some more tests and if everything works fine I'll open a PR to update the front-end logic. Thanks for finding this |
Thanks |
The variations issue should be fixed now - at least when no caches are implemented 😉 |
Yes. There is no variation in the discussed code now. Thanks for your effort. |
Actually I got some further variations in my benchmark experimenting CFU custom instructions. Finally I removed C extension in compilation and got stable results. Thanks again for your attention. Best regards for you... |
The C extension can cause minor variations as it also supports unaligned 32-bit instructions words (that are 16-bit aligned) requiring an additional bus access to fetch the "second half of the instruction". In large programs this variation should not be noticeable. However, you could try to increase the size of the instruction prefetch buffer: neorv32/rtl/core/neorv32_package.vhd Lines 51 to 52 in c95e039
This might reduce this kind of variations. Furthermore, large blocks of linear code can benefit from this is terms of increased execution speed. |
Thanks for your detailed description. |
Describe the bug
I use neorv32 to determine cycle counts that my custom instruction takes in a loop. My intention is to show the benefits that it apply to a certain application instead of using the conventional method.
The problem is that when I add some neorv32_uart0_printf before my experiment block, the total clock cycles changes. It looks storage because start_time() should work independently after printing.
Expected behavior
The total cycle count of a piece of code be constant, regardless of other codes of the program.
Screenshots
If applicable, add screenshots to help explain your problem.
Hardware:
Extensions: M, C, zicntr, zihpm, zxcfu.
As pictures show, removing one printf increases measured clock cycles.
// neorv32_uart0_printf("----------------\n");
neorv32_uart0_printf("---- First method:\n");
// ------------------------
res = 0;
iterations = 40;
start_time();
for (i=0; i<iterations; i++) {
res += neorv32_cfu_r3_instr(0b0000000, 0b000, rs1, rs2);
}
stop_time();
t0 = get_time();
neorv32_uart0_printf("Iterations: %d, Cycles: %d\n", iterations, t0);
// ------------------------
The text was updated successfully, but these errors were encountered: