[riscv-arch-test] measure execution time of each test #119

Merged
merged 1 commit into stnolting:master from umarcor:test-time
Jul 12, 2021

Conversation

@umarcor (Collaborator) commented on Jul 11, 2021

This PR adds time -v to the architecture test simulation calls, in order to measure the execution time and resource usage.

Most of the tests need 20-60s, which is reasonable. However, I/jal-01 needs 17 minutes! @stnolting, is that expected? Might that be caused by some bug?

@stnolting (Owner)

> This PR adds time -v to the architecture test simulation calls, in order to measure the execution time and resource usage.

👍

> Most of the tests need 20-60s, which is reasonable. However, I/jal-01 needs 17 minutes! @stnolting, is that expected? Might that be caused by some bug?

This is a complicated one 😄

In general, there are 5 supported "test libraries" right now: I, C, M, privilege and Zifencei (🚧). Each library provides several tests. For example, there is the jal-01 test that is part of the I test library.

The real (wall-clock) run time for executing a complete test library is roughly number_of_tests * sim_runtime * real_time_factor, where

  • number_of_tests is the number of tests in one test library. For the I lib there are 38 individual tests.
  • sim_runtime is the simulation time after which GHDL terminates the run. For the I tests this is 850us, which is hardcoded in run_riscv_arch_test.sh
  • real_time_factor is a factor determined by the performance of the host machine.

The individual tests vary in complexity and thus in the resulting execution time. The current setup uses the worst-case execution time for all tests in one library. No question, this is not efficient at all. So if you say a test takes 17 minutes, I think that is the execution of all tests from one library. Running only I/jal-01 takes ~2 minutes on my laptop.

Also, I think that real_time_factor might not be constant for all tests. It depends on the complexity of the simulated logic. So more CPU extensions might increase that factor. And maybe I/jal-01 is very special here because it requires MBs of initialized IMEM (other tests only use some kBs), which might also need additional processing power (further increasing real_time_factor).

I would love to have a better approach here. It would be nice if the program being executed could actually terminate the simulation by itself. I was thinking about some memory-mapped component in the testbench that terminates the simulation when a specific pattern is written to it. Maybe we could use `use std.env.finish;` and `finish;`, but I don't know whether GHDL supports that. Another option would be `assert ... severity error` to stop the simulation, but then we need to make sure the shell script does not interpret this as a simulation failure.
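
Something like this rough sketch is what I have in mind for the memory-mapped approach (the entity, the signal names, the address and the magic value are just placeholders; nothing of this exists yet):

```vhdl
-- Hypothetical simulation terminator for the testbench (VHDL-2008).
-- All names, the address and the magic value are placeholders, not the
-- actual neorv32 testbench interface.
library ieee;
use ieee.std_logic_1164.all;
use std.env.finish; -- requires VHDL-2008

entity sim_terminator is
  generic (
    STOP_ADDR : std_ulogic_vector(31 downto 0) := x"F0000000"; -- assumed address
    STOP_DATA : std_ulogic_vector(31 downto 0) := x"CAFECAFE"  -- assumed magic value
  );
  port (
    clk_i : in std_ulogic;
    adr_i : in std_ulogic_vector(31 downto 0); -- bus address
    dat_i : in std_ulogic_vector(31 downto 0); -- bus write data
    we_i  : in std_ulogic;                     -- write enable
    stb_i : in std_ulogic                      -- access strobe
  );
end entity;

architecture sim of sim_terminator is
begin
  watcher : process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (stb_i = '1') and (we_i = '1') and (adr_i = STOP_ADDR) and (dat_i = STOP_DATA) then
        report "Test program requested end of simulation." severity note;
        finish; -- terminate the simulation from within the testbench
      end if;
    end if;
  end process;
end architecture;
```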

@umarcor (Collaborator, Author) commented on Jul 12, 2021

> So if you say a test takes 17 minutes, I think that is the execution of all tests from one library. Running only I/jal-01 takes ~2 minutes on my laptop.

No, I really mean that jal-01 alone takes 17 minutes. That's why I think it is a problem. See https://github.com/umarcor/neorv32/runs/3041314676?check_suite_focus=true#step:6:449. There are ~85 tests, and all of them together take 52 min. So, each test needs 25 seconds on average, except for I/jal-01.

I found this because execution time is a serious problem when a container is used. See https://github.com/umarcor/neorv32/actions/runs/1020590959. Testing C, M and privilege works as expected (same time inside or outside a container). However, running I inside a container is awfully slow; so slow that the CI times out (6h) before I/jal-01 finishes. Note that the regular simulation inside containers is correct, either with VUnit or without it: https://github.com/umarcor/neorv32/actions/runs/1022647964.

> So more CPU extensions might increase that factor. And maybe I/jal-01 is very special here because it requires MBs of initialized IMEM (other tests only use some kBs), which might also need additional processing power (further increasing real_time_factor).

The point is that the hardware used for running I/jal-01 is exactly the same as for the other tests in I, isn't it? So, the size of the IMEM is not different. The simulation might need additional processing power if the software being simulated does indeed take longer, but the hardware itself is not aware of the differences. That is, all the extensions/components that are synthesised (elaborated) are actually used (simulated), even if they are useless for a particular software test. As a result, the additional hardware requirements of I/jal-01 might be unnecessarily delaying the other tests from I, but not jal-01 itself!

Anyway, I propose we merge this, so I can rebase the containers branch on top of it. In the end, this PR is harmless, as it is only meant to provide more information to us.

> It would be nice if the program being executed could actually terminate the simulation by itself. I was thinking about some memory-mapped component in the testbench that terminates the simulation when a specific pattern is written to it.

I think this is something to be evaluated after we are done with the current set of reorganisation PRs. I have a branch on top of #117 for creating a common.mk to be used by the architecture test makefiles: umarcor/neorv32@sw-move...umarcor:sw-parallel. That will make it easier to enhance the RUN_TARGET, since it will be done in a single place.

> Maybe we could use `use std.env.finish;` and `finish;`, but I don't know whether GHDL supports that.

It does. That is used by VUnit internally.

> Another option would be `assert ... severity error` to stop the simulation, but then we need to make sure the shell script does not interpret this as a simulation failure.

Before VHDL-2008, there was no standard procedure for terminating a simulation. Therefore, an assertion or report of severity error is the de facto standard for terminating VHDL-1993 testbenches. All simulators are (or should be) aware of that and can handle it.
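
For illustration, a minimal VHDL-1993 style sketch (the entity name, the wait time and the message text are just examples):

```vhdl
-- Minimal VHDL-1993 testbench skeleton that ends the simulation via an
-- assertion instead of std.env.finish.
entity tb_end_demo is
end entity tb_end_demo;

architecture sim of tb_end_demo is
begin
  stimuli : process
  begin
    -- ... run the actual test here ...
    wait for 850 us; -- e.g. the hardcoded sim_runtime used for the I tests
    assert false
      report "SIMULATION FINISHED (not an actual error)"
      severity error; -- by default GHDL only stops the simulation on severity
                      -- 'failure'; pass --assert-level=error to the run command
                      -- so this assertion terminates the run
    wait;
  end process;
end architecture sim;
```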

In fact, all of the current tests return exit code 1 (as this PR shows, because time reports it explicitly). That is OK: the result of a test is not evaluated from the exit code, but by parsing the log output.

@stnolting (Owner)

You are absolutely right, the hardware is always the same. So why is there such great variation in the run time?!? Some tests take 24s, others take 52s and jal takes ages. 😕 When running the tests on my computer, all test cases seem to have identical memory requirements. I don't get it... Maybe one test program causes more switching activity than another (so GHDL needs to do more calculations)?? Could this be the reason?!

Maybe we should disable the i-cache for the simulations here. At least this would greatly reduce the switching activity - especially for the jal test, which obviously does a lot of jumps. 🤔

> Anyway, I propose we merge this, so I can rebase the containers branch on top of it. In the end, this PR is harmless, as it is only meant to provide more information to us.

I will check the thing with the i-cache and then we can merge this.

> In fact, all of the current tests return exit code 1 (as this PR shows, because time reports it explicitly). That is OK: the result of a test is not evaluated from the exit code, but by parsing the log output.

Ok, so we could implement a mechanism for the CPU to terminate the simulation. Let's discuss/implement this in a follow-up PR/issue.

@stnolting (Owner)

> I will check the thing with the i-cache and then we can merge this.

Disabling the i-cache makes everything slower. I mean, this is kind of obvious... 😄 So it is not an issue with the "amount of switching activity GHDL has to simulate". Anyway, we can elaborate on that in a later issue.

@stnolting stnolting merged commit 207122d into stnolting:master Jul 12, 2021
@umarcor umarcor deleted the test-time branch July 12, 2021 15:25