
TestAccuracy


Test the accuracy of some derived metrics

Introduction

Data from hardware performance counters seems to offer a complete, valid and reliable view of the operations performed at the hardware level, but is the data really complete, valid and reliable? The LIKWID team has been using hardware counters for quite some time, and we have seen events that over- or undercount as well as many accurate ones.

Benchmark applications

In order to compare the measured data with calculated values, an application is needed that has the following features:

  • Parseable output of the metric of interest
  • A known instruction stream (to specify a valid scaling factor and to predict results)
  • Easy to instrument using LIKWID's Marker API

An application that offers all of the above is likwid-bench, because for its assembly benchmarks we can calculate the performed floating-point operations and the consumed data volume exactly. Moreover, likwid-bench can easily be instrumented with the LIKWID Marker API. Nevertheless, likwid-bench currently offers only streaming benchmarks, hence not all metrics of interest can be covered, e.g. cache access ratios or energy consumption.
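
To give an idea of how likwid-bench is combined with likwid-perfctr, a single run could look roughly like this; the core selection, performance group, kernel and working set size are only illustrative, and the name of the instrumented executable produced in test/accuracy may differ:

likwid-perfctr -C S0:0 -g L2 -m ./likwid-bench -t load -w S0:1MB:1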

Accuracy test tool

The accuracy tool included in the LIKWID suite is written in Python and compares the calculated metric results of likwid-bench with the measured and derived ones of likwid-perfctr. For some metrics, such as 'Instructions per branch', likwid-bench does not calculate and print an appropriate value, but these metrics are commonly constant, hence the expected result can be defined in the test input files.

The accuracy tool can be found in the LIKWID sources in the folder test/accuracy and all following paths are relative to this one. The test runs are defined by the files in the TESTS folder. An example definition looks like this:

REGEX_BENCH MByte\/s:\s+([0-9]+)
REGEX_PERF \|\s+L2 bandwidth \[MBytes\/s\]\s+\|\s+([0-9\.e\+\-]+)

TEST load
RUNS 5
WA_FACTOR 1.0
VARIANT 12kB 20000
VARIANT 1MB 10000
VARIANT  4MB 7500
VARIANT  1GB 50

TEST xxx
[...]

The REGEX_BENCH is used to parse the data from the likwid-bench output and REGEX_PERF from the likwid-perfctr output. After an empty line, the test blocks are listed. The string after TEST defines the benchmark kernel used for likwid-bench. RUNS defines how often each data size should be tested. The WA_FACTOR scales the output of likwid-bench so that the results take write-allocate traffic into account. Finally, there are multiple lines of the form VARIANT <size> <iterations>. It is recommended to use selected sizes to see the influence of the CPU caches. The iteration definition is no longer needed because, starting with version 4.0.0 of LIKWID, likwid-bench determines a suitable iteration count itself to produce reliable results.
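
As a rough illustration with made-up numbers, REGEX_BENCH above would capture 10000 from a likwid-bench output line such as

MByte/s:        10000

and REGEX_PERF would capture the derived metric from the corresponding likwid-perfctr result table row, e.g.

|   L2 bandwidth [MBytes/s]   |   1.05e+04   |

The reference ("correct") value used for the comparison is then the likwid-bench value multiplied by the WA_FACTOR.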

Which test groups should be performed can be defined in the file SETS.txt. Each line in the file specifies one test file without the suffix .txt; alternatively, the sets can be given on the command line using the -s/--sets <comma-separated list> option.
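
A minimal SETS.txt could look like the following, assuming that test files load.txt, store.txt and copy.txt exist in the TESTS folder; the equivalent command line selection would be -s load,store,copy:

load
store
copy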

The tool has some command line options to control how the comparison is presented:

Option        Comment
--grace       Write an input file for Xmgrace (PNG)
--gnuplot     Write an input file for gnuplot (JPG)
--pgf         Write an input file for PGFPlots (PDF)
--script      Write a script to the results directory that creates all images
--scriptname  Specify the filename for the script file

The results of an accuracy run are stored in RESULTS/<hostname>. The raw output of all runs is stored in .raw files. The input files for plotting are named .dat: the plain files contain the results of likwid-bench, the marker files those of likwid-perfctr, and the correct files contain the plain results scaled by the WA_FACTOR.
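
A results directory might therefore contain files roughly like these; the names are purely illustrative and depend on the tests and variants that were run:

RESULTS/myhost/
  load.raw
  load_plain.dat
  load_marker.dat
  load_correct.dat
  create_plots.sh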

Depending on the command line options, there are also .plot files for gnuplot, .agr files for Xmgrace and .tex files for PGFPlots. To allow all plotting tools to be used simultaneously, each tool works with a different output format, as noted in the table above. Finally, the script file that creates all images is placed there as well; its default filename is create_plots.sh. Each plotting backend provides a different level of detail for the tests.

Running accuracy tests

First, likwid-bench must be compiled and copied to the local folder. This can be done easily by calling

make

in the base folder (test/accuracy) of the accuracy test tool. It compiles likwid-bench with and without instrumentation and copies the executables to the current folder. The accuracy tool uses the likwid-perfctr executable from the current source tree, not a possibly installed system-wide one; hence the path to the access daemon must be set in config.mk before running make in the accuracy tool folder. You can use the likwid-perfctr of your system by changing the variable perfctr in the default settings of the accuracy script likwid-accuracy.py.
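
Such a change is a single assignment near the top of likwid-accuracy.py; the sketch below assumes perfctr is a plain path string, and both the default value in your tree and the path shown here may differ:

# use an installed likwid-perfctr instead of the one from the source tree
perfctr = "/usr/local/bin/likwid-perfctr"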

After setting up the executables, start the test runs with the gnuplot backend:

./likwid-accuracy.py --gnuplot --script

It prints the current group and test name. Each iteration is shown with a *. Afterwards, go to the results folder and create the plots:

cd RESULTS/$(hostname -s)
./create_plots.sh

Tested microarchitectures

Since we are working at a computing center, we have a wide range of microarchitectures in-house and have tested the accuracy on most of them. Here is a list of all tested architectures with links to their accuracy results.

Problems and ideas

At the moment, the accuracy tool is fixed to single-threaded likwid-bench runs; it would be nice to allow different benchmark applications and to use multiple threads. Moreover, other hardware performance counter tools such as PAPI or perf_event could be integrated to see whether they do a more accurate job. Some parts are already extended to use PAPI, but there is no PAPI integration in likwid-bench.
