Showing 1 changed file with 3 additions and 157 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,158 +1,4 @@ | ||
# Candidate Benchmark Programs | ||
# Benchmarks | ||
|
||
This directory contains the candidate programs for the benchmark suite. They are | ||
candidates, not officially part of the suite yet, because we [intend][rfc] to | ||
record various metrics about the programs and then run a principal component | ||
analysis to find a representative subset of candidates that doesn't contain | ||
effectively duplicate workloads. | ||
|
||
[rfc]: https://github.com/bytecodealliance/rfcs/pull/4 | ||
|
||
## Building | ||
|
||
Build an individual benchmark program via: | ||
|
||
``` | ||
$ ./build.sh path/to/benchmark/dir/ | ||
``` | ||
|
||
Build all benchmark programs by running: | ||
|
||
``` | ||
$ ./build-all.sh | ||
``` | ||
|
||
## Minimal Technical Requirements | ||
|
||
In order for the benchmark runner to successfully execute a Wasm program and | ||
record its execution, it must: | ||
|
||
* Export a `_start` function of type `[] -> []`. | ||
|
||
* Import `bench.start` and `bench.end` functions, both of type `[] -> []`. | ||
|
||
* Call `bench.start` exactly once during the execution of its `_start` | ||
function. This is when the benchmark runner will start recording execution | ||
time and performance counters. | ||
|
||
* Call `bench.end` exactly once during execution of its `_start` function, after | ||
`bench.start` has already been called. This is when the benchmark runner will | ||
stop recording execution time and performance counters. | ||
|
||
* Provide reproducible builds via Docker (see [`build.sh`](./build.sh)). | ||
|
||
* Be located in a `sightglass/benchmarks/$BENCHMARK_NAME` directory. Typically | ||
the benchmark is named `benchmark.wasm`, but benchmarks with multiple files | ||
should use names like `<benchmark name>-<subtest name>.wasm` (e.g., | ||
`libsodium-chacha20.wasm`). | ||
|
||
* Input workloads must be files that live in the same directory as the `.wasm` | ||
benchmark program. The benchmark program is run within the directory where it | ||
lives on the filesystem, with that directory pre-opened in WASI. The workload | ||
must be read via a relative file path. | ||
|
||
If, for example, the benchmark processes JSON input, then its input workload | ||
should live at `sightglass/benchmarks/$BENCHMARK_NAME/input.json`, and it | ||
should open that file as `"./input.json"`. | ||
|
||
* Define the expected `stdout` output in a `./<benchmark name>.stdout.expected` | ||
sibling file located next to the `benchmark.wasm` file (e.g., | ||
`benchmark.stdout.expected`). The runner will assert that the actual | ||
execution's output matches the expectation. | ||
|
||
* Define the expected `stderr` output in a `./<benchmark name>.stderr.expected` | ||
sibling file located next to the `benchmark.wasm` file. The runner will assert | ||
that the actual execution's output matches the expectation. | ||
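
The ordering rule for `bench.start` and `bench.end` in the list above can be sketched as a small check. This is only an illustration, not how the benchmark runner or the `validate` command is implemented: the `check_bench_calls` function and the idea of a recorded call trace are hypothetical names introduced here.

```python
def check_bench_calls(trace):
    """Return None if the trace obeys the start/end rules, else an error string.

    `trace` is a hypothetical list of import names ("bench.start" or
    "bench.end") in the order the program called them during `_start`.
    """
    starts = trace.count("bench.start")
    ends = trace.count("bench.end")
    # Each function must be called exactly once.
    if starts != 1:
        return f"bench.start called {starts} times, expected exactly 1"
    if ends != 1:
        return f"bench.end called {ends} times, expected exactly 1"
    # bench.end is only valid after bench.start has been called.
    if trace.index("bench.end") < trace.index("bench.start"):
        return "bench.end called before bench.start"
    return None
```

A trace of `["bench.start", "bench.end"]` passes, while a reversed trace, a missing call, or a repeated call yields an error message.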
|
||
Many of the above requirements can be checked by running the `.wasm` file | ||
through the `validate` command: | ||
|
||
``` | ||
$ cargo run -- validate path/to/benchmark.wasm | ||
``` | ||
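
The expected-output requirement from the list above can be sketched as follows. This is a rough illustration under stated assumptions: `expected_path` and `output_matches` are hypothetical helpers, `<benchmark name>` is assumed to be the `.wasm` file name without its extension, and an exact string comparison is assumed (the runner's real comparison may differ).

```python
from pathlib import Path


def expected_path(wasm_path, stream):
    """Path of the expected-output sibling file for `stream`.

    `stream` is "stdout" or "stderr". Assumes the benchmark name is the
    `.wasm` file stem, e.g. `benchmark.wasm` -> `benchmark.stdout.expected`.
    """
    wasm = Path(wasm_path)
    return wasm.with_name(f"{wasm.stem}.{stream}.expected")


def output_matches(wasm_path, stream, actual):
    """Compare captured output (a str) with the expected sibling file.

    Assumption: an exact, character-for-character match is required.
    """
    return expected_path(wasm_path, stream).read_text() == actual
```

For a multi-file benchmark such as `libsodium-chacha20.wasm`, this naming assumption gives `libsodium-chacha20.stdout.expected` in the same directory.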
|
||
## Compatibility Requirements for Native Execution | ||
|
||
Sightglass can also measure the performance of a subset of benchmarks compiled | ||
to native code (i.e., not WebAssembly). Compiling these benchmarks without | ||
changing their source code involves a delicate interface with the [native | ||
engine] and imposes some additional requirements beyond the [Minimal Technical | ||
Requirements] noted above: | ||
|
||
[native engine]: ../engines/native | ||
[Minimal Technical Requirements]: #minimal-technical-requirements | ||
|
||
* Generate an ELF shared library linked to the [native engine] shared library to | ||
provide definitions for `bench_start` and `bench_end`. | ||
|
||
* Rename the `main` function to `native_entry`. For C- and C++-based source this | ||
can be done with a simple define directive passed to `cc` (e.g., | ||
`-Dmain=native_entry`). | ||
|
||
* Provide reproducible builds via a `Dockerfile.native` file (see | ||
[`build-native.sh`](./build-native.sh)). | ||
|
||
Note that support for native execution is optional: adding a WebAssembly | ||
benchmark does not imply the need to support its native equivalent — CI | ||
will not fail if it is not included. | ||
|
||
## Additional Requirements | ||
|
||
> Note: these requirements are lifted directly from [the benchmarking | ||
> RFC][rfc]. | ||
In addition to the minimal technical requirements, for a benchmark program to be | ||
useful to Wasmtime and Cranelift developers, it should additionally meet the | ||
following requirements: | ||
|
||
* Candidates should be real, widely used programs, or at least extracted kernels | ||
of such programs. These programs are ideally taken from domains where Wasmtime | ||
and Cranelift are currently used, or domains where they are intended to be a | ||
good fit (e.g. serverless compute, game plugins, client Web applications, | ||
server Web applications, audio plugins, etc.). | ||
|
||
* A candidate program must be deterministic (modulo Wasm nondeterminism like | ||
`memory.grow` failure). | ||
|
||
* A candidate program must have two associated input workloads: one small and | ||
one large. The small workload may be used by developers locally to get quick, | ||
ballpark numbers for whether further investment in an optimization is worth | ||
it, without waiting for the full, thorough benchmark suite to complete. | ||
|
||
* Each workload must have an expected result, so that we can validate executions | ||
and avoid accepting "fast" but incorrect results. | ||
|
||
* Compiling and instantiating the candidate program and then executing its | ||
workload should take *roughly* one to six seconds total. | ||
|
||
> Napkin math: We want the full benchmark to run in a reasonable amount of | ||
> time, say twenty to thirty minutes, and we want somewhere around ten to | ||
> twenty programs altogether in the benchmark suite to balance diversity, | ||
> simplicity, and time spent in execution versus compilation and | ||
> instantiation. Additionally, for good statistical analyses, we need *at | ||
> least* 30 samples (ideally more like 100) from each benchmark program. That | ||
> leaves an average of about one to six seconds for each benchmark program to | ||
> compile, instantiate, and execute the workload. | ||
* Inputs should be given through I/O and results reported through I/O. This | ||
ensures that the compiler cannot optimize the benchmark program away. | ||
|
||
* Candidate programs should only import WASI functions. They should not depend | ||
on any other non-standard imports, hooks, or runtime environment. | ||
|
||
* Candidate programs must be open source under a license that allows | ||
redistributing, modifying and redistributing modified versions. This makes | ||
distributing the benchmark easy, allows us to rebuild Wasm binaries as new | ||
versions are released, and lets us do source-level analysis of benchmark | ||
programs when necessary. | ||
|
||
* Repeated executions of a candidate program must yield independent samples | ||
(ignoring priming Wasmtime's code cache). If the execution times keep taking | ||
longer and longer, or exhibit harmonics, they are not independent and this can | ||
invalidate any statistical analyses of the results we perform. We can easily | ||
check for this property with either [the chi-squared | ||
test](https://en.wikipedia.org/wiki/Chi-squared_test) or [Fisher's exact | ||
test](https://en.wikipedia.org/wiki/Fisher%27s_exact_test). | ||
|
||
* The corpus of candidates should include programs that use a variety of | ||
languages, compilers, and toolchains. | ||
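
The napkin math behind the one-to-six-second budget can be reproduced with a few lines of arithmetic. This sketch only uses the figures quoted in the note above (twenty to thirty minutes, ten to twenty programs, 30 to 100 samples); the `seconds_per_run` helper is a hypothetical name introduced for illustration.

```python
def seconds_per_run(total_minutes, programs, samples):
    """Average time budget, in seconds, for one compile+instantiate+execute run."""
    return total_minutes * 60 / (programs * samples)


# Tightest budget: shortest suite, most programs, most samples.
tightest = seconds_per_run(total_minutes=20, programs=20, samples=100)

# Loosest budget: longest suite, fewest programs, fewest samples.
loosest = seconds_per_run(total_minutes=30, programs=10, samples=30)

print(f"{tightest:.1f}s to {loosest:.1f}s per run")
```

The extremes work out to 0.6 and 6 seconds, which is consistent with the "about one to six seconds" quoted above.
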
The set of benchmarks here has been copied from [Sightglass](https://github.com/bytecodealliance/sightglass/benchmarks). In general, the benchmarks here will mostly be consistent with the set of benchmarks in that repository.