Skip costly tests #31945

Merged: 8 commits merged from skip-costly-tests into master on Jun 25, 2019

Conversation

sandersn (Member)

This PR speeds up runtests-parallel by selectively skipping tests that take a long time to run but are rarely changed. Skipping them is therefore unlikely to cause a problem while giving a large benefit. However, it does mean that rarely (the rate is configurable and defaults to 5%) you will see a failure on CI that you did not see when running gulp runtests-parallel locally.

See the Cost-based sorting section for details.

Jest

I considered a number of other ways to make tests run faster as well, mostly based on the idea that tests that don't touch the code you edit should not need to be run. The best implementation of this is coverage-based testing, but when I tried Jest, its Istanbul-based coverage ran out of memory. I filed a bug and it looks like v8-based coverage is needed, which will take quite a bit of work. We might need to be the ones to do it; in the meantime I want an easy speedup.

I looked at a few recent PRs and guessed that coverage-based testing would cut off 50-80% of our current test time.

Which tests to skip

I considered lots of ways to skip tests. I measured the quality of each method on four criteria:

  • chance of failure
  • percent of current test runtime saved
  • ease of implementation
  • determinism

A given PR fails if it edits a test that was skipped.

Fourslash

My simplest idea was to just skip fourslash tests when only files in src/compiler are edited, since fourslash takes about 40% of test time. Unfortunately, that fails 7.5% of PRs, and would be useless for non-compiler PRs -- the failure rate would be above 50% for services code. Skipping other test suites besides compiler/conformance wouldn't help because none take that much time.

Cost-based sorting

My next idea was to sort tests from slow to fast, and just skip the slowest ones. But a slow test that catches lots of failures is still valuable. So I decided on Math.log(time% / edit%) as the measure of cost. Every test's runtime is scaled by the number of edits to the test, so slow, useful tests cost less than slow, useless ones.

This saves 38% of test runtime for a 5% chance of failure, or 29% time saving for 2.5% chance of failure. It's also deterministic -- the same tests are skipped for everybody every time -- and configurable -- you decide what failure rate you can tolerate and the tests will speed up by a commensurate amount.
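As a rough illustration of the sorting-and-skipping step, here is a minimal TypeScript sketch. The `TestStats` shape, field names, and `testsToSkip` helper are illustrative, not the actual code in this PR:

```ts
interface TestStats {
    name: string;
    timePercent: number; // this test's share of total test runtime
    editPercent: number; // this test's share of recent test edits
}

// Cost as described above: a slow test that is rarely edited is expensive
// to keep running relative to the regressions it is likely to catch.
const cost = (t: TestStats) => Math.log(t.timePercent / t.editPercent);

// Skip the most costly tests until the cumulative chance of missing an
// edited test reaches the configured budget (5% by default).
function testsToSkip(stats: TestStats[], failureBudgetPercent = 5): Set<string> {
    const byCost = [...stats].sort((a, b) => cost(b) - cost(a));
    const skipped = new Set<string>();
    let missedEditPercent = 0;
    for (const test of byCost) {
        if (missedEditPercent + test.editPercent > failureBudgetPercent) break;
        missedEditPercent += test.editPercent;
        skipped.add(test.name);
    }
    return skipped;
}
```

Read this way, the 5% budget corresponds to skipping tests until the skipped tests together account for 5% of recent test edits.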

I thought of some variations that I did not implement for various reasons:

Cost-based sampling

Instead of dropping the most costly tests outright, you could drop tests randomly, weighted by their cost. But this would usually end up dropping the most costly tests anyway, and it is not deterministic, which would be annoying over many local runs.
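Purely to illustrate that variation (it is not part of this PR), sampling could look something like the sketch below; the drop-probability scaling and the `aggressiveness` parameter are assumptions:

```ts
// Cost-based sampling: each test is skipped with a probability that grows
// with its cost, so the most costly tests are usually, but not always,
// dropped. `costs` reuses the Math.log(time% / edit%) measure from above.
function sampleSkips(costs: Map<string, number>, aggressiveness = 0.5): Set<string> {
    const maxCost = Math.max(...costs.values());
    const skipped = new Set<string>();
    for (const [name, cost] of costs) {
        const dropProbability = aggressiveness * Math.max(cost, 0) / maxCost;
        if (Math.random() < dropProbability) skipped.add(name);
    }
    return skipped;
}
```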

Per-user training of cost

It is possible to count only tests that you yourself have edited. If you personally never break fourslash tests (for example), there's no value in running them.

However, generating the test cost per user is a little more complex and a little less deterministic, and only helps performance a little bit (2% or so). Instead I included the data as part of the PR, along with the script used to update it.

1. Add a script to generate a sorted list of the most costly tests (a rough sketch follows this list). A test's cost is roughly `runtime% / number of edits`. A slow test that's only been updated once is much less valuable than a slow test that has been updated 20 times: the latter test is catching more changes in the type system.

2. Check in the results of running this script. I want to make the
skipping behaviour deterministic and the same for everybody, even though
you may get slightly better performance by examining only *your* test
changes.

3. Add code to skip tests until it reaches a 5% chance of missing an
edit. Right now this provides a 38% speedup.

Still not done:
4. Make this value configurable.
5. Make the CI configuration specify a 0% chance of missing an edit.
Currently, the default is 5%.

Chance of missing an edit vs. time saved:

  • 0% gives you 0% time savings
  • 2.5% gives you 29%
  • 5% gives you 38%
  • 10% gives you 50%
  • 20% gives you 65%
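To make step 1 concrete, the sketch below shows one way such a generation script could combine per-test runtimes with edit counts pulled from `git log --stat`. The helper names, regex, and data shapes are assumptions rather than the actual script in this PR:

```ts
import { execSync } from "child_process";

// Count how many commits touched each test file, based on `git log --stat`
// output since a given release tag.
function countEdits(sinceRef: string): Map<string, number> {
    const log = execSync(`git log ${sinceRef}...HEAD --stat --pretty=oneline`, {
        encoding: "utf8",
        maxBuffer: 1 << 28, // the log is large, so raise the buffer limit
    });
    const edits = new Map<string, number>();
    for (const line of log.split("\n")) {
        const match = /^\s*(tests\/cases\/\S+)\s+\|/.exec(line);
        if (match) edits.set(match[1], (edits.get(match[1]) ?? 0) + 1);
    }
    return edits;
}

// Combine measured runtimes (e.g. from a runtests-parallel timing dump) with
// the edit counts and sort from most costly to least costly.
function sortByCost(runtimes: Map<string, number>, edits: Map<string, number>) {
    const totalTime = [...runtimes.values()].reduce((a, b) => a + b, 0);
    const totalEdits = [...edits.values()].reduce((a, b) => a + b, 0);
    return [...runtimes.entries()]
        .map(([name, time]) => {
            const timePercent = (100 * time) / totalTime;
            const editPercent = (100 * (edits.get(name) ?? 1)) / totalEdits;
            return { name, cost: Math.log(timePercent / editPercent) };
        })
        .sort((a, b) => b.cost - a.cost);
}
```

Checking in the output of `sortByCost` would be step 2, and something like `testsToSkip` from the earlier sketch would be step 3.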
@sandersn (Member Author)

@weswigham suggested a // @lowpriority: true comment, which the new script adds (or which can be added manually). That gives a little more control than the json file, at the price of a fixed failure percentage and one more directive that people need to learn.
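For illustration only, such a directive would presumably sit alongside the other test options at the top of a test file, roughly like this (the surrounding test content is made up):

```ts
// @lowpriority: true
// @strict: true

// A slow, rarely edited test that the runner may skip when the failure
// budget allows it.
let x: number = "not a number"; // error expected here
```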

@sandersn (Member Author)

After some discussion, I'm going to

  1. Use local perf data from parallel-perf if available, and skip nothing if it's not available (see the sketch after this list).
  2. Move the generated json file to the test folder.
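A minimal sketch of that fallback, assuming the parallel runner leaves its timing data in a local json file (the path and data shape are illustrative):

```ts
import * as fs from "fs";

// Use timing data from a previous runtests-parallel run if it exists;
// otherwise return undefined, in which case nothing is skipped.
function loadLocalPerfData(path = ".parallelperf.json"): Map<string, number> | undefined {
    if (!fs.existsSync(path)) return undefined;
    const raw = JSON.parse(fs.readFileSync(path, "utf8")) as Record<string, number>;
    return new Map(Object.entries(raw));
}
```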

Also, it should be possible to speed up the generation script a lot by switching to nodegit since it reads the git reflog directly. If we want to generate the json file locally and remove it from the repo, it would be a good idea to switch to nodegit.

@sandersn (Member Author)

I tried using local perf data gathered per run, and it is not stable enough since it changes from run to run. I'm not confident that bucketisation would make it stable, and it would make the code fancier, not simpler. We could generate a combined perf/edit-count file on the first run only, but then why not just check in that file, as in the current commit?

I timed `git log release-2.3...master --stat --pretty=oneline > foo.txt` and it takes 72 seconds on my machine. The current script is less than 3x slower: it takes 196 seconds. I tested the overhead of process.stdout.readline and it's less than 1% of the total (on the default Gnome terminal at least).

In addition, installing nodegit is very slow on Linux if you have to build from source, so it's not a good choice even as a dev dependency. If we want the 3x speedup, we should just parse git's output directly.

Given the number of alternatives I've tried that haven't worked out, I'm going to leave the code of this PR as-is, and get Daniel to run the script on his machine so we can get performance numbers based on laptop hardware instead of desktop hardware.

sandersn and others added 3 commits June 19, 2019 08:40
Also include parameter name in test output so that people will know what
to do to change the percent chance of failure.
@sandersn sandersn merged commit 2619522 into master Jun 25, 2019
@sandersn sandersn deleted the skip-costly-tests branch June 25, 2019 22:47