Skip costly tests #31945

Merged: 8 commits merged from skip-costly-tests into master on Jun 25, 2019

Conversation

sandersn (Member)

This PR speeds up runtests-parallel by selectively skipping tests that take a long time to run but are rarely changed. Skipping them is therefore unlikely to cause a problem while giving a large benefit. However, it does mean that rarely (the rate is configurable and defaults to 5%) you will see a failure on CI that you did not see when running gulp runtests-parallel locally.

See the Cost-based sorting section for details.

Jest

I considered a number of other ways to make tests run faster as well, mostly based on the idea that tests that don't touch the code you edit should not need to be run. The best implementation of this is coverage-based testing, but when I tried Jest, its Istanbul-based coverage ran out of memory. I filed a bug and it looks like v8-based coverage is needed, which will take quite a bit of work. We might need to be the ones to do it; in the meantime I want an easy speedup.

I looked at a few recent PRs and guessed that coverage-based testing would cut off 50-80% of our current test time.

Which tests to skip

I considered lots of ways to skip tests. I measured the quality of each method on four criteria:

  • chance of failure
  • percent of current test runtime saved
  • ease of implementation
  • determinism

A given PR fails if it edits a test that was skipped.

Fourslash

My simplest idea was to just skip fourslash tests when only files in src/compiler are edited, since fourslash takes about 40% of test time. Unfortunately, that fails 7.5% of PRs, and would be useless for non-compiler PRs -- the failure rate would be above 50% for services code. Skipping other test suites besides compiler/conformance wouldn't help because none take that much time.

Cost-based sorting

My next idea was to sort tests from slow to fast, and just skip the slowest ones. But a slow test that catches lots of failures is still valuable. So I decided on Math.log(time% / edit%) as the measure of cost. Every test's runtime is scaled by the number of edits to the test, so slow, useful tests cost less than slow, useless ones.

This saves 38% of test runtime for a 5% chance of failure, or 29% time saving for 2.5% chance of failure. It's also deterministic -- the same tests are skipped for everybody every time -- and configurable -- you decide what failure rate you can tolerate and the tests will speed up by a commensurate amount.
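As a rough illustration of the sorting-and-skipping step, here is a minimal TypeScript sketch. The `TestStats` shape, field names, and `testsToSkip` helper are illustrative, not the actual code in this PR:

```ts
interface TestStats {
    name: string;
    timePercent: number; // this test's share of total test runtime
    editPercent: number; // this test's share of recent test edits
}

// Cost as described above: a slow test that is rarely edited is expensive
// to keep running relative to the regressions it is likely to catch.
const cost = (t: TestStats) => Math.log(t.timePercent / t.editPercent);

// Skip the most costly tests until the cumulative chance of missing an
// edited test reaches the configured budget (5% by default).
function testsToSkip(stats: TestStats[], failureBudgetPercent = 5): Set<string> {
    const byCost = [...stats].sort((a, b) => cost(b) - cost(a));
    const skipped = new Set<string>();
    let missedEditPercent = 0;
    for (const test of byCost) {
        if (missedEditPercent + test.editPercent > failureBudgetPercent) break;
        missedEditPercent += test.editPercent;
        skipped.add(test.name);
    }
    return skipped;
}
```

Read this way, the 5% budget corresponds to skipping tests until the skipped tests together account for 5% of recent test edits.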

I thought of some variations that I did not implement for various reasons:

Cost-based sampling

Instead of dropping the most costly tests outright, you could drop tests randomly, weighted by their cost. But this would usually end up dropping the most costly tests anyway, and it is not deterministic, which would be annoying over many local runs.
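Purely to illustrate that variation (it is not part of this PR), sampling could look something like the sketch below; the drop-probability scaling and the `aggressiveness` parameter are assumptions:

```ts
// Cost-based sampling: each test is skipped with a probability that grows
// with its cost, so the most costly tests are usually, but not always,
// dropped. `costs` reuses the Math.log(time% / edit%) measure from above.
function sampleSkips(costs: Map<string, number>, aggressiveness = 0.5): Set<string> {
    const maxCost = Math.max(...costs.values());
    const skipped = new Set<string>();
    for (const [name, cost] of costs) {
        const dropProbability = aggressiveness * Math.max(cost, 0) / maxCost;
        if (Math.random() < dropProbability) skipped.add(name);
    }
    return skipped;
}
```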

Per-user training of cost

It is possible to count only tests that you yourself have edited. If you personally never break fourslash tests (for example), there's no value in running them.

However, generating the test cost per user is a little more complex and a little less deterministic, and only helps performance a little bit (2% or so). Instead I included the data as part of the PR, along with the script used to update it.

1. Add a script to generate a sorted list of the most costly tests (a rough sketch follows this list). A test's cost is roughly `runtime% / number of edits`. A slow test that's only been updated once is much less valuable than a slow test that has been updated 20 times: the latter test is catching more changes in the type system.

2. Check in the results of running this script. I want to make the
skipping behaviour deterministic and the same for everybody, even though
you may get slightly better performance by examining only *your* test
changes.

3. Add code to skip tests until it reaches a 5% chance of missing an
edit. Right now this provides a 38% speedup.

Still not done:
4. Make this value configurable.
5. Make the CI configuration specify a 0% chance of missing an edit.
Currently, the default is 5%.

Chance of missing an edit vs. time saved:

  • 0% gives you 0% time savings
  • 2.5% gives you 29%
  • 5% gives you 38%
  • 10% gives you 50%
  • 20% gives you 65%
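To make step 1 concrete, the sketch below shows one way such a generation script could combine per-test runtimes with edit counts pulled from `git log --stat`. The helper names, regex, and data shapes are assumptions rather than the actual script in this PR:

```ts
import { execSync } from "child_process";

// Count how many commits touched each test file, based on `git log --stat`
// output since a given release tag.
function countEdits(sinceRef: string): Map<string, number> {
    const log = execSync(`git log ${sinceRef}...HEAD --stat --pretty=oneline`, {
        encoding: "utf8",
        maxBuffer: 1 << 28, // the log is large, so raise the buffer limit
    });
    const edits = new Map<string, number>();
    for (const line of log.split("\n")) {
        const match = /^\s*(tests\/cases\/\S+)\s+\|/.exec(line);
        if (match) edits.set(match[1], (edits.get(match[1]) ?? 0) + 1);
    }
    return edits;
}

// Combine measured runtimes (e.g. from a runtests-parallel timing dump) with
// the edit counts and sort from most costly to least costly.
function sortByCost(runtimes: Map<string, number>, edits: Map<string, number>) {
    const totalTime = [...runtimes.values()].reduce((a, b) => a + b, 0);
    const totalEdits = [...edits.values()].reduce((a, b) => a + b, 0);
    return [...runtimes.entries()]
        .map(([name, time]) => {
            const timePercent = (100 * time) / totalTime;
            const editPercent = (100 * (edits.get(name) ?? 1)) / totalEdits;
            return { name, cost: Math.log(timePercent / editPercent) };
        })
        .sort((a, b) => b.cost - a.cost);
}
```

Checking in the output of `sortByCost` would be step 2, and something like `testsToSkip` from the earlier sketch would be step 3.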
@sandersn (Member Author)

@weswigham suggested a // @lowpriority: true comment, which the new script adds (or which can be added manually). That gives a little more control than the json file, at the price of a fixed failure percentage and one more directive that people need to learn.
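For illustration only, such a directive would presumably sit alongside the other test options at the top of a test file, roughly like this (the surrounding test content is made up):

```ts
// @lowpriority: true
// @strict: true

// A slow, rarely edited test that the runner may skip when the failure
// budget allows it.
let x: number = "not a number"; // error expected here
```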

@sandersn (Member Author)

After some discussion, I'm going to

  1. Use local perf data from parallel-perf if available, and skip nothing if it's not available (see the sketch after this list).
  2. Move the generated json file to the test folder.
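A minimal sketch of that fallback, assuming the parallel runner leaves its timing data in a local json file (the path and data shape are illustrative):

```ts
import * as fs from "fs";

// Use timing data from a previous runtests-parallel run if it exists;
// otherwise return undefined, in which case nothing is skipped.
function loadLocalPerfData(path = ".parallelperf.json"): Map<string, number> | undefined {
    if (!fs.existsSync(path)) return undefined;
    const raw = JSON.parse(fs.readFileSync(path, "utf8")) as Record<string, number>;
    return new Map(Object.entries(raw));
}
```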

Also, it should be possible to speed up the generation script a lot by switching to nodegit since it reads the git reflog directly. If we want to generate the json file locally and remove it from the repo, it would be a good idea to switch to nodegit.

@sandersn (Member Author)

I tried using local perf data gathered per run, and it is not stable enough since it changes from run to run. I'm not confident that bucketisation would make it stable, and it would make the code fancier, not simpler. We could generate a combined perf/edit-count file on the first run only, but then why not just check in that file, as in the current commit?

I timed `git log release-2.3...master --stat --pretty=oneline > foo.txt` and it takes 72 seconds on my machine. The current script is less than 3x slower: it takes 196 seconds. I tested the overhead of process.stdout.readline and it's less than 1% of the total (on the default Gnome terminal at least).

In addition, installing nodegit is very slow on Linux if you have to build from source, so it's not a good choice even as a dev dependency. If we want the 3x speedup, we should just parse git's output directly.

Given the number of alternatives I've tried that haven't worked out, I'm going to leave the code of this PR as-is, and get Daniel to run the script on his machine so we can get performance numbers based on laptop hardware instead of desktop hardware.

sandersn and others added 3 commits June 19, 2019 08:40
Also include parameter name in test output so that people will know what
to do to change the percent chance of failure.
@sandersn sandersn merged commit 2619522 into master Jun 25, 2019
@sandersn sandersn deleted the skip-costly-tests branch June 25, 2019 22:47