
Implement an incremental compilation cache for Cranelift #4551

Merged
merged 16 commits into from
Aug 12, 2022

Conversation

@bnjbvr (Member) commented Jul 28, 2022

This is the implementation of #4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and a trait object to provide a backend for an all-included experience in Wasmtime.

Following Chris's suggestion, Function has been split into two main parts:

  • on the one hand, FunctionStencil contains all the fields required during compilation, and acts as the compilation cache key: if two function stencils are the same, then the result of their compilation (CompiledCodeBase<Stencil>) will be the same. This makes caching trivial, as the only thing to key the cache on is the FunctionStencil.
  • on the other hand, FunctionParameters contains the... function parameters that are required to finalize the result of compilation into a CompiledCode (aka CompiledCodeBase<Final>) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accommodate those requirements, in particular that FunctionStencil must be hashable (implement Hash) to be used as a key in the cache:

  • most source locations are now relative to a base source location in the function, and as such they're encoded as RelSourceLoc in the FunctionStencil. This required changes so that there's no need to explicitly mark a SourceLoc as the base source location; instead, the base is automatically detected the first time a non-default SourceLoc is set.
  • user-defined external names in the FunctionStencil (before this patch, ExternalName::User { namespace, index }) are now references into an external table of UserExternalNameRef -> UserExternalName, stored in the FunctionParameters, and must be explicitly declared using Function::declare_imported_user_function.
  • some refactorings have been made for function names:
    • ExternalName was used as the type for a Function's name; this allowed ExternalName::Libcall as a function name, which would have been quite confusing. Instead, a new enum UserFuncName is introduced for this name, which is either a user-defined function name (the above UserExternalName) or a test case name.
    • ExternalName is likely to eventually become a full reference into the FunctionParameters's mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and it is not trivial as it implies touching ISLE, which I'm less familiar with.

The cache computes a sha256 hash of the FunctionStencil and uses it as the cache key. No equality check (using PartialEq) is performed when the hashes match; we assume the hash carries enough data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

  • check that a function that was successfully compiled and cached will also be successfully reloaded from the cache, and returns the exact same function.
  • check that a trivial modification in the external mapping of UserExternalNameRef -> UserExternalName still hits the cache, and that other modifications don't hit the cache.
    • This last check is less efficient and less likely to happen, so it should probably be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many FunctionStencils are the same, even within a single module. For a group of modules that share a lot of code, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded from the cache; that's still slower than I expected (between 10% and 50% of the overall compile time), so there's likely room for improvement.

Fixes #4155.

@bnjbvr bnjbvr changed the title Implement an incremental compilation cache for Cranelift and make it available in Wasmtime Implement an incremental compilation cache for Cranelift Jul 28, 2022
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:meta Everything related to the meta-language. labels Jul 28, 2022
@cfallin (Member) left a comment:

Thanks so much for working on this @bnjbvr -- I'm extremely excited for the compile-time improvements this will bring, and there's no question in my mind that we'll merge this!

I took a first look at the approach and I have a few high level thoughts. The main one is: the duplication of types into "cached" variants, and the translation between these and the existing IR types, has me thinking that it might be significantly easier for maintenance, readability, and correctness if we instead separate out the types completely so that we have a core function-IR type that is exactly the cache key, without any munging, and then a separate "specialization" / "instantiation" / "finalization" (unsure of the best verb here) stage to plug in the bits we are resilient to: source locs, value labels, referenced names, etc.

In other words, I think we should take a scalpel to Function and DataFlowGraph and (i) make references to source locs always relative to a base, (ii) make ExternalName references indirect through a table, etc.

Then we can put that source-loc base index, that referenced-name table, and anything else, in a separate struct. And Function is a struct that composes FunctionCore and FunctionParams.

Then the optimizer and backends have access only to FunctionCore, and compile down to a MachCompileResultCore. And we have a final relocation-application bit that turns that into MachCompileResult. (Or something like that anyway; I'm not too picky about the type names and exact divisions, just that we make sure that the core of the compiler does not have any access to the "parameters" that are fixed up later.)

I don't think that's too much of a departure from much of the logic here, it's just rotating it a bit. The advantage is that it will almost be correct by construction; we're using the type system to ensure that we don't have any data-dependence leaks from things not in the cache key to the bit that we cache. And the key idea of "relocations" or "fixups" is still there, just made more explicit in the types.

Sorry I didn't come up with this earlier -- it wasn't apparent exactly what the thing would look like until I saw CachedDataFlowGraph and friends :-)

Happy to talk further about this of course!

@cfallin (Member) left a comment:
Err, I didn't mean to check the r+ bit in that last review-comment. Adding a "request changes" review here as per above (but I'm still very excited about this).

@bnjbvr (Member, Author) commented Aug 1, 2022

Thanks all for the first batch of reviews!

@cfallin this makes sense to me. At this point there's not much that needs those relocations, if I'm not mistaken: mostly things that weren't trivially hashable, source locations that were absolute, and external names being embedded and copied around. I'm looking into this.

split `Function` into `FunctionParameters` and `FunctionStencil`, so the
stencil can be compiled and cached, and compilation parameters can be
applied onto the result of compiling a FunctionStencil
(CompiledCodeBase<Stencil>) later on to become a consumable
CompiledCode.
@github-actions github-actions bot added the cranelift:area:aarch64 Issues related to AArch64 backend. label Aug 9, 2022
@bnjbvr (Member, Author) commented Aug 10, 2022

I've updated the initial text of this PR, and while not ideal/perfect, I claim this is ready for review and could be merged as a first step. Remaining things to do as follow-ups, either by myself in my copious free time, or by other interested contributors 😍:

  • fuzzing could be improved to generate an instruction that could serve as a replacement for another instruction. This would likely also require a position at which to put the instruction, and some care ensuring that it is compatible (same number of inputs/outputs and same types for all of them).
  • there's a proliferation of Name types that could potentially be unified, maybe?
  • ExternalName could just become a reference into the FunctionParameters mapping from reference to actual name value, as (likely, though not proven) codegen doesn't care about the identity of the function callee. ExternalName would then become a new entity type, and there'd be ExternalNameData in the table. This might not be true for libcalls, whose identity may matter during codegen, so it may require that ExternalName becomes a sum type: either a reference into the FunctionParameters table, or the Libcall itself embedded.
  • right now, if a UserExternalNameRef changes (that is, a function call switches the callee identity), it changes the identity of the FunctionStencil, thus resulting in a cache miss; this could likely be refactored and improved, enabling more cache hits.
  • optimization work could be done to identify why a recompilation is, in the worst case, only 50% faster than compiling from scratch (the speedup should be much higher in theory).

@bnjbvr bnjbvr marked this pull request as ready for review August 10, 2022 16:14
@bnjbvr bnjbvr requested a review from cfallin August 10, 2022 16:14
Unfortunately the previous implementation was more powerful, in the
sense that a function call (in a function body) could change a call
target, and that would result in a cache hit. With the new
FunctionStencil approach this is not the case anymore because that
changes the `FunctionStencil` identity. A future refactoring could make
it possible to support caching for that specific case.
@bnjbvr (Member, Author) commented Aug 11, 2022

@cfallin This new version incorporates changes addressing your review feedback from the live review session (which we organized to help walk through the changes, as this is a large patch). This is now ready and passes cargo test --all on my machine, with no performance regression on compile times, so I would claim this is ready :-) Cheers!

@cfallin (Member) left a comment:

This all looks great to me! Thanks @bnjbvr for the patience and perseverance on this one. After our live review today I'm confident that this is likely to work well and be maintainable, and have the safety properties we would expect; in particular, the type-level separation of cacheable vs. change-resilient state, and the embedding of version markers in the key and value both.

It looks like there's a CI failure that may be the result of the new Rust version (a new warning appeared?); I'll look into it. I don't think it's caused by this PR.

@cfallin (Member) commented Aug 11, 2022

I just merged in main to this PR to see if it fixes CI.

@cfallin (Member) commented Aug 11, 2022

Hmm, @bnjbvr it looks like some of the changes broke the fuzz targets -- happy to merge once that is fixed up.

@bnjbvr bnjbvr enabled auto-merge (squash) August 12, 2022 16:15
@bnjbvr bnjbvr disabled auto-merge August 12, 2022 16:15
@bnjbvr bnjbvr enabled auto-merge (squash) August 12, 2022 16:16
Comment on lines -638 to -644
fn declare_func_in_data(&self, func: FuncId, ctx: &mut DataContext) -> ir::FuncRef {
(**self).declare_func_in_data(func, ctx)
}

fn declare_data_in_data(&self, data: DataId, ctx: &mut DataContext) -> ir::GlobalValue {
(**self).declare_data_in_data(data, ctx)
}
A Contributor commented:

These haven't yet been re-added.

@bnjbvr (Member, Author) replied:

Ah yeah, it seems I added it back in one implementation, but not in the trait. Is that something you were using, @bjorn3? I could add it back later, if required.

The Contributor replied:

This change means that declare_func_in_data and declare_data_in_data are no longer forwarded to FooModule when you call them on a &mut FooModule, which is incorrect if those functions have been overridden by FooModule.

@bnjbvr (Member, Author) replied:

I see; it's likely subpar design if the default implementation is incorrect for some implementors, but that's a different discussion to have :) I will post a small patch for this chunk.

@bnjbvr bnjbvr merged commit 8a9b1a9 into bytecodealliance:main Aug 12, 2022
@bnjbvr bnjbvr deleted the wip-05-09_wasm_cache branch August 16, 2022 08:36