-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental compilation cache in Cranelift #4155
Comments
I should add: I've started working on this. I already have something that works for the simplest case, that is, changing a single statement in the source code and recompiling only recompiles the one function that changed. I'm now looking into making it resilient when adding/removing functions internally-defined in the same wasm module (viz. not imported functions yet). |
Can you hash the entirety of Also I think this caching implementation can be kept in a separate crate. There is already a cranelift-codegen feature to enable serde serialization and deserialization of
Cg_clif benefits from rustc's incremental compilation framework, but that one only works at the level of codegen units for codegen which is less fine grained than individual functions. |
Thanks for writing all of this up, @bnjbvr! I am very excited to see how this works out, and early results already show that it should be a huge improvement for users with tight edit-compile-reload loops. Two things I wanted to add:
|
From an internal design perspective I would recommend following what pub fn get_data_raw<T, U, E>(
&self,
state: &T,
// NOTE: These are function pointers instead of closures so that they
// don't accidentally close over something not accounted in the cache.
compute: fn(&T) -> Result<U, E>,
serialize: fn(&T, &U) -> Option<Vec<u8>>,
deserialize: fn(&T, Vec<u8>) -> Option<U>,
) -> Result<U, E>
where
T: Hash,
{
// ...
} Here Ideally the |
(Will use ICC for incremental compilation cache from now on :))
I think that's a fine idea, but I wonder if this wouldn't bias the code towards the incremental cache a bit too much, making it non "zero-cost" for users who disabled the incremental compilation cache (ICC) feature? Say, when it comes to source locations, this means that first we'd transform all source locations into relative (from start) ones, then if the ICC isn't used, re-transform them back later into absolute source locations (for the The strategies you mention here can still be used, though. In fact the relative-from-start source location was what I implemented for that specific problem indeed, and I had a similar idea of a scheme for function refs.
That would be really nice if it could be a separate crate, yet I am unclear if it's doable because currently the ICC uses lots of cranelift-codegen internals. Will keep this in mind, and we can reconsider this later when closer to the "final" design and implementation. |
@alexcrichton @cfallin I am trying to consolidate the two design suggestions, so rephrasing to make sure I've understood clearly:
|
To clarify I'm talking more about methods of caching moreso than using specific code exactly as-is without any modifications. The strategy used by I do think it would be a not-great state to be in to have two caching systems at the end of this, so I think in the long run we don't want two entirely separate systems for caching things that know nothing of each other. In the short-to-mid term though it's fine to develop whatever and refactor things later to share more code. |
We already need transforming between the output format of cranelift and the serialized format of wasmtime. It is just a cheap addition which can be folded into this transformation. Basically just add the addition here: wasmtime/crates/cranelift/src/compiler.rs Line 652 in cb13175
That would make the incremental compilation cache depend on the original version of the function, right? I'm afraid that would turn hard to debug bugs like is normally the case for incremental compilation related bugs into straight up nightmares. |
I guess it's not "just" that :-). In practice every transitive field that contains a I do hear your concerns about running into incremental compilation related bugs, though, which would be mitigated by the mix of static typing sanity + fuzzing; any other idea to make it more robust is appreciated here too. |
To unify things a bit: I think that we could, and probably should, use the same caching mechanism for both the Wasmtime module-level cache and any Cranelift function-level cache. I was mainly concerned above about flexibility (unsure with a synchronous API like wasmtime-cache: how would one use an async external storage provider, e.g. a key-value store across a network?). I agree with @alexcrichton that we should design to make it impossible to get cache keys wrong. To that end if we have a "put together the pieces yourself" option, I think that we can still make it completely safe by storing the expected CLIF hash in the cached value as well, and checking that on return. In essence the interface that I think provides maximum flexibility is something like:
This is fully "dynamically safe" in that the cached data is only used if the cache key matches, and the cache key production is internal to Cranelift and opaque to outside. We can then use the type-safety ideas that Ben mentions (wrapper types on fields) to make that CacheKey correct. This together with the fuzzing to verify correctness of incremental compilation I suspect is enough, and this whole approach grants the flexibility that Ben's case needs. Then tying this back to Wasmtime: |
mach_reloc_to_reloc ignores srcloc and so does mach_trap_to_trap. collect_address_maps seems to be the only that actually reads srcloc. |
Indeed, I even started a patch to remove those as they're really unused in the code base,... and then I recalled that Spidermonkey would use those (if it integrated with the new backend). Anyhow, I don't still think it would make for a good API to request any cranelift embedder to do this kind of transform themselves, when it can just be done in Cranelift directly, so I'm going to stick with my plan (which is more aligned with what Chris suggests too), incorporating the good ideas from yours :) |
mach_reloc_to_reloc and friends are in wasmtime-cranelift so spidermonkey can't use them even if it wanted to. They are the translation between Cranelift's representation and Wasmtime's serializable representation of the compilation results. |
I was talking about the (Sorry all for the noise caused by this sidetracked discussion, and @bjorn3 happy to follow up in Zulip if you'd like to chat more about this.) |
This fixes a bug when the `cold` field would not be serialized, since we're using a custom (de)serializer for `Layout`. This is now properly handled by adding a boolean in the serialized stream. This was caught during the work on bytecodealliance#4155, as this would result in cache mismatches between a function and itself.
…4265) This fixes a bug when the `cold` field would not be serialized, since we're using a custom (de)serializer for `Layout`. This is now properly handled by adding a boolean in the serialized stream. This was caught during the work on #4155, as this would result in cache mismatches between a function and itself.
This is the implementation of #4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. After the suggestion of Chris, `Function` has been split into mostly two parts: - on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`. - on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on. Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache: - most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set. - user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`. - some refactorings have been made for function names: - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name. - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with. The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions. A basic fuzz target has been introduced that tries to do the bare minimum: - check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function. - check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache. - This last check is less efficient and less likely to happen, so probably should be rethought a bit. Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip. Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. Fixes #4155.
Feature
Allow Wasmtime/Cranelift to store compiled artifacts of single functions in a global cache, and when trying to compile a function with the same IR, reload the compiled artifact from the cache instead of recompiling it from scratch.
Benefit
Implementation
serde
andbincode
) and some users might not need the compilation cache, I plan to implement this behind a Cargo feature (and behind a dynamic runtime config in Cranelift too, to enable/disable without having to recompile it all).MachCompileResult
as the output (cache value)To make this really efficient, the caching system should be resilient to non-meaningful changes in the function's input. For instance, it should be independent of the source locations, but also to adding/removing functions (which would shift function references indices, and thus cause mostly only cache misses). This may involve relocations or other fixups to make this work, just after reloading from the cache.
I also want to make sure that we don't accidentally miss a new field in the
ir::Function
being meaningful in the cache key, so here I think of using a mix of strategies:MachCompileResult
and its fields, that require explicit conversion from/to the non-cacheable types. So when we developers add new fields toMachCompileResult
we think about their being needed in the key or not.(more implementations strategies / design choices below)
The caching mechanism should also check the compilation configuration (for which target are we compiling, with which cranelift compile flags).
Provide a way to configure how to store/retrieve the cached bytes (basically a key/value store for which keys and values are
Vec<u8>
or similar). That allows user to implement their cache policy on top of that, and use different backing storage (in memory, on disk), etc.Alternatives
The text was updated successfully, but these errors were encountered: