
Implement an incremental compilation cache for Cranelift #4551

Merged
merged 16 commits into from
Aug 12, 2022

Conversation

@bnjbvr (Member) commented Jul 28, 2022

This is the implementation of #4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and a trait object to provide a backend for an all-included experience in Wasmtime.

Following Chris's suggestion, Function has been split into two main parts:

  • on the one hand, FunctionStencil contains all the fields required during compilation, and acts as the compilation cache key: if two function stencils are the same, then the result of their compilation (CompiledCodeBase<Stencil>) will be the same. This makes caching trivial, as the only thing to key the cache on is the FunctionStencil.
  • on the other hand, FunctionParameters contains the... function parameters that are required to finalize the result of compilation into a CompiledCode (aka CompiledCodeBase<Final>) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accommodate those requirements, in particular that FunctionStencil must be hashable (implement Hash) to be used as a key in the cache:

  • most source locations are now relative to a base source location in the function, and as such they're encoded as RelSourceLoc in the FunctionStencil. This required changes so that there's no need to explicitly mark a SourceLoc as the base source location; instead, the base is automatically detected the first time a non-default SourceLoc is set.
  • user-defined external names in the FunctionStencil (before this patch, ExternalName::User { namespace, index }) are now references into an external table of UserExternalNameRef -> UserExternalName, stored in the FunctionParameters, and must be explicitly declared using Function::declare_imported_user_function.
  • some refactorings have been made for function names:
    • ExternalName was used as the type for a Function's name; this allowed ExternalName::Libcall as a function name, which would have been quite confusing. Instead, a new enum UserFuncName is introduced for this name, which is either a user-defined function name (the above UserExternalName) or a test case name.
    • ExternalName is likely to eventually become a full reference into the FunctionParameters's mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and it is not trivial as it implies touching ISLE, which I'm less familiar with.

The cache computes a sha256 hash of the FunctionStencil and uses it as the cache key. No equality check (using PartialEq) is performed when the hashes match; we assume the hash carries enough data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

  • check that a function that was successfully compiled and cached will also be successfully reloaded from the cache, and returns the exact same function.
  • check that a trivial modification in the external mapping of UserExternalNameRef -> UserExternalName still hits the cache, and that other modifications don't hit the cache.
    • This last check is less efficient and less likely to happen, so it should probably be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many FunctionStencils are the same, even within a single module. For a group of modules that share a lot of code, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded from the cache; that's still slower than I expected (between 10% and 50% of the overall compile time), so there's likely room for improvement.

Fixes #4155.

@bnjbvr bnjbvr changed the title Implement an incremental compilation cache for Cranelift and make it available in Wasmtime Implement an incremental compilation cache for Cranelift Jul 28, 2022
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:meta Everything related to the meta-language. labels Jul 28, 2022
@cfallin (Member) left a comment:

Thanks so much for working on this @bnjbvr -- I'm extremely excited for the compile-time improvements this will bring, and there's no question in my mind that we'll merge this!

I took a first look at the approach and I have a few high level thoughts. The main one is: the duplication of types into "cached" variants, and the translation between these and the existing IR types, has me thinking that it might be significantly easier for maintenance, readability, and correctness if we instead separate out the types completely so that we have a core function-IR type that is exactly the cache key, without any munging, and then a separate "specialization" / "instantiation" / "finalization" (unsure of the best verb here) stage to plug in the bits we are resilient to: source locs, value labels, referenced names, etc.

In other words, I think we should take a scalpel to Function and DataFlowGraph and (i) make references to source locs always relative to a base, (ii) make ExternalName references indirect through a table, etc.

Then we can put that source-loc base index, that referenced-name table, and anything else, in a separate struct. And Function is a struct that composes FunctionCore and FunctionParams.

Then the optimizer and backends have access only to FunctionCore, and compile down to a MachCompileResultCore. And we have a final relocation-application bit that turns that into MachCompileResult. (Or something like that anyway; I'm not too picky about the type names and exact divisions, just that we make sure that the core of the compiler does not have any access to the "parameters" that are fixed up later.)

I don't think that's too much of a departure from much of the logic here, it's just rotating it a bit. The advantage is that it will almost be correct by construction; we're using the type system to ensure that we don't have any data-dependence leaks from things not in the cache key to the bit that we cache. And the key idea of "relocations" or "fixups" is still there, just made more explicit in the types.

Sorry I didn't come up with this earlier -- it wasn't apparent exactly what the thing would look like until I saw CachedDataFlowGraph and friends :-)

Happy to talk further about this of course!

@cfallin (Member) left a comment:
Err, I didn't mean to check the r+ bit in that last review-comment. Adding a "request changes" review here as per above (but I'm still very excited about this).

@bnjbvr (Member, Author) commented Aug 1, 2022

Thanks all for the first batch of reviews!

@cfallin this makes sense to me. At this point there's not much that needs those relocations, if I'm not mistaken: mostly things that weren't trivially hashable, source locations that were absolute, and external names being embedded and copied around. I'm looking into this.

split `Function` into `FunctionParameters` and `FunctionStencil`, so the
stencil can be compiled and cached, and compilation parameters can be
applied onto the result of compiling a FunctionStencil
(CompiledCodeBase<Stencil>) later on to become a consumable
CompiledCode.
@github-actions github-actions bot added the cranelift:area:aarch64 Issues related to AArch64 backend. label Aug 9, 2022
@bnjbvr (Member, Author) commented Aug 10, 2022

I've updated the initial text of this PR, and while not ideal/perfect, I claim this is ready for review and could be merged as a first step. Remaining things to do as follow-ups, either by myself in my copious free time, or by other interested contributors 😍:

  • fuzzing could be improved to generate an instruction that could serve as a replacement for another instruction. This would likely also require a position at which to put the instruction, and some care ensuring that it is compatible (same number of inputs/outputs and same types for all of them).
  • there's a proliferation of Name types that could potentially be unified, maybe?
  • ExternalName could just become a reference into the FunctionParameters mapping from reference to actual name value, as (likely, though not proven) codegen doesn't care about the identity of the function callee. ExternalName would then become a new entity type, and there'd be ExternalNameData in the table. This might not be true for libcalls, whose identity may matter during codegen, so it may require that ExternalName becomes a sum type: either a reference into the FunctionParameters table, or the Libcall itself embedded.
  • right now, if a UserExternalNameRef changes (that is, a function call switches the callee identity), it changes the identity of the FunctionStencil, thus resulting in a cache miss; this could likely be refactored and improved, enabling more cache hits.
  • optimization work could be done to identify why a recompilation is, in the worst case, only 50% faster than compiling from scratch (the speedup should be much higher in theory).

@bnjbvr bnjbvr marked this pull request as ready for review August 10, 2022 16:14
@bnjbvr bnjbvr requested a review from cfallin August 10, 2022 16:14
Unfortunately the previous implementation was more powerful, in the
sense that a function call (in a function body) could change a call
target, and that would result in a cache hit. With the new
FunctionStencil approach this is not the case anymore because that
changes the `FunctionStencil` identity. A future refactoring could make
it possible to support caching for that specific case.
@bnjbvr (Member, Author) commented Aug 11, 2022

@cfallin This new version incorporates changes addressing your review feedback from the live review session (which we organized to help walk through the changes, as this is a large patch). This is now ready and passes cargo test --all on my machine, with no performance regression on compile times, so I would claim this is ready :-) Cheers!

@cfallin (Member) left a comment:

This all looks great to me! Thanks @bnjbvr for the patience and perseverance on this one. After our live review today I'm confident that this is likely to work well and be maintainable, and have the safety properties we would expect; in particular, the type-level separation of cacheable vs. change-resilient state, and the embedding of version markers in the key and value both.

It looks like there's a CI failure that may be the result of the new Rust version (a new warning appeared?); I'll look into it. I don't think it's caused by this PR.

@cfallin (Member) commented Aug 11, 2022

I just merged in main to this PR to see if it fixes CI.

@cfallin (Member) commented Aug 11, 2022

Hmm, @bnjbvr it looks like some of the changes broke the fuzz targets -- happy to merge once that is fixed up.

@bnjbvr bnjbvr enabled auto-merge (squash) August 12, 2022 16:15
@bnjbvr bnjbvr disabled auto-merge August 12, 2022 16:15
@bnjbvr bnjbvr enabled auto-merge (squash) August 12, 2022 16:16
Comment on lines -638 to -644
fn declare_func_in_data(&self, func: FuncId, ctx: &mut DataContext) -> ir::FuncRef {
(**self).declare_func_in_data(func, ctx)
}

fn declare_data_in_data(&self, data: DataId, ctx: &mut DataContext) -> ir::GlobalValue {
(**self).declare_data_in_data(data, ctx)
}
A Contributor commented:

These haven't yet been re-added.

@bnjbvr (Member, Author) replied:

Ah yeah, it seems I added it back in one implementation, but not in the trait. Is that something you were using, @bjorn3? I could add it back later, if required.

The Contributor replied:

This change means that declare_func_in_data and declare_data_in_data are no longer forwarded to FooModule when you call them on a &mut FooModule, which is incorrect if those functions have been overridden by FooModule.

@bnjbvr (Member, Author) replied:

I see; it's likely subpar design if the default implementation is incorrect for some implementors, but that's a different discussion to have :) I will post a small patch for this chunk.

@bnjbvr bnjbvr merged commit 8a9b1a9 into bytecodealliance:main Aug 12, 2022
@bnjbvr bnjbvr deleted the wip-05-09_wasm_cache branch August 16, 2022 08:36