-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up registration of stack maps #2820
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Subscribe to Label Actioncc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
} | ||
|
||
/// A trait used to abstract how stack maps are located. | ||
pub trait StackMapLookup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: why the trait over using a method on CompiledModule
(similar to func_by_pc
and func_info
) and storing Arc<CompiledModule>
vs the trait object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only because the CompiledModule
lives in the wasmtime_jit
crate which this wasmtime_runtime
crate doesn't have access to. Otherwise though there's definitely no need for dynamic trait dispatch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I guess the alternative would be to use StackMapLookup
via the VMContext
, rather than a registry pointer, freeing the registry implementation to live in the wasmtime
crate, but that's just trading dynamic-dispatch during module instantiation to dynamic-dispatch at externref gc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did move the registry implementation to wasmtime
, I think this data structure could be a consolidated one replacing StoreFrameInfo
too, basically just a "which CompiledModule
should I be talking to given this pc?" which responds with an answer that we then use to lookup the appropriate information (gimmie frame info, gimmie trap info, gimmie stack maps, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, given externref gc being a relatively expensive (and hopefully very infrequent) operation and the cost of doing trait object dispatch per frame would be negligible, I would argue VMContext
should have *const dyn StackMapLookup
rather than *mut StackMapRegistry
.
This frees us to remove StackMapRegistry
and StoreFrameInfo
in favor of a single CompiledModuleRegistry
per store that lives in wasmtime
which impls the runtime's StackMapLookup
and can lookup frame and trap information as well.
Thoughts on this approach? Perhaps for another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm good points! I'll look to add this to this PR, seems reasonable to do here!
This commit fixes a perf issue I was seeing in some local benchmarking where registration of stack maps took a nontrivial amount of time for a module that didn't even use stack maps! The fix here is to largely just do the same thing as bytecodealliance#2811 which is to use the in-memory data structures of a `CompiledModule` rather than rebuilding different data structures when a module is registered with a `Store`. The `StackMapRegistry` type now takes a lookup object for a range of pcs, simplifying registration. Registration additionally doesn't even happen if a module is pre-determined to not have any stack maps inside of it. This trait object encapsulates all the logic that was previously used in the rebuilt data structures and also leverages conveniences like `CompiledModule::func_by_pc` added recently as well. With this all combined it means that modules which don't have stack maps now skip this registration step entirely and modules with stack maps should do much less work during registration as well.
4001e35
to
9f1aa64
Compare
This looks well handled by #2842, so closing in favor of that! |
This commit fixes a perf issue I was seeing in some local benchmarking
where registration of stack maps took a nontrivial amount of time for a
module that didn't even use stack maps! The fix here is to largely just
do the same thing as #2811 which is to use the in-memory data structures
of a
CompiledModule
rather than rebuilding different data structureswhen a module is registered with a
Store
.The
StackMapRegistry
type now takes a lookup object for a range ofpcs, simplifying registration. Registration additionally doesn't even
happen if a module is pre-determined to not have any stack maps inside
of it. This trait object encapsulates all the logic that was previously
used in the rebuilt data structures and also leverages conveniences like
CompiledModule::func_by_pc
added recently as well.With this all combined it means that modules which don't have stack maps
now skip this registration step entirely and modules with stack maps
should do much less work during registration as well.