Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift-frontend: Support moving GCs with user stack maps #8978

Merged
merged 2 commits into from
Jul 22, 2024

Conversation

fitzgen
Copy link
Member

@fitzgen fitzgen commented Jul 19, 2024

This refactors the way that user stack maps work such that they are compatible with moving GCs. We now keep every GC value in a stack slot, spill to the stack slot on the value's definition, and reload from the stack slot on every use. This means that we can generate a bunch of redundant loads, but we let the mid-end's alias analysis clean that up. This has the added benefit of reducing the length of live ranges for GC values and also means we don't need to proactively spill every single live GC values at each safepoint, because we know they are already spilled. This latter bit actually helps us avoid an accidentally quadratic issue with many, long, overlapping live ranges where we would do O(n^2) spills for a series of safepoints that keep creating more GC refs.

We also implement two further optimizations:

  1. We lazily allocate slots for live GC values, which means that if a GC value is not ever live across a safepoint, then we do never allocate a stack slot for it and never spill it to the stack slot. If we didn't do this, then frames would be larger and we'd have a dead store to the stack slot that would otherwise require the mid-end to grow a dead-store-elimination pass.

  2. We keep a free list of available slots that will never be used again, and we reuse slots from here when possible. This means that a chain of non-overlapping live ranges, each of which still needs to appear in some safepoint's stack map, will all reuse the same stack slot, keeping frames from being bloated.

Finally, this commit also introduces some logs for the liveness analysis to ease future debugging.

This refactors the way that user stack maps work such that they are compatible
with moving GCs. We now keep every GC value in a stack slot, spill to the stack
slot on the value's definition, and reload from the stack slot on every
use. This means that we can generate a bunch of redundant loads, but we let the
mid-end's alias analysis clean that up. This has the added benefit of reducing
the length of live ranges for GC values and also means we don't need to
proactively spill every single live GC values at each safepoint, because we know
they are already spilled. This latter bit actually helps us avoid an
accidentally quadratic issue with many, long, overlapping live ranges where we
would do `O(n^2)` spills for a series of safepoints that keep creating more GC
refs.

We also implement two further optimizations:

1. We lazily allocate slots for live GC values, which means that if a GC value
is not ever live across a safepoint, then we do never allocate a stack slot for
it and never spill it to the stack slot. If we didn't do this, then frames would
be larger and we'd have a dead store to the stack slot that would otherwise
require the mid-end to grow a dead-store-elimination pass.

2. We keep a free list of available slots that will never be used again, and we
reuse slots from here when possible. This means that a chain of non-overlapping
live ranges, each of which still needs to appear in some safepoint's stack map,
will all reuse the same stack slot, keeping frames from being bloated.

Finally, this commit also introduces some logs for the liveness analysis to ease
future debugging.

Co-Authored-By: Trevor Elliott <telliott@fastly.com>
@fitzgen fitzgen requested review from a team as code owners July 19, 2024 18:21
@fitzgen fitzgen requested review from cfallin and removed request for a team July 19, 2024 18:21
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct, just some nits below -- I'm glad our idea to fall back to stackslots is working out!

I think we can land as-is if you prefer, but I do have some thoughts spawned from the handling of blockparams. My original thought when I suggested this was that (perhaps naively, but...) it would operate closer to the level of the frontend's "variables" that we use to represent Wasm locals, so that somehow we wouldn't need to handle blockparams -- since by construction, we insert blockparams only to carry the same variable from source block to dest block. I worry that inserting reloads and respills (if I've understood the above correctly) could lead to unexpected worst-case behavior still. (In the vars_block_params_and_needs_stack_map test example, x would have one stackslot, we would overwrite it in both block1 and block2, and we wouldn't need loads at those blocks' tails and a re-store at block3.)

I think that if we have a closer correspondence between frontend variables (i.e. Wasm locals) and stackslots, we have a more understandable cost-model from the Wasm producer's point of view as well: at least in baseline compilers, locals generally become actual stack slots, so producers try to minimize local count and reuse them where possible. If we can carry over that consolidation to our IR, we have an understandable model of "one store per write to a Wasm local, one load per use of a Wasm local" which is great.

As things are now I think we can still see higher-than-linear resulting code size from linear number of reads/writes to refs: imagine a br_table that fans out to N blocks, the first (innermost) of which overwrites a reftyped Wasm local with an existing def. Then when we process end opcodes and generate N merge-points in sequence, each merge point will have to reload and re-store the slot. Ideally we'd instead directly translate the update to the variable to a stack_store and the uses to stack_loads.

I think that might point the way to an implementation solution that also avoids the need for the extra pass: branch in the variable-to-SSA-value translation to an alternate "value just lives on stack" path.

Thoughts?

cranelift/frontend/src/frontend.rs Outdated Show resolved Hide resolved
// A map from size/align to free stack slots that are not being used
// anymore. This allows us to reuse stack slots across multiple values
// helps cut down on the ultimate size of our stack frames.
let mut free_stack_slots: crate::HashMap<u32, SmallVec<[ir::StackSlot; 4]>> =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not too bad to do, could we have an enum for stack-slot size rather than a bare u32? Something like enum StackSlotSize { Slot4, Slot8 } and perhaps .bytes() and .align() methods as needed; and a TryFrom<ir::Type> impl maybe too...

(To be clear, I'm suggesting this as a refactor wherever slot-sizes appear, the hashmap key here just prompted the thought)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would actually need quite a few slot sizes, since we have types like f64x8 in CLIF. I'm not fully convinced that we would actually benefit from such an enum, and even if we do want to go down that route, I'd want to do it in a separate PR.

I did waffle back and forth between a full hash map here vs an array indexed by log2(size) like the old approach did all over the place. I opted to just go for the simplest, most-obvious implementation approach and figured we can come back and optimize if it is ever a perf bottleneck, but I suspect it really isn't.

So given all that: are you okay with keeping this as-is for now at least?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it doesn't need to happen as part of this PR. (The reason for TryFrom<ir::Type> rather than From was exactly to address this -- I don't think we need to define a slot size for every type, only for types that the cranelift-frontend user puts into stackmaps -- and in general I find bare integers pretty error-prone; but happy to discuss further!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this in #8991

cranelift/frontend/src/frontend.rs Outdated Show resolved Hide resolved
cranelift/frontend/src/frontend.rs Outdated Show resolved Hide resolved
cranelift/frontend/src/frontend.rs Show resolved Hide resolved
@fitzgen
Copy link
Member Author

fitzgen commented Jul 19, 2024

operate closer to the level of the frontend's "variables"

I also thought about creating a new entity for these things but I thought through how it would be used in cranelift-wasm and wasmtime-cranelift and my conclusion was that it would be a bit of a mess. First, we aren't just dealing with wasm locals, which are Variables; we are also dealing with the results of calls and things like that, which produce Values. So we have to have a solution for Values either way. Second, cranelift-wasm doesn't make the decision whether certain types are managed by the GC or not, it is agnostic of runtime structures and implementation, wasmtime-cranelift makes those decisions based on its Wasmtime-specific knowledge. That means that cranelift-wasm would have to have some kind of enum of either a GC-managed Variable (or other new entity) or a plain Value and use that for its abstract operand stack, at which point the whole crate, essentially, is infected with this change and things are getting really hairy.

For this reason, we decided it was better, at least for now, to play nice with the existing system and its existing entities, rather than overhaul everything.

I think that might point the way to an implementation solution that also avoids the need for the extra pass: branch in the variable-to-SSA-value translation to an alternate "value just lives on stack" path.

Unfortunately this won't work because the stack slots and liveness and all that are not created until we are finishing the function builder, long after calls to builder.use_var.

Trevor and I talked about another solution for this specific problem though: if we maintain a backwards map from Value to the Variable that it is associated with (if any) then we can force all Values for the same Variable to use the same stack slot. For Values that are not associated with a Variable, they would use the same approach we have in this PR. But this is work for the future.

@fitzgen
Copy link
Member Author

fitzgen commented Jul 19, 2024

I think we can land as-is if you prefer

I'm going to go ahead and enqueue this for merging, but I am still interested in your responses above and in the inline thread!

@fitzgen fitzgen added this pull request to the merge queue Jul 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 19, 2024
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Jul 19, 2024
@fitzgen fitzgen added this pull request to the merge queue Jul 22, 2024
Merged via the queue into bytecodealliance:main with commit 2028224 Jul 22, 2024
37 checks passed
@fitzgen fitzgen deleted the user-stack-maps-and-moving-gc branch July 22, 2024 17:46
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Jul 22, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 22, 2024
* Move safepoints and stack maps code to its own module

This is purely mechanical code motion.

* Shorten log messages' prefix

These are all inside a "safepoint" module, and `env_logger` prints the module
before the log message, so the old "stack map" part of the log messages' prefix
is a bit redundant now.

* Use a typed enum for stack slot sizes

Rather than an open-coded `u32`.

Addresses #8978 (comment)

* Update some comments

These comments were referencing things that we did in the old implementation of
stack maps and safepoints, and have been updated for the new approach.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants