Add a `MachBuffer::defer_trap` method #6011

alexcrichton · 2023-03-13T22:40:58Z

This commit adds a new method to `MachBuffer` that defers trap opcodes to the end of a function, similar to how constants are deferred to the end of the function. This is useful for backends which frequently use `TrapIf`-style opcodes. Currently a jump is emitted which skips the next instruction, a trap, and then execution continues normally. While there isn't any pressing problem with this construction, the trap opcode sits in the middle of the instruction stream, as opposed to "off on the side", despite rarely being taken.

With this method in place all the backends (except riscv64, since I couldn't figure it out easily enough) have a new lowering of their `TrapIf` opcode. Now a trap is deferred, which returns a label, and that label is then jumped to when the trap should be taken. A fixup is recorded in `MachBuffer` to get patched later on during emission, or at the end of the function. As a result, every `TrapIf` instruction translates to a single branch plus a single trap at the end of the function.
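
To make the defer/label/fixup flow concrete, here is a minimal toy model. It is a sketch under stated assumptions, not Cranelift's `MachBuffer`: `ToyBuffer`, `Label`, the one-byte opcodes, and the one-byte displacement are all hypothetical and exist only for illustration.

```rust
// Toy model only: none of these names or encodings come from Cranelift.
const BRANCH_IF: u8 = 0xB0; // hypothetical opcode: conditional branch + 1-byte displacement
const NOP: u8 = 0x90; // hypothetical opcode: stands in for ordinary code
const TRAP: u8 = 0xFF; // hypothetical opcode: trap + 1-byte trap code

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Label(usize);

#[derive(Default)]
struct ToyBuffer {
    code: Vec<u8>,
    // Offset of each label, filled in once the label is bound.
    label_offsets: Vec<Option<usize>>,
    // (offset of the displacement byte to patch, target label)
    fixups: Vec<(usize, Label)>,
    // Traps waiting to be emitted at the end of the function.
    deferred_traps: Vec<(Label, u8)>,
}

impl ToyBuffer {
    /// Queue a trap for the end of the function and return a label for it.
    fn defer_trap(&mut self, trap_code: u8) -> Label {
        let label = Label(self.label_offsets.len());
        self.label_offsets.push(None);
        self.deferred_traps.push((label, trap_code));
        label
    }

    /// Emit a conditional branch whose displacement is patched later.
    fn emit_branch_if(&mut self, target: Label) {
        self.code.push(BRANCH_IF);
        self.fixups.push((self.code.len(), target));
        self.code.push(0); // placeholder displacement
    }

    fn emit_nop(&mut self) {
        self.code.push(NOP);
    }

    /// Emit the deferred traps, bind their labels, then patch every fixup.
    fn finish(mut self) -> Vec<u8> {
        for (label, trap_code) in std::mem::take(&mut self.deferred_traps) {
            self.label_offsets[label.0] = Some(self.code.len());
            self.code.push(TRAP);
            self.code.push(trap_code);
        }
        for (at, label) in &self.fixups {
            let target = self.label_offsets[label.0].expect("label was never bound");
            // Displacement is relative to the end of the branch instruction.
            let disp = target - (at + 1);
            assert!(disp <= u8::MAX as usize, "toy displacement out of range");
            self.code[*at] = disp as u8;
        }
        self.code
    }
}
```

In this model a `TrapIf` becomes `defer_trap` followed by `emit_branch_if` on the returned label, so the hot path contains only the branch and the trap bytes land after the last regular instruction.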

I've additionally updated some more lowerings in the x64 backend which were explicitly using traps to instead use `TrapIf` where applicable, to avoid jumping over traps mid-function. Other backends didn't appear to have many jump-over-the-next-trap patterns.

Lots of tests have had their expectations updated here which should reflect all the traps being sunk to the end of functions.

jameysharp

This looks like a nice optimization! Overall I think this is a good approach. I just have a couple questions about details and a typo or two.

jameysharp · 2023-03-14T00:30:42Z

cranelift/codegen/src/machinst/buffer.rs

+    /// label to its offset. The trap will be placed at most `max_distance`
+    /// from the current offset.
+    pub fn defer_trap(&mut self, code: TrapCode, stack_map: Option<StackMap>) -> MachLabel {


`defer_trap` doesn't have the `max_distance` argument that its doc-comment refers to. Do we need a max-distance for branch instructions on any target? Also, it isn't "the given label", since it returns the label. I assume the comment was just copied from `defer_constant`.

As future work, could `defer_trap` deduplicate the trap pool by returning the same `MachLabel` when given the same `TrapCode`? (At least if `stack_map` is `None`.) Or do we need to know exactly which check trapped?

alexcrichton · 2023-03-14

Aha, you've reminded me of something I forgot in this PR, which was to track the source location of traps to get the correct file origin information when a trap happens. Previously this would happen naturally due to the source location tracking, but by moving the traps to the end that tracking is lost. It turns out we don't have anything in the wasm test suite testing this just yet! I've now added source location tracking for deferred traps, in addition to some more tests which require the logic to be here.

So, to get back to your question about deduplicating: maybe! The source location adds another dimension that needs to be considered for deduplication, and traps are pretty unlikely to get deduplicated once source information is included (except perhaps for some of those x64 lowerings of float-to-int conversions which have a bunch of traps). With source locations considered too, though, I think it's safe to deduplicate in the future.
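
As a rough illustration of the deduplication idea discussed here, the sketch below keys a trap pool on both the trap code and the source location. All of the types (`ToyTrapCode`, `ToySourceLoc`, `ToyLabel`, `TrapPool`) are hypothetical stand-ins rather than Cranelift's own, and traps carrying a stack map are simply left out of the model.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real trap code, source location and label types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ToyTrapCode {
    IntegerOverflow,
    BadConversionToInteger,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ToySourceLoc(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyLabel(u32);

#[derive(Default)]
struct TrapPool {
    // One label per distinct (trap code, source location) pair.
    labels: HashMap<(ToyTrapCode, ToySourceLoc), ToyLabel>,
    next_label: u32,
}

impl TrapPool {
    /// Return the existing label for an identical trap, or allocate a new one.
    fn defer_trap(&mut self, code: ToyTrapCode, loc: ToySourceLoc) -> ToyLabel {
        let next = &mut self.next_label;
        *self.labels.entry((code, loc)).or_insert_with(|| {
            let label = ToyLabel(*next);
            *next += 1;
            label
        })
    }

    /// Number of distinct traps that would be emitted at the end of the function.
    fn len(&self) -> usize {
        self.labels.len()
    }
}
```

Because the source location is part of the key, two checks with the same trap code but different origins still get separate traps, which keeps the reported location accurate.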

alexcrichton · 2023-03-14T03:58:43Z

@uweigand If you're able, I'd like to get your review on the last commit of this PR, 0fd035c, where I updated how signal handling works on s390x. Trap emission is now buried within `MachBuffer`, so while I could plumb through a flag for "register the trap code on the last byte of the opcode vs the first", I thought it might be easier to update the s390x-specific logic in the signal handler instead, to avoid the s390x-specific-ness spilling over into the mostly-independent `MachBuffer`. You probably know much more about the intricacies here, though, and I'd like to confirm that things should work as I expect/hope.

uweigand · 2023-03-14T18:18:38Z

Hmm. As far as I can tell, this will work, at least for now. The assumption that only 2-byte instructions can result in a SIGILL is of course not correct in general; it's just a consequence of the particular subset of instructions the back-end is currently using, so I'd be a bit hesitant to hard-code this ...

Moving trap instructions to the end of the function may not be the best approach on s390x anyway. Other compilers tend to use a trick (that I haven't implemented yet in cranelift) where the "trap" instruction overlaps the conditional branch instruction. Specifically, to implement a conditional trap, you can use something like:

   jgCOND .+2

which is encoded as a 6-byte instruction like so (with M encoding the condition):

  C0 M4 00 00 00 01

If the condition is true, this will branch two bytes forward, landing on the 00 00 bytes embedded in the branch itself, which now double as the trap instruction.

This is the same size as a branch to the end of the function, but doesn't actually require anything to be there.
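
The byte layout above can be checked with a small sketch. The helper names `encode_trap_if` and `branch_target` are hypothetical; the sketch assumes only the encoding quoted above plus the fact that the relative-long displacement counts halfwords (2-byte units), which is why the stored value 1 means "two bytes forward".

```rust
/// Build the 6-byte `jgCOND .+2` encoding quoted above: C0 M4 00 00 00 01.
/// `mask` is the 4-bit condition mask (the `M` nibble).
fn encode_trap_if(mask: u8) -> [u8; 6] {
    assert!(mask <= 0xF, "condition mask is a 4-bit field");
    [0xC0, (mask << 4) | 0x4, 0x00, 0x00, 0x00, 0x01]
}

/// Compute the branch target of such an instruction located at `inst_addr`.
/// The 32-bit big-endian displacement is measured in halfwords.
fn branch_target(inst_addr: u64, inst: &[u8; 6]) -> u64 {
    let halfwords = i32::from_be_bytes([inst[2], inst[3], inst[4], inst[5]]);
    inst_addr.wrapping_add_signed(i64::from(halfwords) * 2)
}
```

For an instruction placed at address `0x1000`, `branch_target` yields `0x1002`, which is the third byte of the same instruction; bytes 2 and 3 are both zero, so the branch lands on the two zero bytes that serve as the trap.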

alexcrichton · 2023-03-14T18:21:21Z

Ah ok makes sense, would it be best to not apply this change to s390x in that case?

uweigand · 2023-03-14T18:37:49Z

Agreed. Just leave s390x out for now, and I'll implement that other change separately. Thanks!

jameysharp

Nice, I enjoyed seeing the filetests become that little bit easier to read. I have one suggestion, but feel free to merge with or without it.

`MachBuffer` was registering trap codes with the first byte of the trap, but the SIGILL handler was expecting them to be registered with the last byte of the trap. Exploit the fact that the SIGILL-raising trap is always a 2-byte instruction and always march 2 bytes backwards for SIGILL, while continuing to march backwards 1 byte for SIGFPE-generating instructions.
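
A minimal sketch of the adjustment this commit message describes, assuming only what is stated above: the handler steps back a fixed number of bytes from the reported program counter before looking up the trap. `ToySignal` and `trap_lookup_pc` are hypothetical names, not the actual handler code, and the commit list for this PR indicates the s390x changes were later backed out.

```rust
// Hypothetical model of the signal kinds mentioned in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToySignal {
    Sigill,
    Sigfpe,
}

/// Map the program counter reported with a signal to the address used for
/// the trap lookup, following the rule in the commit message:
/// 2 bytes back for SIGILL, 1 byte back for SIGFPE.
fn trap_lookup_pc(reported_pc: usize, signal: ToySignal) -> usize {
    let step_back = match signal {
        // The trap is a 2-byte instruction registered at its first byte.
        ToySignal::Sigill => 2,
        // SIGFPE-generating instructions keep the previous 1-byte step.
        ToySignal::Sigfpe => 1,
    };
    reported_pc - step_back
}
```

As noted in the review, the fixed 2-byte step relies on the back-end only ever raising SIGILL from 2-byte instructions, which is why hard-coding it was considered fragile.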

uweigand · 2023-03-21T16:07:17Z

@alexcrichton I've now implemented the s390x solution discussed above as #6079 .

Following up on the discussion in bytecodealliance#6011, this adds an improved implementation of `TrapIf` for s390x using a single conditional branch instruction. If the trap condition is true, we branch into the middle of the branch instruction; those middle two bytes are zero, which matches the encoding of the trap instruction. In addition, show the trap code for `Trap` and `TrapIf` instructions in assembler output.

jameysharp reviewed Mar 14, 2023

alexcrichton force-pushed the traps-at-end branch from aa65527 to 0fd035c Compare March 14, 2023 03:56

alexcrichton force-pushed the traps-at-end branch from 67e6ca3 to 1f2491e Compare March 16, 2023 18:59

alexcrichton requested a review from jameysharp March 17, 2023 14:58

jameysharp approved these changes Mar 20, 2023

alexcrichton added 9 commits March 20, 2023 12:31

Print trap code on all platforms

40bea38

Emit traps before constants

637edb1

Preserve source location information for traps

17e7a27

Fix test expectations

2760ff9

Back out s390x changes

f0e0ec8

Back out more s390x bits

82b56d4

Review comments

38831bf

alexcrichton force-pushed the traps-at-end branch from 1f2491e to 38831bf Compare March 20, 2023 20:46

alexcrichton enabled auto-merge March 20, 2023 20:56

alexcrichton added this pull request to the merge queue Mar 20, 2023

alexcrichton merged commit a3b2103 into bytecodealliance:main Mar 20, 2023

alexcrichton deleted the traps-at-end branch March 20, 2023 21:56

uweigand mentioned this pull request Mar 21, 2023

s390x: Improved TrapIf implementation #6079

Merged

jameysharp mentioned this pull request Apr 5, 2023

Add release notes for 8.0.0 #6145

Merged

afonso360 mentioned this pull request Sep 15, 2023

riscv64: Cleanup trap handling #7047

Merged
