riscv64: Improve codegen for icmp #7203

Merged 20 commits into bytecodealliance:main on Oct 16, 2023

Conversation

alexcrichton (Member)

This PR is another stab at improving `icmp` along the lines of #6112 and #7113. It tries to draw on both of those PRs to end up with something that's usable by the majority of the backend, notably for branches.

I've attempted to break things up piecemeal here so the changes are understandable commit-by-commit. This doesn't implement every single optimization I could think of, mostly just a few that seemed like low-hanging fruit.

Skip generating extra sign-extension instructions in this case because
the materialization of a constant will implicitly sign-extend into the
entire register.
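
For context on why that's safe on RV64 (an illustrative Rust sketch, not code from the PR): the usual materialization sequences such as `lui`+`addiw` write a sign-extended 32-bit result into the 64-bit destination register, so the constant already occupies the full register in sign-extended form.

/// Minimal model of RV64 constant materialization via `lui`+`addiw`:
/// both instructions produce a sign-extended 32-bit result, so the
/// register ends up holding the constant sign-extended to 64 bits.
fn materialize_i32_rv64(c: i32) -> u64 {
    let hi20 = ((c as i64) + 0x800) >> 12; // `lui` operand (adjusted for the signed low 12 bits)
    let lo12 = (c as i64) - (hi20 << 12);  // `addiw` immediate, in -2048..=2047
    // The final `as i32 as i64` models `addiw` sign-extending its 32-bit result.
    (((hi20 << 12) + lo12) as i32 as i64) as u64
}

fn main() {
    assert_eq!(materialize_i32_rv64(-1), u64::MAX);
    assert_eq!(materialize_i32_rv64(i32::MIN), 0xFFFF_FFFF_8000_0000);
    assert_eq!(materialize_i32_rv64(0x7FFF_FFFF), 0x7FFF_FFFF);
}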
Try to reserve `lower_*` for helpers that take a value and produce a `*Reg`.
Rename this helper to `is_nonzero_cmp` to represent how it's testing for
nonzero and producing a comparison.
* `FCmp` -> `FloatCompare`
* `emit_fcmp` -> `fcmp_to_float_compare`
* `lower_fcmp` -> `lower_float_compare`

Make some room for upcoming integer comparison functions.
This is only used by one lowering so inline its definition directly.
This commit is the first in a few steps to reimplement bytecodealliance#6112 and bytecodealliance#7113.
The `Icmp` pseudo-instruction is removed here and the necessary
functionality is all pushed into ISLE lowerings. This enables deleting
the `lower_br_icmp` helper in `emit.rs`, as it's no longer necessary,
meaning that all conditional branches should ideally be generated during
lowering rather than via pseudo-instructions, so they benefit from the
various optimizations there.

Currently the lowering is the bare-bones minimum to get things working.
This involved creating a helper to lower an `IntegerCompare` into an
`XReg` via negation/swapping args/etc. In generated code this removes
branches and makes `icmp` a straight-line instruction for non-128-bit
arguments.
Use the `x0` register, which is always zero, where possible and avoid
unnecessary `xor`s against it.
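
Taken together, the two commits above boil down to the standard RV64 technique: only `slt`/`sltu` (and their immediate forms) produce a comparison result in a register, so the remaining conditions come from swapping operands and/or flipping the result, with `x0` supplying a free zero operand. A rough Rust model of that mapping (illustrative only, not the PR's ISLE rules):

enum IntCC { Eq, Ne, Slt, Sle, Sgt, Sge, Ult, Ule, Ugt, Uge }

/// Produce an icmp result as 0/1 without branches, using only
/// slt/sltu-style operations plus operand swaps and an `xori rd, rd, 1`
/// to negate the result.
fn icmp_as_straight_line(cc: IntCC, a: i64, b: i64) -> u64 {
    let slt = |x: i64, y: i64| (x < y) as u64;                    // slt
    let sltu = |x: i64, y: i64| ((x as u64) < (y as u64)) as u64; // sltu
    match cc {
        IntCC::Slt => slt(a, b),
        IntCC::Sgt => slt(b, a),         // swap operands
        IntCC::Sge => slt(a, b) ^ 1,     // negate
        IntCC::Sle => slt(b, a) ^ 1,     // swap + negate
        IntCC::Ult => sltu(a, b),
        IntCC::Ugt => sltu(b, a),
        IntCC::Uge => sltu(a, b) ^ 1,
        IntCC::Ule => sltu(b, a) ^ 1,
        // eq/ne: xor the operands and compare the result against x0 (zero).
        IntCC::Ne => sltu(0, a ^ b),     // snez: sltu rd, x0, rs
        IntCC::Eq => sltu(0, a ^ b) ^ 1, // negated snez (real code could use seqz, i.e. sltiu rd, rs, 1)
    }
}

fn main() {
    assert_eq!(icmp_as_straight_line(IntCC::Uge, -1, 0), 1); // -1 is u64::MAX unsigned
    assert_eq!(icmp_as_straight_line(IntCC::Eq, 7, 7), 1);
}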
@alexcrichton alexcrichton requested a review from a team as a code owner October 10, 2023 10:26
@alexcrichton alexcrichton requested review from cfallin and removed request for a team October 10, 2023 10:26
@github-actions github-actions bot added the "cranelift" (Issues related to the Cranelift code generator) and "isle" (Related to the ISLE domain-specific language) labels on Oct 10, 2023

Matches the sign-extension that happens at the hardware layer.
@afonso360 (Contributor) left a comment

This is very nice! Thank you! ❤️ Having the commits separated neatly really helped review this.

It all looks correct to me. But I'm not entirely sure I didn't miss anything, so a second pair of eyes would be helpful!

I'm also going to run the fuzzer on this for a few days to check if anything comes up (although I don't think that should prevent merging this).

@@ -2136,6 +2145,9 @@
(decl pure partial val_already_extended (ExtendOp Value) bool)
(rule 0 (val_already_extended _ v @ (value_type $I64)) $true)

;; Immediates are always sign extended to their full 64-bits
(rule 1 (val_already_extended (ExtendOp.Signed) (iconst _)) $true)
@afonso360 (Contributor) commented Oct 10, 2023

I don't think this is entirely correct, but it probably doesn't matter.

Due to #6922 we don't generate all 32-bit constants from `lui`+`addi`, so some fall back to the general load-from-constant-pool path.

That path uses the `ld` instruction to load the full 8 bytes, but we never sign-extend that value while lowering. So there's a chance that some of the 32-bit constants we miss end up not sign-extended.

I've built a small test to check which constants fail to generate this way, and the lowest one is 0x7FFF_F801, so all the negatives are covered. So I don't think there's anything wrong here.

It might be worth ensuring that the value is sign-extended before emitting it to the constant pool anyway, or switching to `lw`, which loads only 4 bytes and sign-extends the result, just to make sure this doesn't happen in the future.
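
For reference, the `ld`-vs-`lw` difference being described, as a small Rust model (assuming a pool entry written without extension; illustrative only, not the backend's code):

/// `ld` loads all 8 bytes of the constant-pool slot verbatim, so whatever
/// was emitted above bit 31 lands in the register unchanged.
fn ld(pool_slot: u64) -> u64 {
    pool_slot
}

/// `lw` loads only the low 4 bytes and sign-extends the 32-bit value, so
/// the register always ends up holding the sign-extended constant.
fn lw(pool_slot: u64) -> u64 {
    (pool_slot as u32) as i32 as i64 as u64
}

fn main() {
    // A 32-bit constant with bit 31 set, emitted into an 8-byte slot
    // without being sign-extended first:
    let slot = 0x0000_0000_8765_4321u64;
    assert_eq!(ld(slot), 0x0000_0000_8765_4321); // high half stays zero
    assert_eq!(lw(slot), 0xFFFF_FFFF_8765_4321); // sign-extended by the load
}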

@alexcrichton (Member, Author)

Thanks for the good eye here. I'll dig into this a bit more and possibly omit this rule, now that there are more constant-related optimizations for comparisons anyway. I'll probably leave it out of the initial attempt.

@afonso360 (Contributor)

Fuzzing has turned up the following testcase:

test interpret
test run
target riscv64gc

function %a() -> i8 {
block0:
    v9 = iconst.i8 128
    v11 = iconst.i8 0
    v17 = icmp uge v9, v11  ; v9 = 128, v11 = 0
    return v17
}

; run: %a() == 1

@alexcrichton (Member, Author)

Do you have native RISC-V hardware that you're fuzzing on, or are you running the fuzzer through QEMU emulation? If the latter, I'll try to throw some compute at this as well to help weed out more issues.

@afonso360 (Contributor) commented Oct 11, 2023

I've been running this natively, but honestly it's not much faster than running it on QEMU on my desktop, so extra compute might help. This example took about 8 hours to show up.

For comparison, natively I'm getting about 50 execs/s per core across 4 cores (with `-max_len=128`).

Edit: also, if you're going to run the fuzzer, you might want to pull in #7208; that one triggered fairly quickly for me.

@alexcrichton (Member, Author)

Ok, so I tried a bit of fuzzing locally and it found a number of crashes, but I was unfortunately unable to reproduce them outside of the fuzzer to diagnose further. The test cases all reproduce on main, however, so I don't think they're related to this PR, and at least one of them reproduced after I applied the addi4spn fix too. Reproduction has been difficult because these tests use libcalls, I believe. One issue was that `clif-util test` doesn't support libcalls in its `test interpret` mode; while that was easy enough to fix, I then hit a panic where relocations failed to be applied, I assume because the libcall target was >4G away, since an assertion for `Reloc::RiscvCallPlt` was tripped.

In any case, I removed the iconst-is-always-extended logic, and otherwise I'll likely be relying on the fuzzing you're doing, as I've not been too successful locally. It did run for a few hours without finding other bugs though, which seems good!

Additionally, I removed the rules that adjust an immediate by +1, since those were the cause of the issue fuzzing found above. I'll need to look at that more later, but I figure this may be good to get in without blocking on it.
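
(The removed rules aren't shown in the thread, but the general hazard with comparison rewrites that adjust an immediate by one is wrap-around at the edge of the type's range. A Rust sketch of that pitfall, illustrative only and not the actual ISLE rule: rewriting an unsigned `x <= c` as `x < c + 1` is valid except when `c + 1` wraps.)

/// Reference semantics for unsigned `x <= c` on an 8-bit value.
fn ule_ref(x: u8, c: u8) -> bool {
    x <= c
}

/// The tempting rewrite `x <= c` => `x < c + 1`: fine away from the edge,
/// but at `c == u8::MAX` the immediate wraps to 0 and the comparison
/// becomes `x < 0`, which is always false.
fn ule_rewritten(x: u8, c: u8) -> bool {
    x < c.wrapping_add(1)
}

fn main() {
    assert_eq!(ule_ref(5, 9), ule_rewritten(5, 9));
    assert_ne!(ule_ref(200, u8::MAX), ule_rewritten(200, u8::MAX)); // the rewrite disagrees here
}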

@afonso360 (Contributor)

Wow, that's interesting. I had noticed that `clif-util test` can't run libcalls, but had forgotten about it since it doesn't cause issues for me that often. But I'm going to try to investigate the other ones.

In any case, I'm going to start fuzzing these latest changes.

@alexcrichton (Member, Author)

crashes.tar.gz contains the crashes that fuzzing found locally for me.

@afonso360 (Contributor)

This one showed up today:

test interpret
test run
target riscv64gc

function %a() -> i8  {
block0:
    v11 = iconst.i8 196
    v17 = icmp ugt v11, v11
    return v17
}

; run: %a() == 0
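
One plausible way this class of bug arises (an illustration of the general hazard, not necessarily the exact root cause or fix in the PR): an unsigned compare on a narrow type needs both operands widened the same way, and mixing a sign-extended register value with the raw 8-bit immediate makes `ugt v, v` come out true, as in this Rust sketch:

fn main() {
    let v: u8 = 196; // the i8 constant from the testcase; bit 7 is set

    // Correct: extend both sides identically (zero-extension here) before
    // the unsigned compare; x > x is never true.
    let zext = v as u64;
    assert!(!(zext > zext));

    // Buggy mixture: one operand sign-extended into the 64-bit register,
    // the other treated as the raw immediate 196.
    let sext = v as i8 as i64 as u64; // 0xFFFF_FFFF_FFFF_FFC4
    let imm = v as u64;               // 0x0000_0000_0000_00C4
    assert!(sext > imm); // `ugt v, v` would incorrectly evaluate to 1
}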

@alexcrichton (Member, Author)

Ooh, nice catch: more signed-vs-unsigned handling that needed fixing (fuzzing is great!)

@afonso360 (Contributor)

This has been fuzzing continuously for the past 36 hours without any crashes 🎉 I'm going to consider that good enough.

@afonso360 afonso360 added this pull request to the merge queue Oct 16, 2023
Merged via the queue into bytecodealliance:main with commit 4f49393 Oct 16, 2023
25 checks passed
@alexcrichton alexcrichton deleted the rv64-icmp branch October 16, 2023 14:03