Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Improve icmp codegen #7113

Closed

Conversation

alexcrichton
Copy link
Member

I've been curious to poke around more at riscv64 and it looks like the
handling of branches/comparisons can be improved in our backend, so this
is the first of what may be a number of commits to improve the situation
here. This commit specifically targets the icmp Cranelift instruction
when producing a 0 or 1 value as a result. This is unlikely to be used
all that much in normal programs since most of the time a comparison is
fed into a branch for example.

Nevertheless I was hoping to start simple and work my way out towards
branches eventually. My hope is that by improving this codegen this can
be extracted to helpers later on to assist branches and various other
lowerings.

One part that this commit removes is various sign-extensions around
icmp because, at least according to RISC-V's ABI, values are always
sign extended when sitting at rest in their registers. I'm not sure if
Cranelift respects this everywhere just yet, but it seems like a good
rule of thumb to follow and if it causes issues it may be best to track
down other lowering rules to fix the problems.

Additionally this does not update icmp for 128-bit integers just yet.
This is only the comparisons necessary for register-size values or
smaller.

This makes it a bit more precise than `Reg`.
I've been curious to poke around more at riscv64 and it looks like the
handling of branches/comparisons can be improved in our backend, so this
is the first of what may be a number of commits to improve the situation
here. This commit specifically targets the `icmp` Cranelift instruction
when producing a 0 or 1 value as a result. This is unlikely to be used
all that much in normal programs since most of the time a comparison is
fed into a branch for example.

Nevertheless I was hoping to start simple and work my way out towards
branches eventually. My hope is that by improving this codegen this can
be extracted to helpers later on to assist branches and various other
lowerings.

One part that this commit removes is various sign-extensions around
`icmp` because, at least according to RISC-V's ABI, values are always
sign extended when sitting at rest in their registers. I'm not sure if
Cranelift respects this everywhere just yet, but it seems like a good
rule of thumb to follow and if it causes issues it may be best to track
down other lowering rules to fix the problems.

Additionally this does not update `icmp` for 128-bit integers just yet.
This is only the comparisons necessary for register-size values or
smaller.
@alexcrichton alexcrichton requested a review from a team as a code owner September 29, 2023 16:35
@alexcrichton alexcrichton requested review from elliottt and removed request for a team and elliottt September 29, 2023 16:35
@alexcrichton
Copy link
Member Author

Ok looks like this won't work for at least one reason which is that u8 + u8 does not mask (similar for other non-32 and non-64 sizes).

@afonso360 do you know if this is a known issue with the risc-v backend? I'd be happy to help fix this up, but I'm not sure how deep this necessarily goes.

@elliottt
Copy link
Member

Ok looks like this won't work for at least one reason which is that u8 + u8 does not mask (similar for other non-32 and non-64 sizes).

@afonso360 do you know if this is a known issue with the risc-v backend? I'd be happy to help fix this up, but I'm not sure how deep this necessarily goes.

The approach we were using in the past is to use normalize_cmp_value for the cases where we needed to make sure that a value had no non-zero upper bits. Probably that function should be renamed, and we should be more aggressive about using it when lowering operations that might rely on those upper bits.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language labels Sep 29, 2023
@github-actions
Copy link

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton
Copy link
Member Author

Ok so I've looked around a bit, my conclusion is I don't know enough to be making changes here yet. I'm going to go back to the drawing board on some of this and see what I can figure out.

@afonso360
Copy link
Contributor

This is something that we really need! Currently we emit at least 4 instructions when we could do just one! Thanks for looking into this.

Also this just reminded me that I had a icmp PR that I had forgot about. Feel free to pick out anything you need from there!

One part that this commit removes is various sign-extensions around
icmp because, at least according to RISC-V's ABI, values are always
sign extended when sitting at rest in their registers. I'm not sure if
Cranelift respects this everywhere just yet,

@afonso360 do you know if this is a known issue with the risc-v backend? I'd be happy to help fix this up, but I'm not sure how deep this necessarily goes.

We actually don't follow this at all. We follow the general cranelift convention that upper bits are undefined and we clear / sign extend them in the ops that need it.

I looked up the ABI document and I could only find that sign extend requirement with regards to the calling convention, but as far as I know that is handled somewhere else in cranelift right?

@alexcrichton
Copy link
Member Author

Oh dear apologies for missing that! I'll probably restart from there.

I also need to investigate what's up with the ABI business and all that, but ok makes sense that Cranelift considers the upper bits undefined state. I know the ABI bits should do sign-extension as necessary, so I'll try to hunt that down and confirm and make sure it's all working.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 10, 2023
This commit is the first in a few steps to reimplement bytecodealliance#6112 and bytecodealliance#7113.
The `Icmp` pseudo-instruction is removed here and necessary
functionality is all pushed into ISLE lowerings. This enables deleting
the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary,
meaning that all conditional branches should ideally be generated in
lowering rather than pseudo-instructions to benefit from various
optimizations.

Currently the lowering is the bare-bones minimum to get things working.
This involved creating a helper to lower an `IntegerCompare` into an
`XReg` via negation/swapping args/etc. In generated code this removes
branches and makes `icmp` a straight-line instruction for non-128-bit
arguments.
github-merge-queue bot pushed a commit that referenced this pull request Oct 16, 2023
* riscv64: Constants are always sign-extended

Skip generating extra sign-extension instructions in this case because
the materialization of a constant will implicitly sign-extend into the
entire register.

* riscv64: Rename `lower_int_compare` helper

Try to reserve `lower_*` as taking a thing and producing a `*Reg`.
Rename this helper to `is_nonzero_cmp` to represent how it's testing for
nonzero and producing a comparison.

* riscv64: Rename some float comparison helpers

* `FCmp` -> `FloatCompare`
* `emit_fcmp` -> `fcmp_to_float_compare`
* `lower_fcmp` -> `lower_float_compare`

Make some room for upcoming integer comparison functions.

* riscv64: Remove `ordered` helper

This is only used by one lowering so inline its definition directly.

* riscv64: Remove the `Icmp` pseudo-instruction

This commit is the first in a few steps to reimplement #6112 and #7113.
The `Icmp` pseudo-instruction is removed here and necessary
functionality is all pushed into ISLE lowerings. This enables deleting
the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary,
meaning that all conditional branches should ideally be generated in
lowering rather than pseudo-instructions to benefit from various
optimizations.

Currently the lowering is the bare-bones minimum to get things working.
This involved creating a helper to lower an `IntegerCompare` into an
`XReg` via negation/swapping args/etc. In generated code this removes
branches and makes `icmp` a straight-line instruction for non-128-bit
arguments.

* riscv64: Remove an unused helper

* riscv64: Optimize comparisons with 0

Use the `x0` register which is always zero where possible and avoid
unnecessary `xor`s against this register.

* riscv64: Specialize `a < $imm`

* riscv64: Optimize Equal/NotEqual against constants

* riscv64: Optimize LessThan with constant argument

* Optimize `a <= $imm`

* riscv64: Optimize `a >= $imm`

* riscv64: Add comment for new helper

* Use i64 in icmp optimizations

Matches the sign-extension that happens at the hardware layer.

* Correct some sign extensions

* riscv64: Don't assume immediates are extended

* riscv64: Fix encoding for `c.addi4spn`

* riscv64: Remove `icmp` lowerings which modify constants

These aren't correct and will need updating

* Add regression test

* riscv64: Fix handling unsigned comparisons with constants

---------

Co-authored-by: Afonso Bordado <afonsobordado@az8.co>
@alexcrichton alexcrichton deleted the rv64-branches branch October 25, 2023 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants