Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: fix branch-of-icmp/fcmp regression: look through uextend. #5487

Merged

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Dec 22, 2022

In #5031, we removed bool types from CLIF, using integers instead for "truthy" values. This greatly simplified the IR, and was generally an improvement.

However, because x86's SETcc instruction sets only the low 8 bits of a register, we chose to use i8 types as the result of icmp and fcmp, to avoid the need for a masking operation when materializing the result.

Unfortunately this means that uses of truthy values often now have uextend operations, especially when coming from Wasm (where truthy values are naturally i32-typed). For example, where we previously had (brz (icmp ...)), we now have (brz (uextend (icmp ...))).

It's arguable whether or not we should switch to i32 truthy values -- in most cases we can avoid materializing a value that's immediately used for a branch or select, so a mask would in most cases be unnecessary, and it would be a win at the IR level -- but irrespective of that, this change did regress our generated code quality: our backends had patterns for e.g. (brz (icmp ...)) but not with the uextend, so we were always materializing truthy values. Many blocks thus ended with "cmp; setcc; cmp; test; branch" rather than "cmp; branch".

In #5391 we noticed this and fixed it on x64, but it was a general problem on aarch64 and riscv64 as well. This PR introduces a maybe_uextend extractor that "looks through" uextends, and uses it where we consume truthy values, thus fixing the regression. This PR also adds compile filetests to ensure we don't regress again.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Dec 22, 2022
Copy link
Member

@elliottt elliottt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix looks good to me! It's unfortunate that we can't implement maybe_uextend in ISLE currently.

Separately, it seems very reasonable to switch to having comparisons produce an i32 result, given that our primary consumer is wasm. As you mentioned, we can avoid materializing it in many cases, and it would cut out some unnecessary uextend instructions when we're compiling wasm.

In bytecodealliance#5031, we removed `bool` types from CLIF, using integers instead for
"truthy" values. This greatly simplified the IR, and was generally an
improvement.

However, because x86's `SETcc` instruction sets only the low 8 bits of a
register, we chose to use `i8` types as the result of `icmp` and `fcmp`,
to avoid the need for a masking operation when materializing the result.

Unfortunately this means that uses of truthy values often now have
`uextend` operations, especially when coming from Wasm (where truthy
values are naturally `i32`-typed). For example, where we previously had
`(brz (icmp ...))`, we now have `(brz (uextend (icmp ...)))`.

It's arguable whether or not we should switch to `i32` truthy values --
in most cases we can avoid materializing a value that's immediately used
for a branch or select, so a mask would in most cases be unnecessary,
and it would be a win at the IR level -- but irrespective of that, this
change *did* regress our generated code quality: our backends had
patterns for e.g. `(brz (icmp ...))` but not with the `uextend`, so we
were *always* materializing truthy values. Many blocks thus ended with
"cmp; setcc; cmp; test; branch" rather than "cmp; branch".

In bytecodealliance#5391 we noticed this and fixed it on x64, but it was a general
problem on aarch64 and riscv64 as well. This PR introduces a
`maybe_uextend` extractor that "looks through" uextends, and uses it
where we consume truthy values, thus fixing the regression.  This PR
also adds compile filetests to ensure we don't regress again.

The riscv64 backend has not been updated here because doing so appears
to trigger another issue in its branch handling; fixing that is TBD.
@cfallin cfallin force-pushed the fix-branch-of-uextend-of-icmp branch from 4622d4c to 6cde468 Compare December 22, 2022 08:57
@cfallin
Copy link
Member Author

cfallin commented Dec 22, 2022

Reverted the riscv64 changes as there seems to be a deeper issue uncovered (branch in the middle of a basic block?). Someone can look at that separately but for now I'll merge the remainder of the PR so we have the aarch64 fixes in.

@cfallin cfallin merged commit 0346345 into bytecodealliance:main Dec 22, 2022
@cfallin cfallin deleted the fix-branch-of-uextend-of-icmp branch December 22, 2022 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants