Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x64: Peephole optimization for x < 0 #4625

Merged
merged 4 commits into from
Aug 9, 2022

Conversation

elliottt
Copy link
Member

@elliottt elliottt commented Aug 5, 2022

Fixes #4607

Add a peephole optimization for icmp slt x, (iconst 0) for 32 and 64 bit values, turning them into a right shift by 31 or 63 bits respectively. This optimization only applies to the case where the result of the icmp is used directly, cases like brz (icmp ..) will be left alone.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Two followup thoughts:

  1. I don't think the Wasm frontend uses B1 at all, even for temporaries, though I could be wrong. If not, could we do the same for whatever ops it generates for the Wasm compare opcodes that produce an i32 0/1 flag?
  2. I think we can get the opposite case (x >= 0) as well with a negation; that might be worth doing?

@cfallin
Copy link
Member

cfallin commented Aug 5, 2022

(If you want to merge as-is that's fine too; just writing down some ideas that came to mind!)

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Aug 5, 2022
@@ -99,9 +99,108 @@ block0(v0: i32):
; pushq %rbp
; movq %rsp, %rbp
; block0:
; shrl $31, %edi, %edi
; sarl $31, %edi, %edi
Copy link
Contributor

@MaxGraey MaxGraey Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why it uses signed right shift now? It should be an unsigned shift. Or -1 / 0 is equivalent to true / false. Sorry for the possibly simple questions. I'm not familiar with the generally accepted rules of codegen specifically for cranelift

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was with an eye towards better supporting our boolean representations, as mentioned in #3205, as sar will do sign extension.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it's not clear to me that we can use the wider boolean types with icmp, I've switched back to shr with a type guard on the result. Sorry about the back and forth here!

Comment on lines 167 to 168
; sarl $31, %edi, %edi
; notl %edi, %edi
Copy link
Contributor

@MaxGraey MaxGraey Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct? I guess it should be:

notl    %edi, %edi
shrl    $31, %edi, %edi

https://godbolt.org/z/s1fccc5Ex

However if codegen uses -1 / 0 instead 1 / 0 for true / false. It should be ok.

Also I'm wondering how ordering not / shr may affect on micro/macro/instruction fusion on x64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inverted it and restricted this optimization to boolean-producing instructions. Also this shouldn't affect the test/jmp fusion, as this optimization will only fire for icmp instructions whose results are used directly, not as the input to a branch decision. Was there another fusion case you were thinking this might invalidate?

Comment on lines 1508 to 1510
;; Peephole optimization for `0 > x`, when x is a signed 64 bit value
(rule (lower (icmp (IntCC.SignedGreaterThan) (u64_from_iconst 0) x @ (value_type $I64)))
(x64_sar $I64 x (Imm8Reg.Imm8 63)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't ISLE and cranelift ir always canonicalize constants on the right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently, hence the symmetric cases for this optimization. This is something we might look into after the mid-end optimization work has landed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

@elliottt elliottt marked this pull request as ready for review August 8, 2022 19:48
@elliottt elliottt requested a review from cfallin August 8, 2022 19:49
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@elliottt elliottt merged commit ed7dfd3 into bytecodealliance:main Aug 9, 2022
@elliottt elliottt deleted the trevor/x64-icmp-bool-opt branch August 9, 2022 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cranelift: Add x < 0 -> x >>> bit_size(ty) - 1 lowering rule
3 participants