-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
riscv64: Improve signed and zero extend codegen #5844
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the extent that I can check it without looking up the specific instructions here, I believe this is correct. I like how much this simplifies the ISLE rules while producing better code in more cases.
I assume this needs rebasing after #5847, and I have a couple suggestions about the ISLE rules for your consideration. If you haven't already, some local fuzzing would be great too. But feel free to merge at your discretion!
(decl extend_shift_op (ExtendOp) AluOPRRI) | ||
(rule (extend_shift_op (ExtendOp.Zero)) (AluOPRRI.Srli)) | ||
(rule (extend_shift_op (ExtendOp.Signed)) (AluOPRRI.Srai)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting for completeness: This term could safely be made (decl pure ...)
. It's not necessary though since the constructor is only used on the RHS of an impure term. It's also a nice motivating example for #5771.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having something like a match would be really nice! Or at least being able to scope these rules so that they don't accidentally get used by other rules.
;; Extend the bottom register to I64 and then just zero out the top half. | ||
(rule 9 (extend val (ExtendOp.Zero) (fits_in_32 from_ty) $I128) | ||
(let ((val Reg (value_regs_get val 0)) | ||
(extended Reg (extend val (ExtendOp.Zero) from_ty $I64))) | ||
(value_regs extended (load_u64_constant 0)))) | ||
|
||
(rule 10 (extend val (ExtendOp.Zero) $I64 $I128) | ||
(value_regs | ||
(value_regs_get val 0) | ||
(load_u64_constant 0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought: rule 9
here could use fits_in_64
and still be correct in isolation. That has the advantage of matching the structure of rule 8
(the signed case), which is nice for reasoning; but the disadvantage that it does a recursive call to extend
just to discover that both type arguments are the same and return the input unchanged.
If you did make that change though you could delete rule 10
, because rule 9
would produce the same result without the special case. I think that would be slightly nicer.
Also, rules 8 and 9 can share the same priority since they differ on the ExtendOp
argument. They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128
doesn't overlap with fits_in_64
. I want to fix that sooner or later...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that ended up being a bit cleaner! I like having the signed and unsigned rules being "visually" equal. Thanks!
They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128 doesn't overlap with fits_in_64. I want to fix that sooner or later...
Oh, I think that might be why I started thinking that ISLE wouldn't reason past fits_in_64
, good to know that its only in I128's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not quite it. The issue is that ISLE doesn't know anything about when an extractor like fits_in_64
will match. If you have two patterns where one is $I64
and the other is $I32
, it can see that those are different. If the patterns are both fits_in_64
, it knows that both patterns will produce the same result on the same input so it can check deeper in the tree of patterns. But if the patterns are fits_in_64
versus fits_in_32
, or versus $I16
, ISLE can't draw any conclusions about whether those overlap or not.
I want to fix this by making ISLE understand or-patterns, then defining fits_in_64
as the equivalent of $I8 | $I16 | ...
. By making more semantics visible to ISLE, we'll get better pattern-matching code out as well as more precise overlap checks. Someday...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense, Thanks for explaining that!
;; If we are zero extending a U8 we can use a `andi` instruction. | ||
(rule 1 (extend val (ExtendOp.Zero) $I8 (fits_in_64 to_ty)) | ||
(let ((val Reg (value_regs_get val 0))) | ||
(alu_rr_imm12 (AluOPRRI.Andi) val (imm12_const 255)))) | ||
|
||
;; When signed extending from 32 to 64 bits we can use a | ||
;; `addiw val 0`. Also known as a `sext.w` | ||
(rule 2 (extend val (ExtendOp.Signed) $I32 $I64) | ||
(let ((val Reg (value_regs_get val 0))) | ||
(alu_rr_imm12 (AluOPRRI.Addiw) val (imm12_const 0)))) | ||
|
||
|
||
;; No point in trying to use `packh` here to zero extend 8 bit values | ||
;; since we can just use `andi` instead which is part of the base ISA. | ||
|
||
;; If we have the `zbkb` extension `packw` can be used to zero extend 16 bit values | ||
(rule 3 (extend val (ExtendOp.Zero) $I16 (fits_in_64 _)) | ||
(if-let $true (has_zbkb)) | ||
(let ((val Reg (value_regs_get val 0))) | ||
(alu_rrr (AluOPRRR.Packw) val (zero_reg)))) | ||
|
||
;; If we have the `zbkb` extension `pack` can be used to zero extend 32 bit registers | ||
(rule 4 (extend val (ExtendOp.Zero) $I32 $I64) | ||
(if-let $true (has_zbkb)) | ||
(let ((val Reg (value_regs_get val 0))) | ||
(alu_rrr (AluOPRRR.Pack) val (zero_reg)))) | ||
|
||
;;;; for I128 unsigned extend. | ||
(rule 1 | ||
(lower_extend r $false 64 128) | ||
(value_regs (gen_move2 r $I64 $I64) (load_u64_constant 0))) | ||
|
||
(rule | ||
(lower_extend r $false from_bits 128) | ||
(value_regs (gen_extend r $false from_bits 64) (load_u64_constant 0))) | ||
;; If we have the `zbb` extension we can use the dedicated `sext.b` instruction. | ||
(rule 5 (extend val (ExtendOp.Signed) $I8 (fits_in_64 _)) | ||
(if-let $true (has_zbb)) | ||
(let ((val Reg (value_regs_get val 0))) | ||
(alu_rr_imm12 (AluOPRRI.Sextb) val (imm12_const 0)))) | ||
|
||
;; If we have the `zbb` extension we can use the dedicated `sext.h` instruction. | ||
(rule 6 (extend val (ExtendOp.Signed) $I16 (fits_in_64 _)) | ||
(if-let $true (has_zbb)) | ||
(let ((val Reg (value_regs_get val 0))) | ||
(alu_rr_imm12 (AluOPRRI.Sexth) val (imm12_const 0)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any of these rules overlap with each other, so it should be safe to place them all at the same priority. If I'm wrong, ISLE will tell you so 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that worked! I had the idea that isle wasn't able to reason very well past things like fits_in_64
and would start throwing overlap errors. I guess I was wrong!
ad8828d
to
16290ae
Compare
I've updated this, I'm going to run it in the fuzzer today. That being said I can only fuzz the Base & Edit: Got it to work on qemu. Annoyingly its finding issues in other lowerings when extensions are enabled. |
This has been fuzzing for a few hours now and it didn't find anything related to this. It did find some missing lowerings in |
👋 Hey,
This PR improves codegen for signed and zero extensions on the riscv64 backend.
The current status is that we have two lowerings. Either we do a
shl + shr
, or we do anandi, reg, 255
for zero exending I8's.This PR moves the extension logic into ISLE and adds a few more optimizations.
For the base instruction set we have the following:
shl + shr
as a generic fallbackandi, reg, 255
for zero extendingI8
'saddiw, reg, 0
for sign extendingI32
's intoI64
's (this is also known assext.w
)With the
zbb
extension we get a few more lowerings:sext.b
for sign extendingI8
'ssext.h
for sign extendingI16
'szext.h
for zero extendingI16
'sAdditionally we also use
zbkb
for the remaining cases:packw
for zero extendingI16
'spack
for zero extendingI32
'sWith the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.
I've also cleaned up the I128 codegen cases a little bit.
Unfortunately I can't remove the
Inst::Extend
since its used in some ABI code and is part of the atomic lowerings.I don't think we have a way to call into the ISLE lowerings in these cases right?
Notes about the test diffs:
Capstone does not recognize most of the
zbb
orzbkb
instructions.Inst::Extend
used to report for examplesext.b
, but actually always lowers asshl + shr
, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.The capstone disassembly diff is the correct one in these cases.
zbkb
is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.I've tested these manually by force enabling the extensions and running the runtests.