Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Improve signed and zero extend codegen #5844

Merged
merged 14 commits into from
Feb 22, 2023

Conversation

afonso360
Copy link
Contributor

@afonso360 afonso360 commented Feb 20, 2023

👋 Hey,

This PR improves codegen for signed and zero extensions on the riscv64 backend.

The current status is that we have two lowerings. Either we do a shl + shr, or we do an andi, reg, 255 for zero exending I8's.

This PR moves the extension logic into ISLE and adds a few more optimizations.

For the base instruction set we have the following:

  • shl + shr as a generic fallback
  • andi, reg, 255 for zero extending I8's
  • addiw, reg, 0 for sign extending I32's into I64's (this is also known as sext.w)

With the zbb extension we get a few more lowerings:

  • sext.b for sign extending I8's
  • sext.h for sign extending I16's
  • zext.h for zero extending I16's

Additionally we also use zbkb for the remaining cases:

  • packw for zero extending I16's
  • pack for zero extending I32's

With the right extensions we are now guaranteed to have a single instruction extension for <= 64 bits and at most two for I128, which is nice.

I've also cleaned up the I128 codegen cases a little bit.


Unfortunately I can't remove the Inst::Extend since its used in some ABI code and is part of the atomic lowerings.

I don't think we have a way to call into the ISLE lowerings in these cases right?


Notes about the test diffs:

Capstone does not recognize most of the zbb or zbkb instructions.

Inst::Extend used to report for example sext.b, but actually always lowers as shl + shr, which makes it seem that in some cases we are lowering more instructions. This is not the case, its just that we were printing the wrong instruction.

The capstone disassembly diff is the correct one in these cases.


zbkb is available from qemu 7.1.0 onwards. Although the testing situation here is quite tricky since we don't have a solid way of detecting these extensions automatically.

I've tested these manually by force enabling the extensions and running the runtests.

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Feb 20, 2023
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the extent that I can check it without looking up the specific instructions here, I believe this is correct. I like how much this simplifies the ISLE rules while producing better code in more cases.

I assume this needs rebasing after #5847, and I have a couple suggestions about the ISLE rules for your consideration. If you haven't already, some local fuzzing would be great too. But feel free to merge at your discretion!

Comment on lines 1021 to 1055
(decl extend_shift_op (ExtendOp) AluOPRRI)
(rule (extend_shift_op (ExtendOp.Zero)) (AluOPRRI.Srli))
(rule (extend_shift_op (ExtendOp.Signed)) (AluOPRRI.Srai))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting for completeness: This term could safely be made (decl pure ...). It's not necessary though since the constructor is only used on the RHS of an impure term. It's also a nice motivating example for #5771.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having something like a match would be really nice! Or at least being able to scope these rules so that they don't accidentally get used by other rules.

Comment on lines 1091 to 1100
;; Extend the bottom register to I64 and then just zero out the top half.
(rule 9 (extend val (ExtendOp.Zero) (fits_in_32 from_ty) $I128)
(let ((val Reg (value_regs_get val 0))
(extended Reg (extend val (ExtendOp.Zero) from_ty $I64)))
(value_regs extended (load_u64_constant 0))))

(rule 10 (extend val (ExtendOp.Zero) $I64 $I128)
(value_regs
(value_regs_get val 0)
(load_u64_constant 0)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought: rule 9 here could use fits_in_64 and still be correct in isolation. That has the advantage of matching the structure of rule 8 (the signed case), which is nice for reasoning; but the disadvantage that it does a recursive call to extend just to discover that both type arguments are the same and return the input unchanged.

If you did make that change though you could delete rule 10, because rule 9 would produce the same result without the special case. I think that would be slightly nicer.

Also, rules 8 and 9 can share the same priority since they differ on the ExtendOp argument. They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128 doesn't overlap with fits_in_64. I want to fix that sooner or later...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ended up being a bit cleaner! I like having the signed and unsigned rules being "visually" equal. Thanks!

They do currently need a different priority than all the other rules though since ISLE can't yet figure out that $I128 doesn't overlap with fits_in_64. I want to fix that sooner or later...

Oh, I think that might be why I started thinking that ISLE wouldn't reason past fits_in_64, good to know that its only in I128's.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not quite it. The issue is that ISLE doesn't know anything about when an extractor like fits_in_64 will match. If you have two patterns where one is $I64 and the other is $I32, it can see that those are different. If the patterns are both fits_in_64, it knows that both patterns will produce the same result on the same input so it can check deeper in the tree of patterns. But if the patterns are fits_in_64 versus fits_in_32, or versus $I16, ISLE can't draw any conclusions about whether those overlap or not.

I want to fix this by making ISLE understand or-patterns, then defining fits_in_64 as the equivalent of $I8 | $I16 | .... By making more semantics visible to ISLE, we'll get better pattern-matching code out as well as more precise overlap checks. Someday...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense, Thanks for explaining that!

Comment on lines 1035 to 1105
;; If we are zero extending a U8 we can use a `andi` instruction.
(rule 1 (extend val (ExtendOp.Zero) $I8 (fits_in_64 to_ty))
(let ((val Reg (value_regs_get val 0)))
(alu_rr_imm12 (AluOPRRI.Andi) val (imm12_const 255))))

;; When signed extending from 32 to 64 bits we can use a
;; `addiw val 0`. Also known as a `sext.w`
(rule 2 (extend val (ExtendOp.Signed) $I32 $I64)
(let ((val Reg (value_regs_get val 0)))
(alu_rr_imm12 (AluOPRRI.Addiw) val (imm12_const 0))))


;; No point in trying to use `packh` here to zero extend 8 bit values
;; since we can just use `andi` instead which is part of the base ISA.

;; If we have the `zbkb` extension `packw` can be used to zero extend 16 bit values
(rule 3 (extend val (ExtendOp.Zero) $I16 (fits_in_64 _))
(if-let $true (has_zbkb))
(let ((val Reg (value_regs_get val 0)))
(alu_rrr (AluOPRRR.Packw) val (zero_reg))))

;; If we have the `zbkb` extension `pack` can be used to zero extend 32 bit registers
(rule 4 (extend val (ExtendOp.Zero) $I32 $I64)
(if-let $true (has_zbkb))
(let ((val Reg (value_regs_get val 0)))
(alu_rrr (AluOPRRR.Pack) val (zero_reg))))

;;;; for I128 unsigned extend.
(rule 1
(lower_extend r $false 64 128)
(value_regs (gen_move2 r $I64 $I64) (load_u64_constant 0)))

(rule
(lower_extend r $false from_bits 128)
(value_regs (gen_extend r $false from_bits 64) (load_u64_constant 0)))
;; If we have the `zbb` extension we can use the dedicated `sext.b` instruction.
(rule 5 (extend val (ExtendOp.Signed) $I8 (fits_in_64 _))
(if-let $true (has_zbb))
(let ((val Reg (value_regs_get val 0)))
(alu_rr_imm12 (AluOPRRI.Sextb) val (imm12_const 0))))

;; If we have the `zbb` extension we can use the dedicated `sext.h` instruction.
(rule 6 (extend val (ExtendOp.Signed) $I16 (fits_in_64 _))
(if-let $true (has_zbb))
(let ((val Reg (value_regs_get val 0)))
(alu_rr_imm12 (AluOPRRI.Sexth) val (imm12_const 0))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these rules overlap with each other, so it should be safe to place them all at the same priority. If I'm wrong, ISLE will tell you so 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that worked! I had the idea that isle wasn't able to reason very well past things like fits_in_64 and would start throwing overlap errors. I guess I was wrong!

@afonso360
Copy link
Contributor Author

afonso360 commented Feb 22, 2023

I've updated this, I'm going to run it in the fuzzer today. That being said I can only fuzz the Base & Zbb rules, since my machine does not have Zbkb. I also haven't been able to setup cargo-fuzz to run with qemu. I'll need to give that another try.

Edit: Got it to work on qemu. Annoyingly its finding issues in other lowerings when extensions are enabled.

@afonso360
Copy link
Contributor Author

This has been fuzzing for a few hours now and it didn't find anything related to this. It did find some missing lowerings in ctz when Zbb is enabled, but I'll post a fix for those on a separate PR.

@afonso360 afonso360 added this pull request to the merge queue Feb 22, 2023
Merged via the queue into bytecodealliance:main with commit f6c6bc2 Feb 22, 2023
@afonso360 afonso360 deleted the riscv-extend branch February 22, 2023 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants