Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Implement insertlane #6408

Merged
merged 9 commits into from
May 20, 2023

Conversation

afonso360
Copy link
Contributor

@afonso360 afonso360 commented May 18, 2023

👋 Hey,

This PR implements insertlane. The implementation is essentially the same as the splat instruction, but using the masked version of vmv where only one lane is enabled. Masked vmv's are called vmerge, and they are only special in the fact that they don't follow the mask policy for masked out elements. Instead, masked out elements are sourced from one of the operands, instead of being ignored.

Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register v0 as a mask. Masks are represented as bitfields where each bit corresponds to one lane.

The WASM replacelane tests exposed a preexisting bug in the encoding of vmv.s.x and vfmv.s.f, where their source operands are encoded in the vs1 field, instead of vs2. A fix for that is also included in this PR.

@afonso360 afonso360 requested review from a team as code owners May 18, 2023 11:17
@afonso360 afonso360 requested review from elliottt and removed request for a team May 18, 2023 11:17
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label May 18, 2023
cranelift/codegen/src/isa/riscv64/inst/mod.rs Outdated Show resolved Hide resolved
Comment on lines +450 to +533
(rule (gen_vec_mask mask)
(rv_vmv_sx (imm $I64 mask) (vstate_from_type $I64X2)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?

I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.

One thing I thought though is that this might be good to delegate to a general vconst helper. Right now vconst always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?

I'm not sure!

There are a few combos here that I can think of that could potentially be better than this in terms of materializing the mask. One of them like you mentioned is using something like the immediate splat instruction vmv.v.i, but that only covers a weird subset of masks. Where the mask result can fit in a signed imm5 per lane. So with that one we can do a mask like 0b001111 but not 0b011111, though we do get the benefit of it being just a single instruction.

I didn't want to try to pull off those sort of weird edge cases yet, but it might be easier than I think if we can look at the whole thing through the lens of vconst, I'm going to give that a try.

I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.

Do you mean having a separate move instruction into the X / F registers that has no immediate? We used to have instructions that matched exactly those semantics, but they were removed with the following reasoning:

NOTE: The complementary vins.v.x instruction, which allow a write to
any element in a vector register, has been removed. This instruction
would be the only instruction (apart from vsetvl) that requires two
integer source operands, and also would be slow to execute in an
implementation with vector register renaming, relegating its main use
to debugger modifications to state. The alternative and more
generally useful vslide1up and vslide1down instructions can be
used to update vector register state in place over a debug link
without accessing memory.

vslide1up and vslide1down aren't exactly applicable to insertlane and extractlane. And they have a special note in the standard stating that they are slow instructions, pretty much just reserved for debuggers.

One thing I thought though is that this might be good to delegate to a general vconst helper. Right now vconst always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.

Yeah, that seems like a good idea! I'm also going to try to merge the splat const rules into that.


On a broader note I checked what the other engines are doing here w.r.t insertlane:

V8 seems to be doing exactly what we do.

LLVM does a vslideup with a tail undisturbed policy, which is also an option and it does mean they can avoid materializing a mask since they can use vslideup.vi with the lane as part of the immediate. I have no idea if this is better or worse than the option above since it also emits 2 or 3 vsetvli's.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, and thanks for explaining all that! To be clear I don't think any changes are necessary for this PR, I was mostly just musing. With prior art in V8 this is definitely fine to land (even without that I think it's ok). I agree it's probably not worth it yet to have lots of special cases for various immediates here and there, better to start simple and build out. I was mostly curious if there was a pattern that was going to be implemented later that would simplify this or whether this would be the way forward for awhile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer handling the merging all of the vconst stuff in a future PR. But yeah, making mask generation as part of that makes a ton of sense!

@afonso360 afonso360 enabled auto-merge May 20, 2023 10:33
@afonso360 afonso360 added this pull request to the merge queue May 20, 2023
Merged via the queue into bytecodealliance:main with commit 9871098 May 20, 2023
@afonso360 afonso360 deleted the riscv-insertlane branch May 20, 2023 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants