riscv64: Optimize storing zero to memory by using `zero_reg` #8719

alexcrichton · 2024-05-31T14:42:45Z

This commit is similar to #8701 in that it adds a special case to store operations to use the zero register when applicable.

afonso360

Neat!

I also have a suspicion that the CI failures might be that we are now trying to encode a compressed store with the zero register. I don't remember if this is legal, but I'm going to investigate it a bit.

Edit: Yeah, the RISC-V Manual (Page 70) states:

For many RVC instructions, zero-valued immediates are disallowed and x0 is not a valid 5-bit register specifier. These restrictions free up encoding space for other instructions requiring fewer operand bits.

There is nothing specific about the store instructions falling under these restrictions and It's probably why we don't have the check here. This still allows us to use the zero register for floating point compressed stores though!

Note the regular stores (non-stack based) should not need an update since they use a compressed register which automatically excludes the zero register.

alexcrichton · 2024-05-31T16:41:35Z

At least one issue I tracked to my misunderstanding of how istore{8,16,32} worked. I was inferring the type of store from the operand but that's not correct for those instructions. I'll take a look at the compressed bits now as well.

This commit is similar to bytecodealliance#8701 in that it adds a special case to `store` operations to use the `zero` register when applicable.

alexcrichton · 2024-05-31T17:10:10Z

Ok I did a bit of an audit of the compressed stores. I added a new test in the zcb.clif test doing all the store widths to both an arbitrary register and the stack. My bugfix around my misinterpretation of istore8 and friends fixed CI, so there was no "smoking gun" per se to the compressed loads.

Compressed loads to arbitrary registers already don't work since that gates on the source being x8-x15 (as you pointed out). We are emitting compressed loads to the stack for x0 though but at least capstone disassembles it so it seems like it might be intended to work? Do you think I should add a condition to avoid generating the instruction to be safe though to ensure that it doesn't get emitted?

This still allows us to use the zero register for floating point compressed stores though!

By this do you mean that we could use a compressed sw instruction for a f32 store for example? I haven't done that yet because it would involve changing the type of the store to recognize that which is somewhat orthogonal to this. That being said I actually discovered a bug as well adding the f32/f64 case to the test here. Turns out i64_from_iconst will match f32const and f64const nodes, so I switched instead to u64_from_iconst which actually only matches iconst nodes.

afonso360 · 2024-05-31T17:49:10Z

My bugfix around my misinterpretation of istore8 and friends fixed CI, so there was no "smoking gun" per se to the compressed loads.

Nice! Good to know.

We are emitting compressed loads to the stack for x0 though but at least capstone disassembles it so it seems like it might be intended to work?

For loads we shouldn't be, at least we have that specific check in the emit code

For stores I would agree, If capstone disassembles it it's probably intended to work. It also seems like a very useful instruction to have, so I wouldn't be surprised that it is intended.

Do you think I should add a condition to avoid generating the instruction to be safe though to ensure that it doesn't get emitted?

I think this is fine as is. I'm fairly sure that if it wasn't intended the testsuites would fail in a bunch of places.

By this do you mean that we could use a compressed sw instruction for a f32 store for example?

No, just that the zero_reg in the emit for Fsd (float store double), means f0 and that is a perfectly normal register, but since we are not restricting the zero reg, it doesn't matter.

Turns out i64_from_iconst will match f32const and f64const nodes

That's super weird, and I would not expect that at all.

alexcrichton · 2024-05-31T18:26:21Z

For loads we shouldn't be, at least we have that specific check in the emit code

Oops sorry I misspoke, I meant stores!

alexcrichton requested a review from a team as a code owner May 31, 2024 14:42

alexcrichton requested review from abrown and removed request for a team May 31, 2024 14:42

afonso360 approved these changes May 31, 2024

View reviewed changes

alexcrichton added 3 commits May 31, 2024 10:06

riscv64: Optimize storing zero to memory by using zero_reg

1606c88

This commit is similar to bytecodealliance#8701 in that it adds a special case to `store` operations to use the `zero` register when applicable.

Fix lowerings of istore{8,16,32}

371c30e

Add test for compressed stores

17ad1f7

alexcrichton force-pushed the riscv64-store-zero-optimization branch from aef60c2 to 17ad1f7 Compare May 31, 2024 17:06

afonso360 added this pull request to the merge queue May 31, 2024

Merged via the queue into bytecodealliance:main with commit 271a232 May 31, 2024
36 checks passed

alexcrichton deleted the riscv64-store-zero-optimization branch May 31, 2024 18:25

alexcrichton mentioned this pull request May 31, 2024

ISLE i64_from_iconst extractor can match f32const/f64const #8723

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

riscv64: Optimize storing zero to memory by using `zero_reg` #8719

riscv64: Optimize storing zero to memory by using `zero_reg` #8719

alexcrichton commented May 31, 2024

afonso360 left a comment •

edited

Loading

alexcrichton commented May 31, 2024

alexcrichton commented May 31, 2024

afonso360 commented May 31, 2024

alexcrichton commented May 31, 2024

riscv64: Optimize storing zero to memory by using zero_reg #8719

riscv64: Optimize storing zero to memory by using zero_reg #8719

Conversation

alexcrichton commented May 31, 2024

afonso360 left a comment • edited Loading

Choose a reason for hiding this comment

alexcrichton commented May 31, 2024

alexcrichton commented May 31, 2024

afonso360 commented May 31, 2024

alexcrichton commented May 31, 2024

riscv64: Optimize storing zero to memory by using `zero_reg` #8719

riscv64: Optimize storing zero to memory by using `zero_reg` #8719

afonso360 left a comment •

edited

Loading