Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64: Optimize storing zero to memory by using zero_reg #8719

Merged

Conversation

alexcrichton
Copy link
Member

This commit is similar to #8701 in that it adds a special case to store operations to use the zero register when applicable.

@alexcrichton alexcrichton requested a review from a team as a code owner May 31, 2024 14:42
@alexcrichton alexcrichton requested review from abrown and removed request for a team May 31, 2024 14:42
Copy link
Contributor

@afonso360 afonso360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

I also have a suspicion that the CI failures might be that we are now trying to encode a compressed store with the zero register. I don't remember if this is legal, but I'm going to investigate it a bit.

Edit: Yeah, the RISC-V Manual (Page 70) states:

For many RVC instructions, zero-valued immediates are disallowed and x0 is not a valid 5-bit register specifier. These restrictions free up encoding space for other instructions requiring fewer operand bits.

There is nothing specific about the store instructions falling under these restrictions and It's probably why we don't have the check here. This still allows us to use the zero register for floating point compressed stores though!

Note the regular stores (non-stack based) should not need an update since they use a compressed register which automatically excludes the zero register.

@alexcrichton
Copy link
Member Author

At least one issue I tracked to my misunderstanding of how istore{8,16,32} worked. I was inferring the type of store from the operand but that's not correct for those instructions. I'll take a look at the compressed bits now as well.

This commit is similar to bytecodealliance#8701 in that it adds a special case to
`store` operations to use the `zero` register when applicable.
@alexcrichton alexcrichton force-pushed the riscv64-store-zero-optimization branch from aef60c2 to 17ad1f7 Compare May 31, 2024 17:06
@alexcrichton
Copy link
Member Author

Ok I did a bit of an audit of the compressed stores. I added a new test in the zcb.clif test doing all the store widths to both an arbitrary register and the stack. My bugfix around my misinterpretation of istore8 and friends fixed CI, so there was no "smoking gun" per se to the compressed loads.

Compressed loads to arbitrary registers already don't work since that gates on the source being x8-x15 (as you pointed out). We are emitting compressed loads to the stack for x0 though but at least capstone disassembles it so it seems like it might be intended to work? Do you think I should add a condition to avoid generating the instruction to be safe though to ensure that it doesn't get emitted?

This still allows us to use the zero register for floating point compressed stores though!

By this do you mean that we could use a compressed sw instruction for a f32 store for example? I haven't done that yet because it would involve changing the type of the store to recognize that which is somewhat orthogonal to this. That being said I actually discovered a bug as well adding the f32/f64 case to the test here. Turns out i64_from_iconst will match f32const and f64const nodes, so I switched instead to u64_from_iconst which actually only matches iconst nodes.

@afonso360
Copy link
Contributor

My bugfix around my misinterpretation of istore8 and friends fixed CI, so there was no "smoking gun" per se to the compressed loads.

Nice! Good to know.

We are emitting compressed loads to the stack for x0 though but at least capstone disassembles it so it seems like it might be intended to work?

For loads we shouldn't be, at least we have that specific check in the emit code

For stores I would agree, If capstone disassembles it it's probably intended to work. It also seems like a very useful instruction to have, so I wouldn't be surprised that it is intended.

Do you think I should add a condition to avoid generating the instruction to be safe though to ensure that it doesn't get emitted?

I think this is fine as is. I'm fairly sure that if it wasn't intended the testsuites would fail in a bunch of places.

By this do you mean that we could use a compressed sw instruction for a f32 store for example?

No, just that the zero_reg in the emit for Fsd (float store double), means f0 and that is a perfectly normal register, but since we are not restricting the zero reg, it doesn't matter.

Turns out i64_from_iconst will match f32const and f64const nodes

That's super weird, and I would not expect that at all.

@afonso360 afonso360 added this pull request to the merge queue May 31, 2024
Merged via the queue into bytecodealliance:main with commit 271a232 May 31, 2024
36 checks passed
@alexcrichton alexcrichton deleted the riscv64-store-zero-optimization branch May 31, 2024 18:25
@alexcrichton
Copy link
Member Author

For loads we shouldn't be, at least we have that specific check in the emit code

Oops sorry I misspoke, I meant stores!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants