-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AArch64: conform to AArch64 calling convention's half-caller/half-callee-save vector register conventions. #2254
Labels
cranelift:area:aarch64
Issues related to AArch64 backend.
Comments
cfallin
added a commit
to cfallin/wasmtime
that referenced
this issue
Oct 6, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Requires bytecodealliance/regalloc.rs#100; this PR refers to my fork of regalloc.rs until that PR is reviewed, merged and included in a new release. Fixes bytecodealliance#2254.
cfallin
added a commit
to cfallin/wasmtime
that referenced
this issue
Oct 6, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Requires bytecodealliance/regalloc.rs#100; this PR refers to my fork of regalloc.rs until that PR is reviewed, merged and included in a new release. Fixes bytecodealliance#2254.
cfallin
added a commit
to cfallin/wasmtime
that referenced
this issue
Oct 6, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Fixes bytecodealliance#2254.
cfallin
added a commit
that referenced
this issue
Nov 30, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Fixes #2254.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
It arose in #2228 that our current implementation of the standard (SysV / AAPCS) calling convention on AArch64 is not quite right. In the real calling convention, the lower 64-bit halves of some vector registers are callee-save, while the upper 64-bit halves are caller-save. We currently treat them as fully callee-save. This is correct (if conservative) w.r.t. clobber-saves in prologue/epilogue code, but is incorrect w.r.t. callsites.
The most correct fix is to support overlapping register classes in regalloc.rs (see bytecodealliance/regalloc.rs#98). However, in #2228, a simpler stopgap fix was proposed. This fix also requires a small modification to regalloc.rs, but simply to omit call instructions' defs/mods from the clobbered-register-set computation. If the caller and callee have the same ABI, then this is safe.
If the full fix in regalloc.rs is not an easy change (it appears it will not be, as aliasing registers will interact in a fundamental way with the core allocation data structures) then IMHO we should implement the stopgap fix for now, so that when we call into library or other code, we don't have subtle correctness issues with half-clobbered vector values.
cc @julian-seward1, @bnjbvr, @akirilov-arm
The text was updated successfully, but these errors were encountered: