Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64: conform to AArch64 calling convention's half-caller/half-callee-save vector register conventions. #2254

Closed
cfallin opened this issue Oct 2, 2020 · 0 comments · Fixed by #2267
Labels
cranelift:area:aarch64 Issues related to AArch64 backend.

Comments

@cfallin
Copy link
Member

cfallin commented Oct 2, 2020

It arose in #2228 that our current implementation of the standard (SysV / AAPCS) calling convention on AArch64 is not quite right. In the real calling convention, the lower 64-bit halves of some vector registers are callee-save, while the upper 64-bit halves are caller-save. We currently treat them as fully callee-save. This is correct (if conservative) w.r.t. clobber-saves in prologue/epilogue code, but is incorrect w.r.t. callsites.

The most correct fix is to support overlapping register classes in regalloc.rs (see bytecodealliance/regalloc.rs#98). However, in #2228, a simpler stopgap fix was proposed. This fix also requires a small modification to regalloc.rs, but simply to omit call instructions' defs/mods from the clobbered-register-set computation. If the caller and callee have the same ABI, then this is safe.

If the full fix in regalloc.rs is not an easy change (it appears it will not be, as aliasing registers will interact in a fundamental way with the core allocation data structures) then IMHO we should implement the stopgap fix for now, so that when we call into library or other code, we don't have subtle correctness issues with half-clobbered vector values.

cc @julian-seward1, @bnjbvr, @akirilov-arm

@cfallin cfallin added the cranelift:area:aarch64 Issues related to AArch64 backend. label Oct 2, 2020
cfallin added a commit to cfallin/wasmtime that referenced this issue Oct 6, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Requires bytecodealliance/regalloc.rs#100; this PR refers to my fork of
regalloc.rs until that PR is reviewed, merged and included in a new
release.

Fixes bytecodealliance#2254.
cfallin added a commit to cfallin/wasmtime that referenced this issue Oct 6, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Requires bytecodealliance/regalloc.rs#100; this PR refers to my fork of
regalloc.rs until that PR is reviewed, merged and included in a new
release.

Fixes bytecodealliance#2254.
cfallin added a commit to cfallin/wasmtime that referenced this issue Oct 6, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Fixes bytecodealliance#2254.
cfallin added a commit that referenced this issue Nov 30, 2020
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Fixes #2254.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant