-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AArch64: Add test cases for callee-saved SIMD & FP registers #2228
AArch64: Add test cases for callee-saved SIMD & FP registers #2228
Conversation
The default call call conv for aarch64 is systemv, not aapcs. There are some differences between the two in the float handling on at least arm32. For example https://github.com/rust-lang/rust/blob/85fbf49ce0e2274d0acf798f6e703747674feec3/compiler/rustc_target/src/abi/call/arm.rs#L91. I don't know if this is also the case on aarch64 and if so if this is one example. |
Subscribe to Label Actioncc @bnjbvr
This issue or pull request has been labeled: "cranelift"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
AArch32 is a completely different case, in which there are many ABI variants indeed. |
Thanks @akirilov-arm -- it seems you're right that we could save half of our stack space and memory traffic for FP/vec clobber-saves. This should be a straightforward change in the ABI code; I want to verify first that this won't break SpiderMonkey (I think not, as every register is caller-save, IIRC) then I'll create a PR. |
@cfallin I might have misunderstood you, but let me rephrase: There are 2 issues - one on the caller and one on the callee side. You are talking about the latter, but I am more concerned about the former. Consider the test case in the PR - the compiler is stashing the vectors (and they are full, 128-bit vectors) before the call to For comparison, here is what GCC and LLVM are doing in an equivalent situation. |
Ah, right, this is a bigger issue with respect to callsites; I had missed that half of it, sorry. So because we don't reason about half-clobbers currently (or overlapping registers), we need to treat all vector registers as clobbers on the caller side. I'll create a patch for this first thing Monday. |
So on further consideration (and after doing the one-line patch suggested and studying its effect), the solution is actually more complex than I had suggested above:
The problem is how these interact: any function call at all now forces a bunch of clobber saves/restores in the caller's prologue and epilogue, because the half-clobbers from the call become full-clobbers (due to the imprecision of not reasoning about half-registers). In essence, we need a way to "pass through" the half-clobbers from callsite to prologue/epilogue, as long as callee and caller have the same ABI; only true clobbers from the rest of the function body should alter the prologue/epilogue. I can think of a few hacks, e.g. add the full-clobbers at callsites (as defs) to avoid regalloc using the registers, but compute our own clobber-set by scanning over every instruction except callsites; but this feels very error-prone and fragile compared to simply doing the right thing and reasoning through overlapping registers. @julian-seward1, thoughts on this re: regalloc and how difficult it would be to support half-defs / half-clobbers? |
d63c5b6
to
8044025
Compare
@cfallin I realized that I should expand my changes and add test cases with
Do you mean that the caller lowering should pass somehow the clobber information to the callee lowering? I thought that lowering passes for different functions were more or less independent, and indeed happened in different worker threads... Unfortunately I can't say anything the rest of your comments because I am missing some information - in particular, your quick fix and its effects (expanding the test cases should help with the latter). |
No, what I mean is more conceptual: if the callee clobbers some set of registers
So this will require a small change to |
Copyright (c) 2020, Arm Limited.
8044025
to
d18de69
Compare
@cfallin This PR is now ready. IMHO we should merge it (provided that there are no suggestions for improvement) because it should make it a little bit easier to experiment with the ABI and register allocator implementations while trying to devise a solution for the wider issue. |
Yes, agreed; I'll do that and then create a separate issue to track the fix (conforming with AAPCS half-caller/half-callee-save convention). |
This PR adds a test case for a function that saves clobbered vector registers on the stack.
Its purpose is to demonstrate that Cranelift deviates from the Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64). The AAPCS64 specifies that a callee is only responsible for saving the bottom 64 bits of a vector register, so it is not possible to preserve a vector value across a function call by keeping it in a callee-saved register (let's ignore the Scalable Vector Extension for the time being, but keep in mind that it makes the rules more complicated).
Due to another deviation from AAPCS64 (the whole vector register is saved instead of only the bottom half), Cranelift effectively implements its own custom calling convention, so there are no correctness issues as long as the generated code calls only functions with the same behaviour.
The wider question is - are we striving for strict AAPCS64 compliance? That's in the non-baldrdash case, of course. I can also open an issue if a longer discussion would be necessary.
cc @cfallin @julian-seward1