Rework/simplify unwind infrastructure and implement Windows unwind.
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.
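
As a rough sketch (types condensed for illustration; the real
`UnwindInst` lives in `crate::isa::unwind`, imported in the diff
below), the directives look like:

    // Hypothetical, simplified shape of the unwind pseudo-instruction
    // payload; variant and field names match the uses in the diff.
    type RealReg = u8; // placeholder for the actual register type

    enum UnwindInst {
        // The just-pushed FP/LR (or RBP) pair lies
        // `offset_upward_to_caller_sp` bytes below the caller's SP.
        PushFrameRegs { offset_upward_to_caller_sp: u32 },
        // Establishes the frame: clobber saves occupy the
        // `offset_downward_to_clobbers` bytes below this point.
        DefineNewFrame {
            offset_downward_to_clobbers: u32,
            offset_upward_to_caller_sp: u32,
        },
        // `reg` is saved `clobber_offset` bytes above the bottom of
        // the clobber area.
        SaveReg { clobber_offset: u32, reg: RealReg },
    }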

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for `alloca` (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
`abi_impl.rs`.
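
For reference, the resulting Fastcall frame layout is roughly as
follows (my sketch, not part of the commit; higher addresses at the
top):

    +---------------------------+
    |  stack args               |
    +---------------------------+  <- caller SP
    |  return address (RIP)     |
    |  saved RBP                |  <- RBP after prologue
    |  clobber saves            |  <- within 240 bytes of RBP
    |  spillslots / stackslots  |     (fixed frame; RBP offsets
    |  outgoing args            |      known only after regalloc)
    +---------------------------+  <- SP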

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from `sp` once to allocate space, then used stores to `[sp,
offset]` to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.
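
Concretely (hypothetical registers; the real sequences are in the
aarch64 diff below), the clobber saves change roughly from

    sub sp, sp, #(clobbers + fixed_frame)  // one up-front adjustment
    stp x19, x20, [sp, #off]               // off limited to the 7-bit
                                           // scaled (SImm7) range

to

    stp x19, x20, [sp, #-16]!              // pre-indexed "push"; no
                                           // offset-range limit
    sub sp, sp, #fixed_frame               // allocate the fixed frame
                                           // afterwards, if non-empty

The epilogue mirrors this with post-indexed `ldp ..., [sp], #16`
"pops".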

This PR adds the new backend on Windows to CI as well.
cfallin committed Mar 12, 2021
1 parent 05688aa commit 2d5db92
Showing 63 changed files with 904 additions and 1,087 deletions.
12 changes: 12 additions & 0 deletions cranelift/codegen/meta/src/shared/settings.rs
@@ -235,6 +235,18 @@ pub(crate) fn define() -> SettingGroup {
false,
);

settings.add_bool(
"unwind_info",
r#"
Generate unwind info. This increases metadata size and compile time,
but allows for the debugger to trace frames, is needed for GC tracing
that relies on libunwind (such as in Wasmtime), and is
unconditionally needed on certain platforms (such as Windows) that
must always be able to unwind.
"#,
true,
);

// BaldrMonkey requires that not-yet-relocated function addresses be encoded
// as all-ones bitpatterns.
settings.add_bool(
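
A minimal sketch of flipping the new setting through the
`cranelift_codegen::settings` API (assuming the usual generated
accessor for a bool setting; verify against the crate version in use):

    use cranelift_codegen::settings::{self, Configurable};

    fn flags_without_unwind() -> settings::Flags {
        let mut builder = settings::builder();
        // `unwind_info` defaults to `true`; disable it when unwind
        // tables are not needed, saving metadata size and compile time.
        builder.set("unwind_info", "false").unwrap();
        settings::Flags::new(builder)
    }

Backends then consult the generated `flags.unwind_info()` accessor, as
the aarch64 changes below do.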
161 changes: 103 additions & 58 deletions cranelift/codegen/src/isa/aarch64/abi.rs
@@ -8,6 +8,7 @@ use crate::ir::Opcode;
use crate::ir::{ExternalName, LibCall};
use crate::isa;
use crate::isa::aarch64::{inst::EmitState, inst::*};
use crate::isa::unwind::UnwindInst;
use crate::machinst::*;
use crate::settings;
use crate::{CodegenError, CodegenResult};
@@ -472,7 +473,7 @@ impl ABIMachineSpec for AArch64MachineDeps {
}
}

fn gen_prologue_frame_setup() -> SmallInstVec<Inst> {
fn gen_prologue_frame_setup(flags: &settings::Flags) -> SmallInstVec<Inst> {
let mut insts = SmallVec::new();
// stp fp (x29), lr (x30), [sp, #-16]!
insts.push(Inst::StoreP64 {
@@ -484,6 +485,15 @@
),
flags: MemFlags::trusted(),
});

if flags.unwind_info() {
insts.push(Inst::Unwind {
inst: UnwindInst::PushFrameRegs {
offset_upward_to_caller_sp: 16, // FP, LR
},
});
}

// mov fp (x29), sp. This uses the ADDI rd, rs, 0 form of `MOV` because
// the usual encoding (`ORR`) does not work with SP.
insts.push(Inst::AluRRImm12 {
@@ -498,20 +508,14 @@
insts
}

fn gen_epilogue_frame_restore() -> SmallInstVec<Inst> {
fn gen_epilogue_frame_restore(_: &settings::Flags) -> SmallInstVec<Inst> {
let mut insts = SmallVec::new();

// MOV (alias of ORR) interprets x31 as XZR, so use an ADD here.
// MOV to SP is an alias of ADD.
insts.push(Inst::AluRRImm12 {
alu_op: ALUOp::Add64,
rd: writable_stack_reg(),
rn: fp_reg(),
imm12: Imm12 {
bits: 0,
shift12: false,
},
});
// N.B.: sp is already adjusted to the appropriate place by the
// clobber-restore code (which also frees the fixed frame). Hence, there
// is no need for the usual `mov sp, fp` here.

// `ldp fp, lr, [sp], #16`
insts.push(Inst::LoadP64 {
rt: writable_fp_reg(),
rt2: writable_link_reg(),
Expand All @@ -521,7 +525,6 @@ impl ABIMachineSpec for AArch64MachineDeps {
),
flags: MemFlags::trusted(),
});

insts
}

@@ -535,21 +538,43 @@
// nominal SP offset; abi_impl generic code will do that.
fn gen_clobber_save(
call_conv: isa::CallConv,
_: &settings::Flags,
flags: &settings::Flags,
clobbers: &Set<Writable<RealReg>>,
fixed_frame_storage_size: u32,
_outgoing_args_size: u32,
) -> (u64, SmallVec<[Inst; 16]>) {
let mut insts = SmallVec::new();
let (clobbered_int, clobbered_vec) = get_regs_saved_in_prologue(call_conv, clobbers);

let (int_save_bytes, vec_save_bytes) = saved_reg_stack_size(&clobbered_int, &clobbered_vec);
let total_save_bytes = (vec_save_bytes + int_save_bytes) as i32;
insts.extend(Self::gen_sp_reg_adjust(
-(total_save_bytes + fixed_frame_storage_size as i32),
));
let total_save_bytes = int_save_bytes + vec_save_bytes;
let clobber_size = total_save_bytes as i32;

if flags.unwind_info() {
// The *unwind* frame (but not the actual frame) starts at the
// clobbers, just below the saved FP/LR pair.
insts.push(Inst::Unwind {
inst: UnwindInst::DefineNewFrame {
offset_downward_to_clobbers: clobber_size as u32,
offset_upward_to_caller_sp: 16, // FP, LR
},
});
}

for (i, reg_pair) in clobbered_int.chunks(2).enumerate() {
// We use pre-indexed addressing modes here, rather than the possibly
// more efficient "subtract sp once then use fixed offsets" scheme,
// because (i) we cannot necessarily guarantee that the offset of a
// clobber-save slot will be within the SImm7Scaled (+504-byte) offset
// range once the whole frame, including other slots, is allocated at
// once; (ii) it is more complex to conditionally generate a two-stage
// SP adjustment (clobbers then fixed frame) otherwise; and (iii) we
// generally just want to maintain simplicity here for maintainability.
// Because clobbers are at the top of the frame, just below FP, all
// that is necessary is to use the pre-indexed "push" `[sp, #-16]!`
// addressing mode.
//
// `clobber_offset` tracks the offset above the start of the clobber
// area for unwind-info purposes.
let mut clobber_offset = clobber_size as u32;
for reg_pair in clobbered_int.chunks(2) {
let (r1, r2) = if reg_pair.len() == 2 {
// .to_reg().to_reg(): Writable<RealReg> --> RealReg --> Reg
(reg_pair[0].to_reg().to_reg(), reg_pair[1].to_reg().to_reg())
@@ -560,28 +585,56 @@
debug_assert!(r1.get_class() == RegClass::I64);
debug_assert!(r2.get_class() == RegClass::I64);

// stp r1, r2, [sp, #(i * #16)]
// stp r1, r2, [sp, #-16]!
insts.push(Inst::StoreP64 {
rt: r1,
rt2: r2,
mem: PairAMode::SignedOffset(
stack_reg(),
SImm7Scaled::maybe_from_i64((i * 16) as i64, types::I64).unwrap(),
mem: PairAMode::PreIndexed(
writable_stack_reg(),
SImm7Scaled::maybe_from_i64(-16, types::I64).unwrap(),
),
flags: MemFlags::trusted(),
});
if flags.unwind_info() {
clobber_offset -= 8;
if r2 != zero_reg() {
insts.push(Inst::Unwind {
inst: UnwindInst::SaveReg {
clobber_offset,
reg: r2.to_real_reg(),
},
});
}
clobber_offset -= 8;
insts.push(Inst::Unwind {
inst: UnwindInst::SaveReg {
clobber_offset,
reg: r1.to_real_reg(),
},
});
}
}

let vec_offset = int_save_bytes;
for (i, reg) in clobbered_vec.iter().enumerate() {
for reg in clobbered_vec.iter() {
insts.push(Inst::FpuStore128 {
rd: reg.to_reg().to_reg(),
mem: AMode::Unscaled(
stack_reg(),
SImm9::maybe_from_i64((vec_offset + (i * 16)) as i64).unwrap(),
),
mem: AMode::PreIndexed(writable_stack_reg(), SImm9::maybe_from_i64(-16).unwrap()),
flags: MemFlags::trusted(),
});
if flags.unwind_info() {
clobber_offset -= 16;
insts.push(Inst::Unwind {
inst: UnwindInst::SaveReg {
clobber_offset,
reg: reg.to_reg(),
},
});
}
}

// Allocate the fixed frame below the clobbers if necessary.
if fixed_frame_storage_size > 0 {
insts.extend(Self::gen_sp_reg_adjust(-(fixed_frame_storage_size as i32)));
}

(total_save_bytes as u64, insts)
@@ -591,14 +644,25 @@
call_conv: isa::CallConv,
flags: &settings::Flags,
clobbers: &Set<Writable<RealReg>>,
_fixed_frame_storage_size: u32,
_outgoing_args_size: u32,
fixed_frame_storage_size: u32,
) -> SmallVec<[Inst; 16]> {
let mut insts = SmallVec::new();
let (clobbered_int, clobbered_vec) = get_regs_saved_in_prologue(call_conv, clobbers);

let (int_save_bytes, vec_save_bytes) = saved_reg_stack_size(&clobbered_int, &clobbered_vec);
for (i, reg_pair) in clobbered_int.chunks(2).enumerate() {
// Free the fixed frame if necessary.
if fixed_frame_storage_size > 0 {
insts.extend(Self::gen_sp_reg_adjust(fixed_frame_storage_size as i32));
}

for reg in clobbered_vec.iter().rev() {
insts.push(Inst::FpuLoad128 {
rd: Writable::from_reg(reg.to_reg().to_reg()),
mem: AMode::PostIndexed(writable_stack_reg(), SImm9::maybe_from_i64(16).unwrap()),
flags: MemFlags::trusted(),
});
}

for reg_pair in clobbered_int.chunks(2).rev() {
let (r1, r2) = if reg_pair.len() == 2 {
(
reg_pair[0].map(|r| r.to_reg()),
@@ -611,37 +675,18 @@
debug_assert!(r1.to_reg().get_class() == RegClass::I64);
debug_assert!(r2.to_reg().get_class() == RegClass::I64);

// ldp r1, r2, [sp, #(i * 16)]
// ldp r1, r2, [sp], #16
insts.push(Inst::LoadP64 {
rt: r1,
rt2: r2,
mem: PairAMode::SignedOffset(
stack_reg(),
SImm7Scaled::maybe_from_i64((i * 16) as i64, types::I64).unwrap(),
),
flags: MemFlags::trusted(),
});
}

for (i, reg) in clobbered_vec.iter().enumerate() {
insts.push(Inst::FpuLoad128 {
rd: Writable::from_reg(reg.to_reg().to_reg()),
mem: AMode::Unscaled(
stack_reg(),
SImm9::maybe_from_i64(((i * 16) + int_save_bytes) as i64).unwrap(),
mem: PairAMode::PostIndexed(
writable_stack_reg(),
SImm7Scaled::maybe_from_i64(16, I64).unwrap(),
),
flags: MemFlags::trusted(),
});
}

// For non-baldrdash calling conventions, the frame pointer
// will be moved into the stack pointer in the epilogue, so we
// can skip restoring the stack pointer value with this `add`.
if call_conv.extends_baldrdash() {
let total_save_bytes = (int_save_bytes + vec_save_bytes) as i32;
insts.extend(Self::gen_sp_reg_adjust(total_save_bytes));
}

// If this is Baldrdash-2020, restore the callee (i.e., our) TLS
// register. We may have allocated it for something else and clobbered
// it, but the ABI expects us to leave the TLS register unchanged.
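
Putting the aarch64 pieces together: for a function clobbering, say,
x19 and x20, the prologue now pretty-prints roughly as below (a
hypothetical example based on the `unwind {:?}` formatting added in
`inst/mod.rs` further down; exact register rendering may differ):

    stp fp, lr, [sp, #-16]!
    unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
    mov fp, sp
    unwind DefineNewFrame { offset_downward_to_clobbers: 16,
                            offset_upward_to_caller_sp: 16 }
    stp x19, x20, [sp, #-16]!
    unwind SaveReg { clobber_offset: 8, reg: x20 }
    unwind SaveReg { clobber_offset: 0, reg: x19 }
    sub sp, sp, #16                 // fixed frame, if any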
5 changes: 4 additions & 1 deletion cranelift/codegen/src/isa/aarch64/inst/emit.rs
@@ -542,7 +542,6 @@ impl MachInstEmitInfo for EmitInfo {
impl MachInstEmit for Inst {
type State = EmitState;
type Info = EmitInfo;
type UnwindInfo = super::unwind::AArch64UnwindInfo;

fn emit(&self, sink: &mut MachBuffer<Inst>, emit_info: &Self::Info, state: &mut EmitState) {
// N.B.: we *must* not exceed the "worst-case size" used to compute
@@ -2379,6 +2378,10 @@ impl MachInstEmit for Inst {
&Inst::ValueLabelMarker { .. } => {
// Nothing; this is only used to compute debug info.
}

&Inst::Unwind { ref inst } => {
sink.add_unwind(inst.clone());
}
}

let end_off = sink.cur_offset();
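
For context: emitting `Inst::Unwind` produces no machine bytes;
`sink.add_unwind(...)` just records the directive at the current code
offset. Conceptually (a sketch under that assumption, not the actual
`MachBuffer` code):

    // Hypothetical stand-in for the buffer's unwind bookkeeping.
    struct UnwindRecord {
        code_offset: u32,
        inst: UnwindInst,
    }

    fn add_unwind(records: &mut Vec<UnwindRecord>, cur_offset: u32, inst: UnwindInst) {
        // Pair each directive with the offset at which it takes
        // effect; a platform-specific backend (SystemV or Windows)
        // later lowers the (offset, directive) stream into its
        // unwind-table format.
        records.push(UnwindRecord { code_offset: cur_offset, inst });
    }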
12 changes: 12 additions & 0 deletions cranelift/codegen/src/isa/aarch64/inst/mod.rs
@@ -8,6 +8,7 @@ use crate::ir::types::{
B1, B128, B16, B32, B64, B8, F32, F64, FFLAGS, I128, I16, I32, I64, I8, I8X16, IFLAGS, R32, R64,
};
use crate::ir::{ExternalName, MemFlags, Opcode, SourceLoc, TrapCode, Type, ValueLabel};
use crate::isa::unwind::UnwindInst;
use crate::isa::CallConv;
use crate::machinst::*;
use crate::{settings, CodegenError, CodegenResult};
@@ -1216,6 +1217,11 @@ pub enum Inst {
reg: Reg,
label: ValueLabel,
},

/// An unwind pseudo-instruction.
Unwind {
inst: UnwindInst,
},
}

fn count_zero_half_words(mut value: u64, num_half_words: u8) -> usize {
@@ -2026,6 +2032,7 @@ fn aarch64_get_regs(inst: &Inst, collector: &mut RegUsageCollector) {
&Inst::ValueLabelMarker { reg, .. } => {
collector.add_use(reg);
}
&Inst::Unwind { .. } => {}
&Inst::EmitIsland { .. } => {}
}
}
@@ -2779,6 +2786,7 @@ fn aarch64_map_regs<RUM: RegUsageMapper>(inst: &mut Inst, mapper: &RUM) {
&mut Inst::ValueLabelMarker { ref mut reg, .. } => {
map_use(mapper, reg);
}
&mut Inst::Unwind { .. } => {}
}
}

@@ -4097,6 +4105,10 @@ impl Inst {
&Inst::ValueLabelMarker { label, reg } => {
format!("value_label {:?}, {}", label, reg.show_rru(mb_rru))
}

&Inst::Unwind { ref inst } => {
format!("unwind {:?}", inst)
}
}
}
}
