Implement Wasm Atomics for Cranelift/newBE/aarch64.
The implementation is pretty straightforward.  Wasm atomic instructions fall
into five groups:

* atomic read-modify-write
* atomic compare-and-swap
* atomic loads
* atomic stores
* fences

and the implementation mirrors that structure at both the CLIF and AArch64
levels.

At the CLIF level, there are five new instructions, one for each group.  Some
comments about these:

* for those that take addresses (all except fences), the address is contained
  entirely in a single `Value`; there is no offset field as there is with
  normal loads and stores.  Wasm atomics require alignment checks, and
  removing the offset makes implementation of those checks a bit simpler.

* atomic loads and stores get their own instructions, rather than reusing the
  existing load and store instructions, for two reasons:

  - as noted above, this makes alignment checking simpler

  - reuse of existing loads and stores would require extension of `MemFlags`
    to indicate atomicity, which sounds semantically unclean.  For example,
    then *any* instruction carrying `MemFlags` could be marked as atomic, even
    in cases where it is meaningless or ambiguous.

* I tried to specify, in comments, the behaviour of these instructions as
  tightly as I could.  Unfortunately there is no way (as far as my limited
  CLIF knowledge goes) to enforce the constraint that they may only be used on
  I8, I16, I32 and I64 types, and in particular not on floating point or
  vector types; a possible shape for such a check is sketched just below.
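
The check itself is easy to state in code.  The following sketch is
hypothetical -- it is not part of this patch, and the helper name is
invented -- but it shows the constraint meant here:

    use cranelift_codegen::ir::{types, Type};

    // Hypothetical helper (not in this patch): atomic instructions are meant
    // to operate only on scalar integer types.
    fn is_valid_atomic_ty(ty: Type) -> bool {
        ty == types::I8 || ty == types::I16 || ty == types::I32 || ty == types::I64
    }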

The translation from Wasm to CLIF, in `code_translator.rs`, is unremarkable.
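
For concreteness, here is a minimal sketch of what such a translation amounts
to.  It is not the actual `code_translator.rs` code; the helper name is
invented, and it assumes the `InstBuilder` methods generated from the new
formats take roughly the argument lists shown:

    use cranelift_codegen::ir::{types, AtomicRmwOp, InstBuilder, MemFlags, Value};
    use cranelift_frontend::FunctionBuilder;

    // Hypothetical helper: translate a wasm `i32.atomic.rmw.add` whose effective
    // address has already been computed (and alignment-checked) into CLIF.
    fn translate_i32_atomic_rmw_add(
        builder: &mut FunctionBuilder<'_>,
        addr: Value,    // the complete address in a single Value -- no offset field
        operand: Value, // the value to add
    ) -> Value {
        builder
            .ins()
            .atomic_rmw(types::I32, MemFlags::new(), AtomicRmwOp::Add, addr, operand)
    }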

At the AArch64 level, there are also five new instructions, one for each
group.  All of them except `::Fence` expand to multiple real machine
instructions.  Atomic r-m-w and atomic c-a-s are emitted as the usual
load-linked/store-conditional loops, guarded at both ends by memory fences.
Atomic loads and stores are emitted as a load preceded by a fence, and a store
followed by a fence, respectively.  The amount of fencing may be overkill, but
it reflects exactly what the SpiderMonkey Wasm baseline compiler for AArch64
does.
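
To make the shape of the r-m-w expansion concrete, here is a stand-alone
illustration of a 64-bit atomic add, written as Rust inline assembly.  It is
not the emission code itself: register choice and label handling are
simplified, and the real expansion uses the fixed scratch registers described
below.

    // Fence, load-linked, modify, store-conditional, retry on failure, fence.
    #[cfg(target_arch = "aarch64")]
    unsafe fn atomic_add_u64_sketch(p: *mut u64, x: u64) -> u64 {
        let old: u64;
        core::arch::asm!(
            "dmb ish",                   // leading fence
            "2:",
            "ldxr {old}, [{p}]",         // load-linked: read the old value
            "add {new}, {old}, {x}",     // compute the updated value
            "stxr {st:w}, {new}, [{p}]", // store-conditional: try to write it back
            "cbnz {st:w}, 2b",           // reservation lost -> retry
            "dmb ish",                   // trailing fence
            p = in(reg) p,
            x = in(reg) x,
            old = out(reg) old,
            new = out(reg) _,
            st = out(reg) _,
        );
        old
    }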

One reason to implement r-m-w and c-a-s as a single insn that is expanded
only at emission time is that we must be very careful about which instructions
we allow in between the load-linked and store-conditional.  In particular, we
cannot allow *any* extra memory transactions in there, since -- particularly
on low-end hardware -- that might cause the store-conditional to fail every
time round the loop, hence causing the generated code to spin forever.  That
implies that we can't present the LL/SC loop to the register allocator as its
constituent instructions, since it might insert spill code anywhere.  Hence we
must present it as a single indivisible unit, as we do here.  This also has
the benefit of reducing the total amount of work the RA has to do.

The only other notable feature of the r-m-w and c-a-s translations into
AArch64 code is that they both need a scratch register internally.  Rather
than faking one up by claiming, in `get_regs`, that the instruction modifies
an extra scratch register, and then having to provide a dummy initialisation
of it, these new instructions (`::LLSC` and `::CAS`) simply use fixed
registers in the range x24-x28.  We rely on the RA's ability to coalesce
V<-->R copies to make the cost of the resulting extra copies zero or almost
zero.  x24-x28 are chosen so as to be call-clobbered, hence their use is less
likely to interfere with long live ranges that span calls.

One subtlety regarding the use of completely fixed input and output registers
is that we must be careful about how the surrounding copies from and to the
arg/result registers are done.  In particular, it is not safe to simply emit
the copies in some arbitrary order if one of the arg registers is a real reg.
For that reason, the arguments are first moved into virtual regs if they are
not already there, using a new method, `<LowerCtx for Lower>::ensure_in_vreg`.
Again, we rely on coalescing to turn the resulting copies into no-ops in the
common case.
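
The hazard being avoided is the classic overwrite problem in parallel moves.
A tiny stand-alone illustration (nothing to do with Cranelift's actual data
structures) of the failure mode:

    use std::collections::HashMap;

    fn main() {
        // Suppose the lowering needs arg0 in x25 and arg1 in x26, but arg1
        // currently lives in x25.  Copying in a naive order clobbers arg1.
        let mut regs: HashMap<&str, &str> = HashMap::new();
        regs.insert("x0", "arg0");
        regs.insert("x25", "arg1");

        let v0 = regs["x0"];
        regs.insert("x25", v0); // x25 := x0 -- arg1 is now lost
        let v1 = regs["x25"];
        regs.insert("x26", v1); // x26 := x25 -- picks up arg0, not arg1

        assert_eq!(regs["x26"], "arg0"); // the wrong value reached the fixed register
    }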

There is also a ride-along fix for the AArch64 lowering case for
`Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns
in a row were generated.

In the patch as submitted there are six "FIXME JRS" comments, which mark
things that I believe to be correct, but for which I would appreciate a second
opinion.  Unless otherwise directed, I will remove them for the final commit
but leave the associated code/comments unchanged.
julian-seward1 committed Jul 31, 2020
1 parent 8fd9209 commit 1b7c1b2
Showing 23 changed files with 1,630 additions and 80 deletions.
32 changes: 31 additions & 1 deletion cranelift/codegen/meta/src/shared/formats.rs
@@ -3,7 +3,10 @@ use crate::shared::{entities::EntityRefs, immediates::Immediates};
use std::rc::Rc;

pub(crate) struct Formats {
    pub(crate) atomic_cas: Rc<InstructionFormat>,
    pub(crate) atomic_rmw: Rc<InstructionFormat>,
    pub(crate) binary: Rc<InstructionFormat>,
    pub(crate) binary_imm8: Rc<InstructionFormat>,
    pub(crate) binary_imm64: Rc<InstructionFormat>,
    pub(crate) branch: Rc<InstructionFormat>,
    pub(crate) branch_float: Rc<InstructionFormat>,
@@ -17,7 +20,6 @@ pub(crate) struct Formats {
    pub(crate) cond_trap: Rc<InstructionFormat>,
    pub(crate) copy_special: Rc<InstructionFormat>,
    pub(crate) copy_to_ssa: Rc<InstructionFormat>,
    pub(crate) binary_imm8: Rc<InstructionFormat>,
    pub(crate) float_compare: Rc<InstructionFormat>,
    pub(crate) float_cond: Rc<InstructionFormat>,
    pub(crate) float_cond_trap: Rc<InstructionFormat>,
@@ -32,6 +34,7 @@ pub(crate) struct Formats {
    pub(crate) jump: Rc<InstructionFormat>,
    pub(crate) load: Rc<InstructionFormat>,
    pub(crate) load_complex: Rc<InstructionFormat>,
    pub(crate) load_no_offset: Rc<InstructionFormat>,
    pub(crate) multiary: Rc<InstructionFormat>,
    pub(crate) nullary: Rc<InstructionFormat>,
    pub(crate) reg_fill: Rc<InstructionFormat>,
@@ -42,6 +45,7 @@ pub(crate) struct Formats {
    pub(crate) stack_store: Rc<InstructionFormat>,
    pub(crate) store: Rc<InstructionFormat>,
    pub(crate) store_complex: Rc<InstructionFormat>,
    pub(crate) store_no_offset: Rc<InstructionFormat>,
    pub(crate) table_addr: Rc<InstructionFormat>,
    pub(crate) ternary: Rc<InstructionFormat>,
    pub(crate) ternary_imm8: Rc<InstructionFormat>,
@@ -202,6 +206,21 @@ impl Formats {

            func_addr: Builder::new("FuncAddr").imm(&entities.func_ref).build(),

            atomic_rmw: Builder::new("AtomicRmw")
                .imm(&imm.memflags)
                .imm(&imm.atomic_rmw_op)
                .value()
                .value()
                .build(),

            atomic_cas: Builder::new("AtomicCas")
                .imm(&imm.memflags)
                .value()
                .value()
                .value()
                .typevar_operand(2)
                .build(),

            load: Builder::new("Load")
                .imm(&imm.memflags)
                .value()
@@ -214,6 +233,11 @@ impl Formats {
                .imm(&imm.offset32)
                .build(),

            load_no_offset: Builder::new("LoadNoOffset")
                .imm(&imm.memflags)
                .value()
                .build(),

            store: Builder::new("Store")
                .imm(&imm.memflags)
                .value()
@@ -228,6 +252,12 @@ impl Formats {
                .imm(&imm.offset32)
                .build(),

            store_no_offset: Builder::new("StoreNoOffset")
                .imm(&imm.memflags)
                .value()
                .value()
                .build(),

            stack_load: Builder::new("StackLoad")
                .imm(&entities.stack_slot)
                .imm(&imm.offset32)
14 changes: 14 additions & 0 deletions cranelift/codegen/meta/src/shared/immediates.rs
@@ -71,6 +71,9 @@ pub(crate) struct Immediates {
    ///
    /// The Rust enum type also has a `User(u16)` variant for user-provided trap codes.
    pub trapcode: OperandKind,

    /// A code indicating the arithmetic operation to perform in an atomic_rmw memory access.
    pub atomic_rmw_op: OperandKind,
}

fn new_imm(format_field_name: &'static str, rust_type: &'static str) -> OperandKind {
@@ -156,6 +159,17 @@ impl Immediates {
                trapcode_values.insert("int_divz", "IntegerDivisionByZero");
                new_enum("code", "ir::TrapCode", trapcode_values).with_doc("A trap reason code.")
            },
            atomic_rmw_op: {
                let mut atomic_rmw_op_values = HashMap::new();
                atomic_rmw_op_values.insert("add", "Add");
                atomic_rmw_op_values.insert("sub", "Sub");
                atomic_rmw_op_values.insert("and", "And");
                atomic_rmw_op_values.insert("or", "Or");
                atomic_rmw_op_values.insert("xor", "Xor");
                atomic_rmw_op_values.insert("xchg", "Xchg");
                new_enum("op", "ir::AtomicRmwOp", atomic_rmw_op_values)
                    .with_doc("Atomic Read-Modify-Write Ops")
            },
        }
    }
}
104 changes: 104 additions & 0 deletions cranelift/codegen/meta/src/shared/instructions.rs
@@ -4305,5 +4305,109 @@ pub(crate) fn define(
        .is_ghost(true),
    );

    // Instructions relating to atomic memory accesses and fences
    let AtomicMem = &TypeVar::new(
        "AtomicMem",
        "Any type that can be stored in memory, which can be used in an atomic operation",
        TypeSetBuilder::new().ints(8..64).build(),
    );
    let x = &Operand::new("x", AtomicMem).with_doc("Value to be atomically stored");
    let a = &Operand::new("a", AtomicMem).with_doc("Value atomically loaded");
    let e = &Operand::new("e", AtomicMem).with_doc("Expected value in CAS");
    let p = &Operand::new("p", iAddr);
    let MemFlags = &Operand::new("MemFlags", &imm.memflags);
    let AtomicRmwOp = &Operand::new("AtomicRmwOp", &imm.atomic_rmw_op);

    ig.push(
        Inst::new(
            "atomic_rmw",
            r#"
        Atomically read-modify-write memory at `p`, with second operand `x`. The old value is
        returned. `p` has the type of the target word size, and `x` may be an integer type of
        8, 16, 32 or 64 bits, even on a 32-bit target. The type of the returned value is the
        same as the type of `x`. This operation is sequentially consistent and creates
        happens-before edges that order normal (non-atomic) loads and stores.
        "#,
            &formats.atomic_rmw,
        )
        .operands_in(vec![MemFlags, AtomicRmwOp, p, x])
        .operands_out(vec![a])
        .can_load(true)
        .can_store(true)
        .other_side_effects(true),
    );

    ig.push(
        Inst::new(
            "atomic_cas",
            r#"
        Perform an atomic compare-and-swap operation on memory at `p`, with expected value `e`,
        storing `x` if the value at `p` equals `e`. The old value at `p` is returned,
        regardless of whether the operation succeeds or fails. `p` has the type of the target
        word size, and `x` and `e` must have the same type and the same size, which may be an
        integer type of 8, 16, 32 or 64 bits, even on a 32-bit target. The type of the returned
        value is the same as the type of `x` and `e`. This operation is sequentially
        consistent and creates happens-before edges that order normal (non-atomic) loads and
        stores.
        "#,
            &formats.atomic_cas,
        )
        .operands_in(vec![MemFlags, p, e, x])
        .operands_out(vec![a])
        .can_load(true)
        .can_store(true)
        .other_side_effects(true),
    );

    ig.push(
        Inst::new(
            "atomic_load",
            r#"
        Atomically load from memory at `p`.
        This is a polymorphic instruction that can load any value type which has a memory
        representation. It should only be used for integer types with 8, 16, 32 or 64 bits.
        This operation is sequentially consistent and creates happens-before edges that order
        normal (non-atomic) loads and stores.
        "#,
            &formats.load_no_offset,
        )
        .operands_in(vec![MemFlags, p])
        .operands_out(vec![a])
        .can_load(true)
        .other_side_effects(true),
    );

    ig.push(
        Inst::new(
            "atomic_store",
            r#"
        Atomically store `x` to memory at `p`.
        This is a polymorphic instruction that can store any value type with a memory
        representation. It should only be used for integer types with 8, 16, 32 or 64 bits.
        This operation is sequentially consistent and creates happens-before edges that order
        normal (non-atomic) loads and stores.
        "#,
            &formats.store_no_offset,
        )
        .operands_in(vec![MemFlags, x, p])
        .can_store(true)
        .other_side_effects(true),
    );

    ig.push(
        Inst::new(
            "fence",
            r#"
        A memory fence. This must provide ordering to ensure that, at a minimum, neither loads
        nor stores of any kind may move forwards or backwards across the fence. This operation
        is sequentially consistent.
        "#,
            &formats.nullary,
        )
        .other_side_effects(true),
    );

    ig.build()
}
52 changes: 52 additions & 0 deletions cranelift/codegen/src/ir/atomic_rmw_op.rs
@@ -0,0 +1,52 @@
/// Describes the arithmetic operation in an atomic memory read-modify-write operation.
use core::fmt::{self, Display, Formatter};
use core::str::FromStr;
#[cfg(feature = "enable-serde")]
use serde::{Deserialize, Serialize};

#[derive(Clone, Copy, PartialEq, Eq, Debug, Hash)]
#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
/// Describes the arithmetic operation in an atomic memory read-modify-write operation.
pub enum AtomicRmwOp {
    /// Add
    Add,
    /// Sub
    Sub,
    /// And
    And,
    /// Or
    Or,
    /// Xor
    Xor,
    /// Exchange
    Xchg,
}

impl Display for AtomicRmwOp {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        let s = match self {
            AtomicRmwOp::Add => "add",
            AtomicRmwOp::Sub => "sub",
            AtomicRmwOp::And => "and",
            AtomicRmwOp::Or => "or",
            AtomicRmwOp::Xor => "xor",
            AtomicRmwOp::Xchg => "xchg",
        };
        f.write_str(s)
    }
}

impl FromStr for AtomicRmwOp {
    type Err = ();
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "add" => Ok(AtomicRmwOp::Add),
            "sub" => Ok(AtomicRmwOp::Sub),
            "and" => Ok(AtomicRmwOp::And),
            "or" => Ok(AtomicRmwOp::Or),
            "xor" => Ok(AtomicRmwOp::Xor),
            "xchg" => Ok(AtomicRmwOp::Xchg),
            _ => Err(()),
        }
    }
}
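
(Not part of the diff: assuming the re-export added in `ir/mod.rs` below, the
intended `Display`/`FromStr` round-trip for the new type can be exercised like
this:)

    use cranelift_codegen::ir::AtomicRmwOp;

    fn main() {
        let op: AtomicRmwOp = "xchg".parse().unwrap();
        assert_eq!(op, AtomicRmwOp::Xchg);
        assert_eq!(op.to_string(), "xchg");
    }
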
2 changes: 2 additions & 0 deletions cranelift/codegen/src/ir/mod.rs
@@ -1,5 +1,6 @@
//! Representation of Cranelift IR functions.

mod atomic_rmw_op;
mod builder;
pub mod constant;
pub mod dfg;
@@ -26,6 +27,7 @@ mod valueloc;
#[cfg(feature = "enable-serde")]
use serde::{Deserialize, Serialize};

pub use crate::ir::atomic_rmw_op::AtomicRmwOp;
pub use crate::ir::builder::{
    InsertBuilder, InstBuilder, InstBuilderBase, InstInserterBase, ReplaceBuilder,
};
8 changes: 7 additions & 1 deletion cranelift/codegen/src/ir/trapcode.rs
@@ -24,6 +24,9 @@ pub enum TrapCode {
    /// offset-guard pages.
    HeapOutOfBounds,

    /// A wasm atomic operation was presented with a not-naturally-aligned linear-memory address.
    HeapMisaligned,

    /// A `table_addr` instruction detected an out-of-bounds error.
    TableOutOfBounds,

@@ -59,6 +62,7 @@ impl Display for TrapCode {
        let identifier = match *self {
            StackOverflow => "stk_ovf",
            HeapOutOfBounds => "heap_oob",
            HeapMisaligned => "heap_misaligned",
            TableOutOfBounds => "table_oob",
            IndirectCallToNull => "icall_null",
            BadSignature => "bad_sig",
@@ -81,6 +85,7 @@ impl FromStr for TrapCode {
        match s {
            "stk_ovf" => Ok(StackOverflow),
            "heap_oob" => Ok(HeapOutOfBounds),
            "heap_misaligned" => Ok(HeapMisaligned),
            "table_oob" => Ok(TableOutOfBounds),
            "icall_null" => Ok(IndirectCallToNull),
            "bad_sig" => Ok(BadSignature),
@@ -101,9 +106,10 @@ mod tests {
    use alloc::string::ToString;

    // Everything but user-defined codes.
    const CODES: [TrapCode; 10] = [
    const CODES: [TrapCode; 11] = [
        TrapCode::StackOverflow,
        TrapCode::HeapOutOfBounds,
        TrapCode::HeapMisaligned,
        TrapCode::TableOutOfBounds,
        TrapCode::IndirectCallToNull,
        TrapCode::BadSignature,
32 changes: 31 additions & 1 deletion cranelift/codegen/src/isa/aarch64/inst/args.rs
@@ -4,7 +4,7 @@
#![allow(dead_code)]

use crate::ir::types::{F32X2, F32X4, F64X2, I16X4, I16X8, I32X2, I32X4, I64X2, I8X16, I8X8};
use crate::ir::Type;
use crate::ir::{AtomicRmwOp, Type};
use crate::isa::aarch64::inst::*;
use crate::isa::aarch64::lower::ty_bits;
use crate::machinst::MachLabel;
@@ -14,6 +14,9 @@ use regalloc::{RealRegUniverse, Reg, Writable};
use core::convert::Into;
use std::string::String;

//=============================================================================
// Instruction sub-components: shift and extend descriptors

/// A shift operator for a register or immediate.
#[derive(Clone, Copy, Debug)]
#[repr(u8)]
@@ -645,3 +648,30 @@ impl VectorSize {
        }
    }
}

//=============================================================================
// Instruction sub-components: atomic memory update operations

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum AtomicRMWOp {
    Add,
    Sub,
    And,
    Or,
    Xor,
    Xchg,
}

impl AtomicRMWOp {
    pub fn from(ir_op: AtomicRmwOp) -> Self {
        match ir_op {
            AtomicRmwOp::Add => AtomicRMWOp::Add,
            AtomicRmwOp::Sub => AtomicRMWOp::Sub,
            AtomicRmwOp::And => AtomicRMWOp::And,
            AtomicRmwOp::Or => AtomicRMWOp::Or,
            AtomicRmwOp::Xor => AtomicRMWOp::Xor,
            AtomicRmwOp::Xchg => AtomicRMWOp::Xchg,
        }
    }
}