Implement lazy funcref table and anyfunc initialization.
During instance initialization, we build two sorts of arrays eagerly:

- We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function
  in an instance.

- We initialize every element of a funcref table with an initializer to
  a pointer to one of these anyfuncs.

Most instances will not touch (via call_indirect or table.get) all
funcref table elements. And most anyfuncs will never be referenced,
because most functions are never placed in tables or used with
`ref.func`. Thus, both of these initialization tasks are quite wasteful.
Profiling shows that a significant fraction of the remaining
instance-initialization time after our other recent optimizations is
going into these two tasks.

This PR implements two basic ideas:

- The anyfunc array can be lazily initialized as long as we retain the
  information needed to do so. A zero in the func-ptr part of the tuple
  means "uninitialized"; a null-check and slowpath do the
  initialization whenever we take a pointer to an anyfunc.

- A funcref table can be lazily initialized as long as we retain a link
  to its corresponding instance and function index for each element. A
  zero in a table element means "uninitialized", and a slowpath does the
  initialization.
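
As a rough illustration of the first idea, here is a minimal sketch of a lazily initialized anyfunc array. The `Anyfunc` and `Instance` types, their fields, and the `slow_path_hits` counter are hypothetical stand-ins invented for this example; they are not the actual `VMCallerCheckedAnyfunc` layout or the runtime's real data structures.

```rust
/// Hypothetical stand-in for an anyfunc; the real struct's fields may differ.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct Anyfunc {
    /// Zero means "uninitialized".
    func_ptr: usize,
    type_index: u32,
}

/// Hypothetical instance that keeps the metadata needed to build anyfuncs later.
struct Instance {
    func_addrs: Vec<usize>,
    func_types: Vec<u32>,
    /// Starts out all-zero; entries are filled in on first use.
    anyfuncs: Vec<Anyfunc>,
    /// Counts slow-path initializations (for illustration only).
    slow_path_hits: usize,
}

impl Instance {
    fn new(func_addrs: Vec<usize>, func_types: Vec<u32>) -> Self {
        let n = func_addrs.len();
        Instance {
            func_addrs,
            func_types,
            anyfuncs: vec![Anyfunc::default(); n],
            slow_path_hits: 0,
        }
    }

    /// Null-check on the fast path; initialize from retained metadata on the slow path.
    fn get_anyfunc(&mut self, index: usize) -> &Anyfunc {
        if self.anyfuncs[index].func_ptr == 0 {
            self.slow_path_hits += 1;
            self.anyfuncs[index] = Anyfunc {
                func_ptr: self.func_addrs[index],
                type_index: self.func_types[index],
            };
        }
        &self.anyfuncs[index]
    }
}
```

In this sketch, requesting the same index twice takes the slow path only once, and entries that are never requested stay zeroed.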

The use of all-zeroes to mean "uninitialized" means that we can use fast
memory clearing techniques, like madvise(DONTNEED) on Linux or just
freshly-mmap'd anonymous memory, to get to the initial state without
a lot of memory writes.

Funcref tables are a little tricky because funcrefs can be null. We need
to distinguish "element was initially non-null, but user stored explicit
null later" from "element never touched" (i.e., the lazy init should not
blow away an explicitly stored null). We solve this by stealing the LSB
from every funcref (anyfunc pointer): when the LSB is set, the funcref
is initialized and we don't hit the lazy-init slowpath. We insert the
bit on storing to the table and mask it off after loading.
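
The tagging scheme can be sketched as follows. This is a simplified model under stated assumptions: `FuncrefTable`, its `initial` vector of retained initializer pointers, and the use of plain `usize` values as pointers are all hypothetical; only the idea of a low "initialized" bit (named `REF_INIT_BIT` in the diff comments) comes from the commit.

```rust
/// Low bit stolen from each funcref, playing the role the commit gives to `REF_INIT_BIT`.
const INIT_BIT: usize = 1;

/// Hypothetical funcref table model; pointers are plain `usize` values here.
struct FuncrefTable {
    /// Raw slots: 0 = never touched; otherwise pointer bits with INIT_BIT set.
    slots: Vec<usize>,
    /// Assumed retained initializer: the pointer each slot lazily receives (0 = null).
    initial: Vec<usize>,
}

impl FuncrefTable {
    fn new(initial: Vec<usize>) -> Self {
        let slots = vec![0; initial.len()];
        FuncrefTable { slots, initial }
    }

    /// Store path: insert the initialized bit, so even a null store is non-zero.
    fn set(&mut self, index: usize, ptr: usize) {
        debug_assert_eq!(ptr & INIT_BIT, 0, "pointer must leave the low bit free");
        self.slots[index] = ptr | INIT_BIT;
    }

    /// Load path: an all-zero slot takes the lazy-init slow path; the bit is masked off on return.
    fn get(&mut self, index: usize) -> usize {
        if self.slots[index] == 0 {
            let ptr = self.initial[index];
            self.set(index, ptr);
        }
        self.slots[index] & !INIT_BIT
    }
}
```

With this model, calling `set(i, 0)` on an untouched slot and then `get(i)` returns null rather than the initializer, because the stored raw value is `INIT_BIT` and not zero.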

Performance effect on instantiation in the on-demand allocator (pooling
allocator effect should be similar as the table-init path is the same):

```
sequential/default/spidermonkey.wasm
                        time:   [71.886 us 72.012 us 72.133 us]

sequential/default/spidermonkey.wasm
                        time:   [22.243 us 22.256 us 22.270 us]
                        change: [-69.117% -69.060% -69.000%] (p = 0.00 < 0.05)
                        Performance has improved.
```

So, 72µs to 22µs, or a 69% reduction.
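
For reference, the arithmetic behind that summary can be checked from the two point estimates in the benchmark output above. The helper names below are made up for this example, and the benchmark tool's own `change` figure is computed from its internal estimates, so it may differ slightly from this simple ratio.

```rust
/// Fractional reduction in time going from `before` to `after`.
fn reduction(before: f64, after: f64) -> f64 {
    1.0 - after / before
}

/// Speedup factor going from `before` to `after`.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}
```

Applied to 72.012 µs and 22.256 µs, `reduction` gives roughly 0.69 and `speedup` gives a factor a little above 3.2.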
cfallin committed Feb 3, 2022
1 parent 8ed79c8 commit 1a0ae24
Showing 27 changed files with 723 additions and 206 deletions.
2 changes: 1 addition & 1 deletion cranelift/wasm/src/code_translator.rs
@@ -612,7 +612,7 @@ pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
bitcast_arguments(args, &types, builder);

let call = environ.translate_call_indirect(
builder.cursor(),
builder,
TableIndex::from_u32(*table_index),
table,
TypeIndex::from_u32(*index),
20 changes: 10 additions & 10 deletions cranelift/wasm/src/environ/dummy.rs
@@ -404,7 +404,7 @@ impl<'dummy_environment> FuncEnvironment for DummyFuncEnvironment<'dummy_environ

fn translate_call_indirect(
&mut self,
mut pos: FuncCursor,
builder: &mut FunctionBuilder,
_table_index: TableIndex,
_table: ir::Table,
_sig_index: TypeIndex,
@@ -413,7 +413,7 @@ impl<'dummy_environment> FuncEnvironment for DummyFuncEnvironment<'dummy_environ
call_args: &[ir::Value],
) -> WasmResult<ir::Inst> {
// Pass the current function's vmctx parameter on to the callee.
let vmctx = pos
let vmctx = builder
.func
.special_param(ir::ArgumentPurpose::VMContext)
.expect("Missing vmctx parameter");
@@ -423,22 +423,22 @@ impl<'dummy_environment> FuncEnvironment for DummyFuncEnvironment<'dummy_environ
// TODO: Generate bounds checking code.
let ptr = self.pointer_type();
let callee_offset = if ptr == I32 {
pos.ins().imul_imm(callee, 4)
builder.ins().imul_imm(callee, 4)
} else {
let ext = pos.ins().uextend(I64, callee);
pos.ins().imul_imm(ext, 4)
let ext = builder.ins().uextend(I64, callee);
builder.ins().imul_imm(ext, 4)
};
let mflags = ir::MemFlags::trusted();
let func_ptr = pos.ins().load(ptr, mflags, callee_offset, 0);
let func_ptr = builder.ins().load(ptr, mflags, callee_offset, 0);

// Build a value list for the indirect call instruction containing the callee, call_args,
// and the vmctx parameter.
let mut args = ir::ValueList::default();
args.push(func_ptr, &mut pos.func.dfg.value_lists);
args.extend(call_args.iter().cloned(), &mut pos.func.dfg.value_lists);
args.push(vmctx, &mut pos.func.dfg.value_lists);
args.push(func_ptr, &mut builder.func.dfg.value_lists);
args.extend(call_args.iter().cloned(), &mut builder.func.dfg.value_lists);
args.push(vmctx, &mut builder.func.dfg.value_lists);

Ok(pos
Ok(builder
.ins()
.CallIndirect(ir::Opcode::CallIndirect, INVALID, sig_ref, args)
.0)
2 changes: 1 addition & 1 deletion cranelift/wasm/src/environ/spec.rs
@@ -219,7 +219,7 @@ pub trait FuncEnvironment: TargetEnvironment {
#[cfg_attr(feature = "cargo-clippy", allow(clippy::too_many_arguments))]
fn translate_call_indirect(
&mut self,
pos: FuncCursor,
builder: &mut FunctionBuilder,
table_index: TableIndex,
table: ir::Table,
sig_index: TypeIndex,
126 changes: 93 additions & 33 deletions crates/cranelift/src/func_environ.rs
@@ -750,6 +750,56 @@ impl<'module_environment> FuncEnvironment<'module_environment> {
pos.ins().uextend(I64, val)
}
}

fn get_or_init_funcref_table_elem(
&mut self,
builder: &mut FunctionBuilder,
table_index: TableIndex,
table: ir::Table,
index: ir::Value,
) -> ir::Value {
let pointer_type = self.pointer_type();

// To support lazy initialization of table
// contents, we check for a null entry here, and
// if null, we take a slow-path that invokes a
// libcall.
let table_entry_addr = builder.ins().table_addr(pointer_type, table, index, 0);
let value = builder
.ins()
.load(pointer_type, ir::MemFlags::trusted(), table_entry_addr, 0);
// Mask off the "initialized bit". See REF_INIT_BIT in
// crates/runtime/src/table.rs for more details.
let value_masked = builder.ins().band_imm(value, !1);

let null_block = builder.create_block();
let continuation_block = builder.create_block();
let result_param = builder.append_block_param(continuation_block, pointer_type);
builder.set_cold_block(null_block);

builder.ins().brz(value, null_block, &[]);
builder.ins().jump(continuation_block, &[value_masked]);
builder.seal_block(null_block);

builder.switch_to_block(null_block);
let table_index = builder.ins().iconst(I32, table_index.index() as i64);
let builtin_idx = BuiltinFunctionIndex::table_get_lazy_init_funcref();
let builtin_sig = self
.builtin_function_signatures
.table_get_lazy_init_funcref(builder.func);
let (vmctx, builtin_addr) =
self.translate_load_builtin_function_address(&mut builder.cursor(), builtin_idx);
let call_inst =
builder
.ins()
.call_indirect(builtin_sig, builtin_addr, &[vmctx, table_index, index]);
let returned_entry = builder.func.dfg.inst_results(call_inst)[0];
builder.ins().jump(continuation_block, &[returned_entry]);
builder.seal_block(continuation_block);

builder.switch_to_block(continuation_block);
result_param
}
}

impl<'module_environment> TargetEnvironment for FuncEnvironment<'module_environment> {
@@ -886,13 +936,7 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
match plan.table.wasm_ty {
WasmType::FuncRef => match plan.style {
TableStyle::CallerChecksSignature => {
let table_entry_addr = builder.ins().table_addr(pointer_type, table, index, 0);
Ok(builder.ins().load(
pointer_type,
ir::MemFlags::trusted(),
table_entry_addr,
0,
))
Ok(self.get_or_init_funcref_table_elem(builder, table_index, table, index))
}
},
WasmType::ExternRef => {
@@ -1033,9 +1077,16 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
WasmType::FuncRef => match plan.style {
TableStyle::CallerChecksSignature => {
let table_entry_addr = builder.ins().table_addr(pointer_type, table, index, 0);
builder
.ins()
.store(ir::MemFlags::trusted(), value, table_entry_addr, 0);
// Set the "initialized bit". See doc-comment on
// `REF_INIT_BIT` in crates/runtime/src/table.rs
// for details.
let value_with_init_bit = builder.ins().bor_imm(value, 1);
builder.ins().store(
ir::MemFlags::trusted(),
value_with_init_bit,
table_entry_addr,
0,
);
Ok(())
}
},
@@ -1253,10 +1304,16 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
mut pos: cranelift_codegen::cursor::FuncCursor<'_>,
func_index: FuncIndex,
) -> WasmResult<ir::Value> {
let vmctx = self.vmctx(&mut pos.func);
let vmctx = pos.ins().global_value(self.pointer_type(), vmctx);
let offset = self.offsets.vmctx_anyfunc(func_index);
Ok(pos.ins().iadd_imm(vmctx, i64::from(offset)))
let func_index = pos.ins().iconst(I32, func_index.as_u32() as i64);
let builtin_index = BuiltinFunctionIndex::ref_func();
let builtin_sig = self.builtin_function_signatures.ref_func(&mut pos.func);
let (vmctx, builtin_addr) =
self.translate_load_builtin_function_address(&mut pos, builtin_index);

let call_inst = pos
.ins()
.call_indirect(builtin_sig, builtin_addr, &[vmctx, func_index]);
Ok(pos.func.dfg.first_result(call_inst))
}

fn translate_custom_global_get(
@@ -1459,7 +1516,7 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m

fn translate_call_indirect(
&mut self,
mut pos: FuncCursor<'_>,
builder: &mut FunctionBuilder,
table_index: TableIndex,
table: ir::Table,
ty_index: TypeIndex,
@@ -1469,21 +1526,17 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
) -> WasmResult<ir::Inst> {
let pointer_type = self.pointer_type();

let table_entry_addr = pos.ins().table_addr(pointer_type, table, callee, 0);

// Dereference the table entry to get the pointer to the
// `VMCallerCheckedAnyfunc`.
let anyfunc_ptr =
pos.ins()
.load(pointer_type, ir::MemFlags::trusted(), table_entry_addr, 0);
// Get the anyfunc pointer (the funcref) from the table.
let anyfunc_ptr = self.get_or_init_funcref_table_elem(builder, table_index, table, callee);

// Check for whether the table element is null, and trap if so.
pos.ins()
builder
.ins()
.trapz(anyfunc_ptr, ir::TrapCode::IndirectCallToNull);

// Dereference anyfunc pointer to get the function address.
let mem_flags = ir::MemFlags::trusted();
let func_addr = pos.ins().load(
let func_addr = builder.ins().load(
pointer_type,
mem_flags,
anyfunc_ptr,
@@ -1495,36 +1548,41 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
TableStyle::CallerChecksSignature => {
let sig_id_size = self.offsets.size_of_vmshared_signature_index();
let sig_id_type = Type::int(u16::from(sig_id_size) * 8).unwrap();
let vmctx = self.vmctx(pos.func);
let base = pos.ins().global_value(pointer_type, vmctx);
let vmctx = self.vmctx(builder.func);
let base = builder.ins().global_value(pointer_type, vmctx);
let offset =
i32::try_from(self.offsets.vmctx_vmshared_signature_id(ty_index)).unwrap();

// Load the caller ID.
let mut mem_flags = ir::MemFlags::trusted();
mem_flags.set_readonly();
let caller_sig_id = pos.ins().load(sig_id_type, mem_flags, base, offset);
let caller_sig_id = builder.ins().load(sig_id_type, mem_flags, base, offset);

// Load the callee ID.
let mem_flags = ir::MemFlags::trusted();
let callee_sig_id = pos.ins().load(
let callee_sig_id = builder.ins().load(
sig_id_type,
mem_flags,
anyfunc_ptr,
i32::from(self.offsets.vmcaller_checked_anyfunc_type_index()),
);

// Check that they match.
let cmp = pos.ins().icmp(IntCC::Equal, callee_sig_id, caller_sig_id);
pos.ins().trapz(cmp, ir::TrapCode::BadSignature);
let cmp = builder
.ins()
.icmp(IntCC::Equal, callee_sig_id, caller_sig_id);
builder.ins().trapz(cmp, ir::TrapCode::BadSignature);
}
}

let mut real_call_args = Vec::with_capacity(call_args.len() + 2);
let caller_vmctx = pos.func.special_param(ArgumentPurpose::VMContext).unwrap();
let caller_vmctx = builder
.func
.special_param(ArgumentPurpose::VMContext)
.unwrap();

// First append the callee vmctx address.
let vmctx = pos.ins().load(
let vmctx = builder.ins().load(
pointer_type,
mem_flags,
anyfunc_ptr,
@@ -1536,7 +1594,9 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
// Then append the regular call arguments.
real_call_args.extend_from_slice(call_args);

Ok(pos.ins().call_indirect(sig_ref, func_addr, &real_call_args))
Ok(builder
.ins()
.call_indirect(sig_ref, func_addr, &real_call_args))
}

fn translate_call(
4 changes: 4 additions & 0 deletions crates/environ/src/builtin.rs
@@ -18,8 +18,12 @@ macro_rules! foreach_builtin_function {
memory_fill(vmctx, i32, i64, i32, i64) -> ();
/// Returns an index for wasm's `memory.init` instruction.
memory_init(vmctx, i32, i32, i64, i32, i32) -> ();
/// Returns a value for wasm's `ref.func` instruction.
ref_func(vmctx, i32) -> (pointer);
/// Returns an index for wasm's `data.drop` instruction.
data_drop(vmctx, i32) -> ();
/// Returns a table entry after lazily initializing it.
table_get_lazy_init_funcref(vmctx, i32, i32) -> (pointer);
/// Returns an index for Wasm's `table.grow` instruction for `funcref`s.
table_grow_funcref(vmctx, i32, i32, pointer) -> (i32);
/// Returns an index for Wasm's `table.grow` instruction for `externref`s.
6 changes: 3 additions & 3 deletions crates/jit/src/instantiate.rs
@@ -245,7 +245,7 @@ pub struct CompiledModule {
address_map_data: Range<usize>,
trap_data: Range<usize>,
module: Arc<Module>,
funcs: PrimaryMap<DefinedFuncIndex, FunctionInfo>,
funcs: Arc<PrimaryMap<DefinedFuncIndex, FunctionInfo>>,
trampolines: Vec<Trampoline>,
meta: Metadata,
code: Range<usize>,
@@ -304,7 +304,7 @@ impl CompiledModule {

let mut ret = Self {
module: Arc::new(info.module),
funcs: info.funcs,
funcs: Arc::new(info.funcs),
trampolines: info.trampolines,
wasm_data: subslice_range(section(ELF_WASM_DATA)?, code.mmap),
address_map_data: code
@@ -387,7 +387,7 @@ impl CompiledModule {
}

/// Returns the `FunctionInfo` map for all defined functions.
pub fn functions(&self) -> &PrimaryMap<DefinedFuncIndex, FunctionInfo> {
pub fn functions(&self) -> &Arc<PrimaryMap<DefinedFuncIndex, FunctionInfo>> {
&self.funcs
}
