Implement Wasm Atomics for Cranelift/newBE/aarch64. #2077

Merged (1 commit), Aug 4, 2020

Commits on Aug 4, 2020

  1. Implement Wasm Atomics for Cranelift/newBE/aarch64.

    The implementation is pretty straightforward.  Wasm atomic instructions fall
    into 5 groups:
    
    * atomic read-modify-write
    * atomic compare-and-swap
    * atomic loads
    * atomic stores
    * fences
    
    and the implementation mirrors that structure, at both the CLIF and AArch64
    levels.
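
    As a rough illustration of the five-way split, the following Rust sketch
    sorts Wasm threads-proposal mnemonics into the groups listed above.  The
    `AtomicGroup` enum and the `classify` helper are hypothetical, not Cranelift
    types.  `memory.atomic.wait*` and `memory.atomic.notify` are not r-m-w,
    c-a-s, load, store or fence operations, so the sketch returns `None` for
    them.

    ```rust
    /// The five groups of Wasm atomic instructions (hypothetical type).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AtomicGroup {
        ReadModifyWrite,
        CompareAndSwap,
        Load,
        Store,
        Fence,
    }

    /// Hypothetical helper: map a Wasm text-format mnemonic to its group.
    /// Returns `None` for non-atomic ops and for wait/notify.
    fn classify(mnemonic: &str) -> Option<AtomicGroup> {
        if mnemonic == "atomic.fence" {
            return Some(AtomicGroup::Fence);
        }
        // The remaining atomics have the shape `<prefix>.atomic.<rest>`.
        let rest = mnemonic.split_once(".atomic.")?.1;
        if rest.starts_with("rmw") {
            // e.g. "rmw.add", "rmw8.xchg_u", "rmw16.cmpxchg_u"
            if rest.ends_with("cmpxchg") || rest.ends_with("cmpxchg_u") {
                Some(AtomicGroup::CompareAndSwap)
            } else {
                Some(AtomicGroup::ReadModifyWrite)
            }
        } else if rest.starts_with("load") {
            Some(AtomicGroup::Load)
        } else if rest.starts_with("store") {
            Some(AtomicGroup::Store)
        } else {
            // e.g. "notify", "wait32", "wait64"
            None
        }
    }
    ```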
    
    At the CLIF level, there are five new instructions, one for each group.  Some
    comments about these:
    
    * for those that take addresses (all except fences), the address is contained
      entirely in a single `Value`; there is no offset field as there is with
      normal loads and stores.  Wasm atomics require alignment checks, and
      removing the offset makes implementation of those checks a bit simpler.
    
    * atomic loads and stores get their own instructions, rather than reusing the
      existing load and store instructions, for two reasons:
    
      - per above comment, makes alignment checking simpler
    
      - reuse of existing loads and stores would require extension of `MemFlags`
        to indicate atomicity, which sounds semantically unclean.  For example,
        then *any* instruction carrying `MemFlags` could be marked as atomic, even
        in cases where it is meaningless or ambiguous.
    
    * I tried to specify, in comments, the behaviour of these instructions as
      tightly as I could.  Unfortunately there is no way (per my limited CLIF
      knowledge) to enforce the constraint that they may only be used on I8, I16,
      I32 and I64 types, and in particular not on floating point or vector types.
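
    A minimal sketch of the alignment check that a single address value makes
    simple, assuming the translator folds the Wasm `memarg` offset into the
    address before emitting the atomic instruction, so only one value has to be
    tested.  The helper names are hypothetical, and the type check stands in for
    the I8/I16/I32/I64 restriction that, as noted above, CLIF itself cannot
    enforce.

    ```rust
    /// Access width in bytes for the integer types atomics are meant to
    /// support (hypothetical helper).
    fn access_size(ty: &str) -> Option<u64> {
        match ty {
            "i8" => Some(1),
            "i16" => Some(2),
            "i32" => Some(4),
            "i64" => Some(8),
            // Floating-point and vector types are not valid for atomics.
            _ => None,
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    enum AtomicAddrError {
        UnsupportedType,
        Misaligned,
        Overflow,
    }

    /// Fold the offset into the base up front, then check that the whole
    /// effective address is naturally aligned for the access width.
    fn effective_atomic_addr(base: u64, offset: u64, ty: &str) -> Result<u64, AtomicAddrError> {
        let size = access_size(ty).ok_or(AtomicAddrError::UnsupportedType)?;
        // Checked add only for generality of this u64 sketch.
        let addr = base.checked_add(offset).ok_or(AtomicAddrError::Overflow)?;
        // `size` is a power of two, so natural alignment is a mask test.
        if addr & (size - 1) != 0 {
            return Err(AtomicAddrError::Misaligned);
        }
        Ok(addr)
    }
    ```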
    
    The translation from Wasm to CLIF, in `code_translator.rs`, is unremarkable.
    
    At the AArch64 level, there are also five new instructions, one for each
    group.  All of them except `::Fence` contain multiple real machine
    instructions.  Atomic r-m-w and atomic c-a-s are emitted as the usual
    load-linked store-conditional loops, guarded at both ends by memory fences.
    Atomic loads and stores are emitted as a load preceded by a fence, and a store
    followed by a fence, respectively.  The amount of fencing may be overkill, but
    it reflects exactly what the SpiderMonkey (SM) Wasm baseline compiler for
    AArch64 does.
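
    The shapes of these sequences can be sketched as plain instruction text.
    This is illustrative only: the use of `dmb ish` as the fence, the
    `ldxr`/`stxr` pair, and the register assignment (address in x25, operand in
    x26, old value in x27, new value in x28, status in w24; x0/x1 for plain
    loads and stores) are assumptions, not a copy of what the patch emits.

    ```rust
    /// 64-bit atomic r-m-w add: fence, LL/SC retry loop, fence.
    /// Register assignment is assumed for illustration.
    fn rmw_add_sequence() -> Vec<&'static str> {
        vec![
            "dmb ish",              // leading fence
            "1: ldxr x27, [x25]",   // load-linked: old value
            "add x28, x27, x26",    // new value; register-only, no memory access
            "stxr w24, x28, [x25]", // store-conditional: w24 = 0 on success
            "cbnz w24, 1b",         // retry if the exclusive store failed
            "dmb ish",              // trailing fence
        ]
    }

    /// Atomic load: a fence, then an ordinary load.
    fn atomic_load_sequence() -> Vec<&'static str> {
        vec!["dmb ish", "ldr x0, [x1]"]
    }

    /// Atomic store: an ordinary store, then a fence.
    fn atomic_store_sequence() -> Vec<&'static str> {
        vec!["str x0, [x1]", "dmb ish"]
    }
    ```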
    
    One reason to implement r-m-w and c-a-s as a single insn which is expanded
    only at emission time is that we must be very careful what instructions we
    allow in between the load-linked and store-conditional.  In particular, we
    cannot allow *any* extra memory accesses in there, since, particularly on
    low-end hardware, they might cause the store-conditional to fail every time,
    so the loop would never terminate.  That implies that we can't present the LL/SC
    loop to the register allocator as its constituent instructions, since it might
    insert spills anywhere.  Hence we must present it as a single indivisible
    unit, as we do here.  It also has the benefit of reducing the total amount of
    work the RA has to do.
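
    The following toy model (its types are hypothetical, not Cranelift's) shows
    why the loop must be opaque to the register allocator.  In the worst case
    the allocator inserts a spill after every instruction it can see.  If the
    loop is presented as separate instructions, a spill lands between the
    load-exclusive and the store-exclusive; a single pseudo-instruction that is
    expanded only at emission time keeps that window free of extra memory
    accesses.

    ```rust
    /// Toy view of what the register allocator sees.
    #[derive(Debug, Clone, PartialEq)]
    enum Inst {
        LoadExclusive,  // ldxr
        Alu,            // register-only operation
        StoreExclusive, // stxr
        Branch,         // cbnz back to the ldxr
        Spill,          // an RA-inserted store to the stack
        /// The whole LL/SC loop as one opaque instruction.
        LlscLoop,
    }

    /// Worst case: the allocator inserts a spill after every instruction.
    fn insert_spills_everywhere(code: &[Inst]) -> Vec<Inst> {
        let mut out = Vec::new();
        for inst in code {
            out.push(inst.clone());
            out.push(Inst::Spill);
        }
        out
    }

    /// Emission-time expansion of the opaque loop into its parts.
    fn expand(code: &[Inst]) -> Vec<Inst> {
        let mut out = Vec::new();
        for inst in code {
            match inst {
                Inst::LlscLoop => out.extend([
                    Inst::LoadExclusive,
                    Inst::Alu,
                    Inst::StoreExclusive,
                    Inst::Branch,
                ]),
                other => out.push(other.clone()),
            }
        }
        out
    }

    /// True if a spill falls between a load-exclusive and the next
    /// store-exclusive, which could make the store-conditional keep failing.
    fn spill_inside_exclusive_window(code: &[Inst]) -> bool {
        let mut open = false;
        for inst in code {
            match inst {
                Inst::LoadExclusive => open = true,
                Inst::StoreExclusive => open = false,
                Inst::Spill if open => return true,
                _ => {}
            }
        }
        false
    }
    ```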
    
    The only other notable feature of the r-m-w and c-a-s translations into
    AArch64 code is that they both need a scratch register internally.  Rather
    than faking one up by claiming, in `get_regs`, that they modify an extra
    scratch register, which would then need a dummy initialisation, these new
    instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range
    x24-x28.  We rely on the RA's ability to coalesce V<-->R copies to make the
    cost of the resulting extra copies zero or almost zero.  x24-x28 are chosen so
    as to be call-clobbered, hence their use is less likely to interfere with long
    live ranges that span calls.
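
    A sketch of how the fixed-register convention might look from the
    allocator's side, assuming the address goes in x25, the operand in x26, the
    result comes out in x27, and x24/x28 are internal scratch.  `RegUsage`,
    `llsc_reg_usage` and `lower_rmw` are hypothetical, and the resemblance to
    `get_regs` is loose.

    ```rust
    /// Toy summary of the registers an instruction reads and writes.
    #[derive(Debug, PartialEq)]
    struct RegUsage {
        uses: Vec<&'static str>,
        defs: Vec<&'static str>,
    }

    /// Fixed registers for the r-m-w pseudo-instruction (assumed assignment).
    /// Scratch registers are simply reported as written; no dummy
    /// initialisation of a fake virtual scratch register is needed.
    fn llsc_reg_usage() -> RegUsage {
        RegUsage {
            uses: vec!["x25", "x26"],        // address, operand
            defs: vec!["x24", "x27", "x28"], // status, old value, new value
        }
    }

    /// Lowering around the pseudo-instruction: copy inputs into the fixed
    /// registers, emit the loop, copy the result out.  If the allocator
    /// coalesces these copies, they disappear.
    fn lower_rmw(addr_vreg: &str, operand_vreg: &str, result_vreg: &str) -> Vec<String> {
        vec![
            format!("mov x25, {}", addr_vreg),
            format!("mov x26, {}", operand_vreg),
            "<llsc loop>".to_string(),
            format!("mov {}, x27", result_vreg),
        ]
    }
    ```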
    
    One subtlety regarding the use of completely fixed input and output registers
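    is that we must be careful about how the surrounding copies into the fixed
    argument registers, and out of the fixed result registers, are done.  In
    particular, it is not safe to simply emit those copies in an arbitrary order
    if one of the arguments already lives in a real register, since an earlier
    copy could overwrite it before it has been read.  For that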
    reason, the arguments are first moved into virtual regs if they are not
    already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`.
    Again, we rely on coalescing to turn them into no-ops in the common case.
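
    To see the ordering hazard concretely, here is a toy simulation (all names
    hypothetical), assuming the instruction wants its address in x25 and its
    operand in x26, while those values currently sit the other way round, in x26
    and x25.  Copying straight into the fixed registers clobbers one input;
    copying into fresh temporaries first, in the spirit of `ensure_in_vreg`,
    does not.

    ```rust
    use std::collections::HashMap;

    /// Toy register file: register name -> value.
    type Regs = HashMap<&'static str, u64>;

    /// Naive lowering: copy each (dst, src) pair directly, in order.
    /// Wrong when a source is also one of the fixed destinations.
    fn naive_copies(regs: &mut Regs, moves: &[(&'static str, &'static str)]) {
        for &(dst, src) in moves {
            let v = regs[src];
            regs.insert(dst, v);
        }
    }

    /// Read every source into a fresh temporary (standing in for a virtual
    /// register) before writing any fixed register.
    fn copies_via_temps(regs: &mut Regs, moves: &[(&'static str, &'static str)]) {
        let temps: Vec<u64> = moves.iter().map(|&(_, src)| regs[src]).collect();
        for (&(dst, _), v) in moves.iter().zip(temps) {
            regs.insert(dst, v);
        }
    }
    ```

    Once every input lives in a temporary, the copies into the fixed registers
    can be emitted in any order, and coalescing usually removes them.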
    
    There is also a ridealong fix for the AArch64 lowering case for
    `Opcode::Trapif | Opcode::Trapff`, which fixes a bug in which two trap insns
    in a row were generated.
    
    In the patch as submitted, there are 6 "FIXME JRS" comments, which mark things
    which I believe to be correct, but for which I would appreciate a second
    opinion.  Unless otherwise directed, I will remove them for the final commit
    but leave the associated code/comments unchanged.
    julian-seward1 committed Aug 4, 2020 (23a3953)