Implement lazy funcref table and anyfunc initialization. #3733

Merged 1 commit on Feb 9, 2022

Commits on Feb 9, 2022

  1. Implement lazy funcref table and anyfunc initialization.

    During instance initialization, we build two sorts of arrays eagerly:
    
    - We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function
      in an instance.
    
    - We initialize every funcref table element that has an initializer
      with a pointer to one of these anyfuncs.
    
    Most instances will not touch (via call_indirect or table.get) all
    funcref table elements. And most anyfuncs will never be referenced,
    because most functions are never placed in tables or used with
    `ref.func`. Thus, both of these initialization tasks are quite wasteful.
    Profiling shows that a significant fraction of the remaining
    instance-initialization time after our other recent optimizations is
    going into these two tasks.
    
    This PR implements two basic ideas:
    
    - The anyfunc array can be lazily initialized as long as we retain the
      information needed to do so. For now, in this PR, we just recreate the
      anyfunc whenever a pointer is taken to it, because doing so is fast
      enough; in the future we could keep some state to know whether the
      anyfunc has been written yet and skip this work if redundant.
    
      This technique allows us to leave the anyfunc array as uninitialized
      memory, which can be a significant savings. Filling it with
      initialized anyfuncs is very expensive, but even zeroing it is
      expensive: e.g. in a large module, it can be >500KB.
    
    - A funcref table can be lazily initialized as long as we retain a link
      to its corresponding instance and function index for each element. A
      zero in a table element means "uninitialized", and a slowpath does the
      initialization.
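
    As a rough illustration of the first idea (not the code in this PR),
    the sketch below keeps only the per-function data needed to rebuild an
    anyfunc and leaves the anyfunc storage uninitialized until a pointer is
    requested. `Anyfunc`, `Instance`, `func_info` and `get_anyfunc` are
    hypothetical, simplified names, and plain integers stand in for code
    addresses and signature ids.

    ```rust
    use std::mem::MaybeUninit;

    // Hypothetical, simplified stand-in for `VMCallerCheckedAnyfunc`.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Anyfunc {
        pub func_ptr: usize, // stand-in for the function's code address
        pub type_index: u32, // stand-in for the signature id
    }

    // Hypothetical instance model: only what is needed to rebuild anyfuncs.
    pub struct Instance {
        // Retained from compilation: (code address, signature id) per function.
        func_info: Vec<(usize, u32)>,
        // Anyfunc storage, left uninitialized at instantiation.
        anyfuncs: Vec<MaybeUninit<Anyfunc>>,
    }

    impl Instance {
        pub fn new(func_info: Vec<(usize, u32)>) -> Self {
            let mut anyfuncs = Vec::with_capacity(func_info.len());
            // No anyfunc is constructed here; the slots stay uninitialized.
            anyfuncs.resize_with(func_info.len(), MaybeUninit::uninit);
            Instance { func_info, anyfuncs }
        }

        // Called whenever a pointer to an anyfunc is taken. The anyfunc is
        // rewritten on every call, so no "already initialized" state is kept.
        pub fn get_anyfunc(&mut self, index: usize) -> Option<&Anyfunc> {
            let &(func_ptr, type_index) = self.func_info.get(index)?;
            let slot: &Anyfunc =
                self.anyfuncs[index].write(Anyfunc { func_ptr, type_index });
            Some(slot)
        }
    }
    ```

    In this model, instantiation cost no longer depends on writing one
    anyfunc per function; the only per-anyfunc work happens in
    `get_anyfunc`, and repeating it is harmless because it always writes
    the same value.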
    
    Funcref tables are a little tricky because funcrefs can be null. We need
    to distinguish "element was initially non-null, but user stored explicit
    null later" from "element never touched" (i.e., the lazy init should not
    blow away an explicitly stored null). We solve this by stealing the LSB
    from every funcref (anyfunc pointer): when the LSB is set, the funcref
    is initialized and we don't hit the lazy-init slowpath. We insert the
    bit on storing to the table and mask it off after loading.
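
    The tagging scheme can be modeled in a few lines. This is a sketch
    under simplifying assumptions, not the PR's implementation: table slots
    are plain `usize` values standing in for anyfunc pointers, and
    `LazyTable`, `initial` and `slow_path_hits` are hypothetical names. It
    assumes every real pointer is at least 2-byte aligned, so the LSB is
    free to use as the "initialized" flag.

    ```rust
    // LSB of a slot: set once the slot has been initialized or stored to.
    const INIT_BIT: usize = 1;

    pub struct LazyTable {
        // Raw slot values: 0 means "never touched"; anything else is
        // (pointer | INIT_BIT), where the pointer itself may be null (0).
        slots: Vec<usize>,
        // Hypothetical stand-in for the information retained per element:
        // the initial anyfunc address for each slot, or 0 for a null start.
        initial: Vec<usize>,
        // Counts slow-path entries, for demonstration only.
        pub slow_path_hits: usize,
    }

    impl LazyTable {
        pub fn new(initial: Vec<usize>) -> Self {
            // The scheme only works if the LSB of every pointer is unused.
            assert!(initial.iter().all(|p| p & INIT_BIT == 0));
            LazyTable { slots: vec![0; initial.len()], initial, slow_path_hits: 0 }
        }

        // Store: insert the bit, so even an explicit null becomes nonzero.
        pub fn set(&mut self, index: usize, ptr: usize) {
            self.slots[index] = ptr | INIT_BIT;
        }

        // Load: run the slow path only for a raw zero, then mask the bit off.
        pub fn get(&mut self, index: usize) -> usize {
            if self.slots[index] == 0 {
                self.slow_path_hits += 1;
                self.slots[index] = self.initial[index] | INIT_BIT;
            }
            self.slots[index] & !INIT_BIT
        }
    }
    ```

    For example, after `set(i, 0)` the raw slot holds `1`, so a later
    `get(i)` returns null without entering the slow path and without
    restoring the element's initial value.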
    
    We do have to set up a precomputed array of `FuncIndex`s for the table
    in order for this to work. We do this as part of the module compilation.
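
    One plausible shape for that precomputation is sketched below. It is an
    assumption for illustration rather than the PR's code: `ElemSegment`
    and `precompute_table_init` are hypothetical names, function indices
    are plain `u32`s rather than `FuncIndex`, and only segments with a
    statically known offset are modeled. Segments are applied in order, so
    a later segment overwrites an earlier one where they overlap.

    ```rust
    // Hypothetical model of an active element segment with a known offset.
    pub struct ElemSegment {
        pub offset: usize,
        pub funcs: Vec<u32>, // function indices placed at offset, offset+1, ...
    }

    // Builds one entry per table slot: `Some(func_index)` if a segment
    // initializes the slot, `None` if it starts out null. Returns `None`
    // overall if any segment falls outside the table, in which case the
    // caller would need some other initialization strategy.
    pub fn precompute_table_init(
        table_len: usize,
        segments: &[ElemSegment],
    ) -> Option<Vec<Option<u32>>> {
        let mut init = vec![None; table_len];
        for seg in segments {
            let end = seg.offset.checked_add(seg.funcs.len())?;
            let dst = init.get_mut(seg.offset..end)?;
            for (slot, &func) in dst.iter_mut().zip(&seg.funcs) {
                *slot = Some(func);
            }
        }
        Some(init)
    }
    ```

    With an array like this available at instantiation time, the lazy-init
    slow path only needs the slot index to find which function's anyfunc
    to materialize.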
    
    This PR also refactors the way that the runtime crate gains access to
    information computed during module compilation.
    
    Performance effect measured with in-tree benches/instantiation.rs, using
    SpiderMonkey built for WASI, and with memfd enabled:
    
    ```
    BEFORE:
    
    sequential/default/spidermonkey.wasm
                            time:   [68.569 us 68.696 us 68.856 us]
    sequential/pooling/spidermonkey.wasm
                            time:   [69.406 us 69.435 us 69.465 us]
    
    parallel/default/spidermonkey.wasm: with 1 background thread
                            time:   [69.444 us 69.470 us 69.497 us]
    parallel/default/spidermonkey.wasm: with 16 background threads
                            time:   [183.72 us 184.31 us 184.89 us]
    parallel/pooling/spidermonkey.wasm: with 1 background thread
                            time:   [69.018 us 69.070 us 69.136 us]
    parallel/pooling/spidermonkey.wasm: with 16 background threads
                            time:   [326.81 us 337.32 us 347.01 us]
    
    WITH THIS PR:
    
    sequential/default/spidermonkey.wasm
                            time:   [6.7821 us 6.8096 us 6.8397 us]
                            change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05)
                            Performance has improved.
    sequential/pooling/spidermonkey.wasm
                            time:   [3.0410 us 3.0558 us 3.0724 us]
                            change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05)
                            Performance has improved.
    
    parallel/default/spidermonkey.wasm: with 1 background thread
                            time:   [7.2643 us 7.2689 us 7.2735 us]
                            change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05)
                            Performance has improved.
    parallel/default/spidermonkey.wasm: with 16 background threads
                            time:   [147.36 us 148.99 us 150.74 us]
                            change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05)
                            Performance has improved.
    parallel/pooling/spidermonkey.wasm: with 1 background thread
                            time:   [3.1009 us 3.1021 us 3.1033 us]
                            change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05)
                            Performance has improved.
    parallel/pooling/spidermonkey.wasm: with 16 background threads
                            time:   [49.449 us 50.475 us 51.540 us]
                            change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05)
                            Performance has improved.
    ```
    
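    So an improvement of roughly 85-95% in most configurations for a very
    large module (7420 functions in its one funcref table, 31928 functions
    total); the exception is parallel/default with 16 background threads,
    which improves by about 18%.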
    cfallin committed Feb 9, 2022
    Commit c841cbe