Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wasmtime loops forever when wasm32-wasi-threads guests write to stdout #7745

Closed
dicej opened this issue Jan 3, 2024 · 4 comments · Fixed by #7750
Closed

Wasmtime loops forever when wasm32-wasi-threads guests write to stdout #7745

dicej opened this issue Jan 3, 2024 · 4 comments · Fixed by #7750
Labels
bug Incorrect behavior in the current implementation that needs fixing

Comments

@dicej
Copy link
Contributor

dicej commented Jan 3, 2024

UPDATE: The original issue was due to not specifying -pthread when building. Now the problem is that Wasmtime loops indefinitely when a wasm32-wasi-threads guest writes to stdout. See #7745 (comment) for steps to reproduce.

Original report follows:

Per WebAssembly/wasi-libc#456 (comment), a number of wasi-sdk and wasi-libc tests are currently failing when run using Wasmtime but run successfully on other runtimes. They seem to be related to thread-local storage (e.g. tests involving errno, which is a thread-local variable).

Test Case

This file was built from https://github.com/WebAssembly/wasi-sdk/blob/main/tests/general/signals.c for wasm32-wasi-threads using wasi-sdk:

signals.c.-O0.wasm.tar.gz

Steps to Reproduce

First, install any recent version of Wasmtime (e.g. v16.0.0). Then:

tar xf signals.c.-O0.wasm.tar.gz
wasmtime run signals.c.-O0.wasm

Expected Results

The expected output is:

psignal message for SIGINT: Interrupt

Actual Results

The actual output is:

Assertion failed: raise(_NSIG) == -1 && errno == EINVAL (: main: 31)
Error: failed to run main module `signals.c.-O0.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0x1072 - <unknown>!abort
           1: 0x1179 - <unknown>!__assert_fail
           2:  0x716 - <unknown>!__original_main
           3:  0x1da - <unknown>!_start
    2: wasm trap: wasm `unreachable` instruction executed

Versions and Environment

Wasmtime version or commit: v16.0.0

Operating system: MacOS (but should repro on any OS)

Architecture: ARM64 (but should repro on any architecture)

Extra Info

See also WebAssembly/wasi-threads#49, which may be related.

@dicej dicej added the bug Incorrect behavior in the current implementation that needs fixing label Jan 3, 2024
@alexcrichton
Copy link
Member

What are the other runtimes this passes in? If I run this in node it produces the same error as Wasmtime using an adapted script from this issue

runner.js
const ITERS = 1;

////////////////////////////////////////////////////////////////////////////////

class TextEncoder {
  constructor(enc) {
    if (enc != "utf-8") {
      throw new Error("FITZGEN: unsupported encoding: " + enc);
    }
  }
  encode(s, n) {
    let buf = new Uint8Array(s.length);
    for (let i = 0; i < Math.min(s.length, n); i++) {
      buf[i] = s.charCodeAt(i); // lol
    }
    return buf;
  }
};

class TextDecoder {
  constructor(enc) {
    if (enc != "utf-8") {
      throw new Error("FITZGEN: unsupported encoding: " + enc);
    }
  }
  decode(buf) {
    let buf8 = new Uint8Array(buf);
    let s = "";
    for (let i = 0; i < buf8.length; i++) {
      s += String.fromCharCode(buf8[i]); // lol
    }
    return s;
  }
};

////////////////////////////////////////////////////////////////////////////////

async function ionCompile(wasm) {
  let module = await WebAssembly.compile(wasm);
  // while (!wasmHasTier2CompilationCompleted(module)) {
  //   sleep(1);
  // }
  // wasmDis(module);
  return module;
}

function unimplemented(name) {
  return function (...args) {
    throw new Error(name + " is unimplemented! args = " + args);
  }
}

class ProcExitError extends Error {
  constructor(code) {
    this.code = code;
    this.message = "Program exited with code " + code;
  }
  toString() {
    return "ProcExitError: " + this.message
  }
}

function trace(obj) {
  let proxy = {};
  for (let key of Object.keys(obj)) {
    if (typeof obj[key] != "function") {
      proxy[key] = obj[key];
      continue;
    }
    proxy[key] = function (...args) {
      console.log("TRACE: " + key + "(" + args + ")");
      let ret = obj[key](...args);
      console.log("TRACE:   -> " + ret);
      return ret;
    };
  }
  return proxy;
}

////////////////////////////////////////////////////////////////////////////////

const fs = require('fs');

async function main() {
  let smWasm = fs.readFileSync(process.argv[2]);
  let smModule = await ionCompile(smWasm);

  for (let i = 0; i < ITERS; i++) {
    let args = process.argv;
    let mem = null;
    let start = null;
    let unread = true;

    let smInstance = await WebAssembly.instantiate(smModule, trace({
      env: {
        memory: new WebAssembly.Memory({
          // shared: true,
          initial: 8000,
          maximum: 16000,
        }),
      },
      bench: {
        start: function () {
          start = monotonicNow()
        },
        end: function () {
          let end = monotonicNow();
          console.log("ITER: " + (end - start));
        }
      },
      wasi: {
        'thread-spawn': function() {
          throw new Error('thread-spawn');
        },
      },
      wasi_snapshot_preview1: {
        fd_pread: function() { throw new Error('fd_pread'); },
        poll_oneoff: function(in_, out, n) {
          console.log(n);
          // throw new Error('poll_oneoff');
          return 0;
        },
        random_get: function(a, b) {
          while (b > 0) {
            (new DataView(mem.buffer)).setInt8(a, 1, true);
            b -= 1;
            a += 1;
          }
          return 0;
        },
        args_get: function(a, b) {
          for (let arg of args) {
            (new DataView(mem.buffer)).setInt32(a, b, true);
            a += 4;
            for (let c in arg) {
              (new DataView(mem.buffer)).setInt8(b, arg.charCodeAt(c), true);
              b += 1;
            }
            (new DataView(mem.buffer)).setInt8(b, 0, true);
            b += 1;
          }
          return 0;
        },
        args_sizes_get: function(a, b) {
          let len = 0;
          for (let arg of args) {
            len += arg.length + 1;
          }
          (new DataView(mem.buffer)).setInt32(a, args.length, true);
          (new DataView(mem.buffer)).setInt32(b, len, true);
          return 0;
        },
        clock_res_get: function () { return 1; },
        clock_time_get: function(a, b, c) {
          const now = Math.round(performance.now() * 1000000);
          (new DataView(mem.buffer)).setBigInt64(c, BigInt(now), true);
          return 0;
        },
        fd_filestat_get: function() { throw new Error('fd_filestat_get'); },
        fd_read: function(fd, iovecs_ptr, iovecs_len, out_ptr) {
          let mem8 = new Uint8Array(mem.buffer);
          switch (fd) {
          case 4:
            let data = os.file.readFile("/home/nick/sightglass/benchmarks/spidermonkey/default.input.md");
            let k = 0;
            for (let i = 0; i < iovecs_len && k < data.length && unread; i++) {
              let ptr = (new DataView(mem.buffer)).getUint32(iovecs_ptr + i * 8, true);
              let len = (new DataView(mem.buffer)).getUint32(iovecs_ptr + i * 8 + 4, true);
              for (let j = 0; j < len && k < data.length; j++) {
                mem8[ptr + j] = data.charCodeAt(k++);
              }
            }
            unread = false;
            (new DataView(mem.buffer)).setUint32(out_ptr, k, true);
            return 0;
          default:
            return 8;
          }
        },
        fd_seek: function(fd, offset, whence, out_ptr) {
          switch (fd) {
          case 4:
            let len = os.file.readFile("/home/nick/sightglass/benchmarks/spidermonkey/default.input.md").length;
            (new DataView(mem.buffer)).setBigUint64(out_ptr, BigInt(len), true);
            return 0;
          default:
            return 8;
          }
        },
        fd_write: function(a, b, c, d) {
          let s = '';
          let total = 0;
          while (c > 0) {
            let base = (new DataView(mem.buffer)).getInt32(b, true);
            let len = (new DataView(mem.buffer)).getInt32(b + 4, true);
            b += 8;
            c -= 1;

            while (len > 0) {
              let c = new Uint8Array(mem.buffer)[base]
              s += String.fromCharCode(c);
              len -= 1;
              base += 1;
              total += 1;
            }
          }
          //console.log("fd_write(" + a + "): " + s.trimEnd());
          console.log(s.trimEnd());
          (new DataView(mem.buffer)).setInt32(d, total, true);
          return 0;
        },
        fd_fdstat_set_flags: function() { throw new Error('fd_fdstat_set_flags'); },
        path_filestat_get: function() { throw new Error('path_filestat_get'); },
        path_open: function(fd, dirflags, path_ptr, path_len, oflags, fs_rights_base, fs_rights_inheriting, fdflags, out_ptr) {
          let buf = new Uint8Array(path_len);
          let mem8 = new Uint8Array(mem.buffer);
          for (let i = 0; i < path_len; i++) {
            buf[i] = mem8[path_ptr + i];
          }
          let path = (new TextDecoder('utf-8')).decode(buf);
          switch (path) {
          case "default.input.md":
            (new DataView(mem.buffer)).setInt32(out_ptr, 4, true);
            return 0;
          default:
            console.log('denying path_open(' + path + ')');
            return 2;
          }
        },
        path_remove_directory: function() { throw new Error('path_remove_directory'); },
        path_unlink_file: function() { throw new Error('path_unlink_file'); },
        sched_yield: function() { throw new Error('sched_yield'); },
        environ_get: function() { return 0; },
        environ_sizes_get: function(a, b) {
          (new DataView(mem.buffer)).setInt32(a, 0, true);
          (new DataView(mem.buffer)).setInt32(b, 0, true);
          return 0;
        },
        fd_close: function(fd) {
          return 0;
        },
        fd_fdstat_get: function(fd, out_ptr) {
          (new DataView(mem.buffer)).setInt32(out_ptr, 0);
          // type
          switch (fd) {
          case 3:
            (new DataView(mem.buffer)).setInt32(out_ptr + 4, 3, true);
            break;
          case 4:
            (new DataView(mem.buffer)).setInt32(out_ptr + 4, 4, true);
            break;
          default:
            (new DataView(mem.buffer)).setInt32(out_ptr + 4, 0, true);
            break;
          }
          (new DataView(mem.buffer)).setInt32(out_ptr + 8, 0, true);
          (new DataView(mem.buffer)).setInt32(out_ptr + 12, 0, true);
          (new DataView(mem.buffer)).setInt32(out_ptr + 16, 0, true);
          (new DataView(mem.buffer)).setInt32(out_ptr + 20, 0, true);
          return 0;
        },
        fd_prestat_get: function(a, b) {
          // case for preopened "."
          if (a == 3) {
            // discriminant for directory
            (new DataView(mem.buffer)).setInt32(b, 0, true);
            // one entry in directory
            (new DataView(mem.buffer)).setInt32(b + 4, 1, true);
            return 0;
          }
          return 8;
        },
        fd_prestat_dir_name: function(fd, ptr, len) {
          let buf = (new TextEncoder('utf-8')).encode(".", 1);
          let mem8 = new Uint8Array(mem.buffer);
          for (let i = 0; i < Math.min(buf.length, len); i++) {
            mem8[ptr + i] = buf[i];
          }
          mem8[Math.min(buf.length, len)] = 0;
          return 0;
        },
        proc_exit: function(code) { throw new ProcExitError(code); },
      },
    }));

    mem = smInstance.exports.memory;

    try {
      smInstance.exports._start();
    } catch (e) {
      if ((e instanceof ProcExitError) && e.code == 0) {
        continue;
      }
      throw e;
    }
  }
}

main().catch(function (e) {
  console.log("====== ERROR!!! ======");
  console.log(e);
  console.log(e.stack);
});

If the only other runtime is WAMR (from the issue thread you linked then WAMR has been known in the past for non-spec-compliant behavior. If there are other runtimes though then the mystery will deepen.

@dicej
Copy link
Contributor Author

dicej commented Jan 4, 2024

Okay, I dug a bit deeper into this. Turns out all the wasi-libc tests are passing with Wasmtime 16, Toywasm 35, and Wasmer 4.2.5 as long as I build with -pthread -- if I don't build with that flag, they all fail. Lesson learned.

Meanwhile, I've also updated the wasi-sdk test script to build with -pthread. Now Toywasm and Wasmer are able to run the tests okay (modulo the fact that the .expected files are hard-coded to match Wasmtime, and the other runtimes format error output differently), but Wasmtime ends up busy looping forever as soon as any test tried to write to stdout, e.g.:

* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x000000018c9a6060 libsystem_kernel.dylib`kevent + 8
    frame #1: 0x0000000100a5d284 wasmtime`mio::waker::Waker::wake::h70f0de287550c690 + 64
    frame #2: 0x0000000100a4cb94 wasmtime`tokio::util::wake::wake_arc_raw::h1a32ae31a8ec5072 + 68
    frame #3: 0x0000000100a4c0f4 wasmtime`tokio::runtime::scheduler::current_thread::Context::park_yield::hc41080da61d04b53 + 320
    frame #4: 0x00000001003da4a8 wasmtime`tokio::runtime::context::scoped::Scoped$LT$T$GT$::set::hf092eaeb69914454 + 820
    frame #5: 0x00000001005a69b0 wasmtime`tokio::runtime::context::set_scheduler::h48cc529b1f87605e + 172
    frame #6: 0x00000001003dbb50 wasmtime`tokio::runtime::scheduler::current_thread::CoreGuard::block_on::h0aea4e87e0591d94 + 112
    frame #7: 0x00000001005b33bc wasmtime`tokio::runtime::context::runtime::enter_runtime::h2aa886ffe04dfc58 + 668
    frame #8: 0x00000001003ad284 wasmtime`wasmtime_wasi::preview2::in_tokio::hb195ebe39d79dde3 + 420
    frame #9: 0x00000001004a7bf0 wasmtime`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h04ccc3778290beca + 180
    frame #10: 0x00000001003ef1f0 wasmtime`wasmtime_runtime::instance::Instance::from_vmctx::hac10297a5cfbc8d9 + 148
    frame #11: 0x00000001004209a0 wasmtime`_$LT$F$u20$as$u20$wasmtime..func..IntoFunc$LT$T$C$$LP$wasmtime..func..Caller$LT$T$GT$$C$A1$C$A2$C$A3$C$A4$RP$$C$R$GT$$GT$::into_func::native_call_shim::h383012752bf42248 + 84
    frame #12: 0x00000001025c954c
    frame #13: 0x00000001025c7030
    frame #14: 0x00000001025c8650
    frame #15: 0x00000001025c882c
    frame #16: 0x00000001025c8a24
    frame #17: 0x00000001025c8300
    frame #18: 0x00000001025c8440
    frame #19: 0x00000001025c436c
    frame #20: 0x00000001025c6df0
    frame #21: 0x00000001025c42a0
    frame #22: 0x00000001025c91e8
    frame #23: 0x0000000100ede5e0 wasmtime`wasmtime_setjmp_16_0_0 + 100
    frame #24: 0x000000010065e638 wasmtime`wasmtime_runtime::traphandlers::_$LT$impl$u20$wasmtime_runtime..traphandlers..call_thread_state..CallThreadState$GT$::with::ha4d3eccbd3d85625 + 64
    frame #25: 0x00000001004f7530 wasmtime`wasmtime_runtime::traphandlers::catch_traps::hd22f75804c9e7d97 + 152
    frame #26: 0x000000010044c0a8 wasmtime`wasmtime::func::invoke_wasm_and_catch_traps::h92e67288af28d454 + 184
    frame #27: 0x000000010044e2ac wasmtime`wasmtime::func::Func::call_impl::h172a5645f5d101de + 1304
    frame #28: 0x0000000100495d10 wasmtime`wasmtime_cli::commands::run::RunCommand::load_main_module::h5baeac87cd95a968 + 2596
    frame #29: 0x0000000100494190 wasmtime`wasmtime_cli::commands::run::RunCommand::execute::hba50b58e459b0ce7 + 3056
    frame #30: 0x0000000100150474 wasmtime`wasmtime::Wasmtime::execute::h2578284636f5c3af + 296
    frame #31: 0x000000010014df80 wasmtime`wasmtime::old_cli::main::h4ab441c89575015e + 580
    frame #32: 0x0000000100155530 wasmtime`std::sys_common::backtrace::__rust_begin_short_backtrace::h90399811d0439507 + 12
    frame #33: 0x0000000100153ea0 wasmtime`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h5450f370f5c6d820 + 24
    frame #34: 0x000000010107f17c wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::hdb9d57f6813b6639 at function.rs:284:13 [opt]
    frame #35: 0x000000010107f174 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] std::panicking::try::do_call::h3316be3da4e712e4 at panicking.rs:504:40 [opt]
    frame #36: 0x000000010107f174 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] std::panicking::try::h72745351e21d17fd at panicking.rs:468:19 [opt]
    frame #37: 0x000000010107f174 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] std::panic::catch_unwind::h2df3aeb4d14c657e at panic.rs:142:14 [opt]
    frame #38: 0x000000010107f174 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h40cc69f0b809216f at rt.rs:148:48 [opt]
    frame #39: 0x000000010107f174 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] std::panicking::try::do_call::hdf604766e4bed556 at panicking.rs:504:40 [opt]
    frame #40: 0x000000010107f170 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] std::panicking::try::h52de1aaf80258115 at panicking.rs:468:19 [opt]
    frame #41: 0x000000010107f170 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 [inlined] std::panic::catch_unwind::hec154f66f8285e2b at panic.rs:142:14 [opt]
    frame #42: 0x000000010107f170 wasmtime`std::rt::lang_start_internal::haeade8bb0e866fa6 at rt.rs:148:20 [opt]
    frame #43: 0x0000000100151898 wasmtime`main + 52
    frame #44: 0x000000018c687f28 dyld`start + 2236

The stack trace is a bit different each time I take a sample, but it's generally spinning in tokio or mio.

@abrown
Copy link
Contributor

abrown commented Jan 4, 2024

Maybe a wait is not being notify-ed? I added #7220 to help diagnose this kind of thing: does running with WASMTIME_LOG=wasmtime_runtime=trace help clarify things?

@dicej
Copy link
Contributor Author

dicej commented Jan 4, 2024

 $ WASMTIME_LOG=wasmtime_runtime=trace  wasmtime argc_argv_main.c.-O0.wasm
2024-01-04T18:06:50.745040Z TRACE wasmtime_runtime::memory: memory.atomic.notify(addr=0xb58, count=4294967295)

Then it hangs with no further output.

For reference:

// argc_argv_main.c
#include <stdio.h>

int main(int argc, char *argv[]) {
    puts("hello from argc argv main!");
    return 0;
}
/opt/wasi-sdk/bin/clang --sysroot=/opt/wasi-sdk/share/wasi-sysroot --target=wasm32-wasi-threads -pthread -O0 argc_argv_main.c -o argc_argv_main.c.-O0.wasm

@dicej dicej changed the title Thread-local storage not working correctly for wasm32-wasi-threads guests Wasmtime loops forever when wasm32-wasi-threads guests write to stdout Jan 4, 2024
dicej added a commit to dicej/wasmtime that referenced this issue Jan 4, 2024
Previously, `first_non_empty_{c}io_vec` always returned `Ok(None)` for buffers
residing in shared memories since they cannot, in general, safely be represented
as slices.  That caused e.g. `wasi-libc` to spin forever when trying to write to
stdout using `fd_write` since it always got `Ok(0)` and never made progress.

This commit changes the return type of both functions to use `GuestPtr` instead
of `GuestSlice{Mut}`, allowing safe access to shared guest memory.

Big thanks to Alex Crichton for narrowing this down and suggesting the fix.

Fixes bytecodealliance#7745

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
github-merge-queue bot pushed a commit that referenced this issue Jan 5, 2024
Previously, `first_non_empty_{c}io_vec` always returned `Ok(None)` for buffers
residing in shared memories since they cannot, in general, safely be represented
as slices.  That caused e.g. `wasi-libc` to spin forever when trying to write to
stdout using `fd_write` since it always got `Ok(0)` and never made progress.

This commit changes the return type of both functions to use `GuestPtr` instead
of `GuestSlice{Mut}`, allowing safe access to shared guest memory.

Big thanks to Alex Crichton for narrowing this down and suggesting the fix.

Fixes #7745

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants