Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESPIPE errors unless --wasi threads=y is specified in wasmtime 15 and 16 #7813

Closed
brettcannon opened this issue Jan 24, 2024 · 10 comments
Closed
Labels
bug Incorrect behavior in the current implementation that needs fixing

Comments

@brettcannon
Copy link
Contributor

brettcannon commented Jan 24, 2024

Test Case

Various CPython test cases have started failing when moving from wasmtime 13 to 16 due to ESPIPE. Adding --wasi threads=y fixes the issue which seems unrelated to file seeking.

Example failure run can be seen at https://buildbot.python.org/all/#/builders/1124/builds/640/steps/11/logs/stdio right after I switched to wasmtime 16 from 13 (wasmtime 14.0.4 does not seem to have the issue).

Steps to Reproduce

  • git clone https://github.com/python/cpython.git
  • cd cpython
  • python3 Tools/wasm/wasi.py build
  • Edit cross-build/wasm32-wasi/python.sh and remove --wasi threads=y
  • ./cross-build/wasm32-wasi/python.sh -m test test___all__

You can also try test___all__ test_bufio test_compileall test_dbm_dumb test_importlib test_io test_marshal test_pathlib test_posix test_runpy test_tarfile test_zipfile .

Expected Results

The test should pass w/o issue (it was working fine under wasmtime 13 w/ e.g. wasmtime run --env PYTHONPATH=/builddir/wasi/build/lib.wasi-wasm32-3.12:/Lib --mapdir /::/home/brettcannon/Repositories/python/cpython-3.12 -- python.wasm.

Actual Results

W/o --wasi threads=y, it causes a ESPIPE.

Versions and Environment

Wasmtime version or commit: 16.0.0 and 15.0.1

Operating system: Linux

Architecture: x64

Extra Info

WASI-SDK 20.

@brettcannon brettcannon added the bug Incorrect behavior in the current implementation that needs fixing label Jan 24, 2024
@alexcrichton
Copy link
Member

alexcrichton commented Jan 24, 2024

Adding --wasi threads=y fixes the issue which seems unrelated to file seeking.

Under the hood this toggles the -Spreview2=n flag (sorry it's confusingly named). Basically we've got two implementations of preview1 in-tree. One is based on preview2 primitives and is the current default (-Spreview2, the wasmtime_wasi::preview2 module). The other is the historical version based on preview1 (-Spreview2=n, the wasi-common crate).

What you're probably seeing here is an unintended difference in the implementation between the two and a bug we need to fix (thanks for filing!).

Would you be able to capture some extra logging to help debug a bit? Could you set WASMTIME_LOG=wasmtime_wasi for one of the failing tests, and then WASMTIME_LOG=wasi_common,wasmtime_wasi for the passing version when -Spreview2=n is passed (assuming -Spreview2=n makes it pass in the same manner as --wasi threads)

@brettcannon
Copy link
Contributor Author

Under the hood this toggles the -Spreview2=n flag (sorry it's confusingly named)

😆 definitely wouldn't have guessed that's how you force falling back to preview1.

(thanks for filing!).

Of course! You all have always been prompt and friendly when I run into issues trying to get Python working w/ WASI.

Would you be able to capture some extra logging to help debug a bit? Could you set WASMTIME_LOG=wasmtime_wasi for one of the failing tests

failure.txt

then WASMTIME_LOG=wasi_common,wasmtime_wasi for the passing version when -Spreview2=n is passed

success.txt

assuming -Spreview2=n makes it pass in the same manner as --wasi threads

It did!

I also downloaded v15.0.1 and 14.0.4 and the issue is in 15 but not 14 (I'll update the opening comment accordingly). I have downgraded the buildbot I'm using to test CPython to 14.0.4 and the results will hopefully show a pass in https://buildbot.python.org/all/#/builders/1046/builds/4082 .

@brettcannon brettcannon changed the title ESPIPE errors unless --wasi threads=y is specified ESPIPE errors unless --wasi threads=y is specified in wasmtime 15 and 16 Jan 24, 2024
@alexcrichton
Copy link
Member

oh for success.txt that older version of Wasmtime might use RUST_LOG instead of WASMTIME_LOG (the log you uploaded is an empty file). Either that or the directive isn't right, which I could also be misremembering!

Also the 14/15 split makes sense, that's when the switch in the defaults happened (release notes)

@brettcannon
Copy link
Contributor Author

oh for success.txt that older version of Wasmtime might use RUST_LOG instead of WASMTIME_LOG (the log you uploaded is an empty file

This was all under wasmtime 16; I validated separately about 14.0.4.

Either that or the directive isn't right, which I could also be misremembering!

It was a typo in the directive on my end. 😅 Sorry about that!

success.txt

@alexcrichton
Copy link
Member

Ah ok, makes sense. I'm having difficulty reproducing in that I can't get a build to work it seems. My system's stock python seems too old since it prints:

📁 /home/alex/code/cpython/cross-build/build
Traceback (most recent call last):
  File "/home/alex/code/cpython/Tools/wasm/wasi.py", line 345, in <module>
    main()
  File "/home/alex/code/cpython/Tools/wasm/wasi.py", line 341, in main
    dispatch[context.subcommand](context)
  File "/home/alex/code/cpython/Tools/wasm/wasi.py", line 265, in build_all
    step(context)
  File "/home/alex/code/cpython/Tools/wasm/wasi.py", line 78, in wrapper
    with contextlib.chdir(working_dir):
AttributeError: module 'contextlib' has no attribute 'chdir'

Using 3.11 (as opposed to my 3.10) got further however (after installing wasi-sdk-21.0):

...
/home/alex/code/wasi-sdk-21.0/bin/wasm-ld   -z stack-size=524288 -Wl,--stack-first -Wl,--initial-memory=10485760   Modules/_testimportmultiple.o   -o Modules/_testimportmultiple.cpython-313-wasm32-wasi.so
wasm-ld: error: unknown argument: -Wl,--stack-first
wasm-ld: error: unknown argument: -Wl,--initial-memory=10485760
make: *** [Makefile:3140: Modules/_testimportmultiple.cpython-313-wasm32-wasi.so] Error 1
...

I edited the configure script to remove the -Wl, prefix since the linker is invoked directly here instead of through clang but that then led to:

...
/home/alex/code/wasi-sdk-21.0/bin/wasm-ld   -z stack-size=524288 --stack-first --initial-memory=10485760   Modules/_testimportmultiple.o   -o Modules/_testimportmultiple.cpython-313-wasm32-wasi.so
wasm-ld: error: entry symbol not defined (pass --no-entry to suppress): _start
make: *** [Makefile:3140: Modules/_testimportmultiple.cpython-313-wasm32-wasi.so] Error 1
...

Do you perhaps have a CFLAGS or CC or similar configured? (or do I need a fork of wasi-sdk? or is 21.0 too new?)


I do notice though that in the failure logs I'm seeing a lot of:

...
... wasmtime_wasi::preview2::host::filesystem: unknown raw os error: Too many open files (os error 24)
...

whereas the success logs do not have that. That makes me think we're leaking something somewhere by accident. I don't think we're leaking through WASI fds since it stays around 4 for newly opened fds but my guess is that we have some Resource<T> which we're forgetting to drop or something like that which means the internal indices are filling up along with the host OS.

Could you run strace to try to see what syscall leads to the large fd numbers? Might be able to go backwards from there to what's opening and forgetting to close.

@alexcrichton
Copy link
Member

Aha it appears that this panics after 1k iterations:

fn main() {
    for i in 0.. {
        std::fs::read("foo.rs").unwrap();
        if i % 100 == 0 {
            println!("{i}");
        }
    }
}

so I think we found at least one culprit

alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Jan 24, 2024
Previously temporary streams created as part of the preview1 adapter in
the wasmtime-wasi crate were left open which meant that they continued
to occupy space in the resource table and the underlying file
accidentally wasn't ever actually closed.

cc bytecodealliance#7813
alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Jan 24, 2024
Previously temporary streams created as part of the preview1 adapter in
the wasmtime-wasi crate were left open which meant that they continued
to occupy space in the resource table and the underlying file
accidentally wasn't ever actually closed.

cc bytecodealliance#7813
@alexcrichton
Copy link
Member

Could you try #7816 and see if it resolves the issue? If not I can keep digging and see if there's still places we forgot to close things.

alexcrichton added a commit to alexcrichton/wasmtime that referenced this issue Jan 25, 2024
Previously temporary streams created as part of the preview1 adapter in
the wasmtime-wasi crate were left open which meant that they continued
to occupy space in the resource table and the underlying file
accidentally wasn't ever actually closed.

cc bytecodealliance#7813
github-merge-queue bot pushed a commit that referenced this issue Jan 25, 2024
Previously temporary streams created as part of the preview1 adapter in
the wasmtime-wasi crate were left open which meant that they continued
to occupy space in the resource table and the underlying file
accidentally wasn't ever actually closed.

cc #7813
alexcrichton added a commit that referenced this issue Jan 25, 2024
Previously temporary streams created as part of the preview1 adapter in
the wasmtime-wasi crate were left open which meant that they continued
to occupy space in the resource table and the underlying file
accidentally wasn't ever actually closed.

cc #7813
@brettcannon
Copy link
Contributor Author

I should be able to test the PR today or tomorrow (although based on how today is going, most likely tomorrow).

@alexcrichton
Copy link
Member

Wasmtime 17 is also released with the above fix which may make getting a binary easier

@brettcannon
Copy link
Contributor Author

Wasmtime 17 fixed it!

Now to file the other bugs I found. 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing
Projects
None yet
Development

No branches or pull requests

2 participants