Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STATUS_BAD_STACK on Windows on trap #1967

Closed
alexcrichton opened this issue Jul 2, 2020 · 5 comments · Fixed by #1983
Closed

STATUS_BAD_STACK on Windows on trap #1967

alexcrichton opened this issue Jul 2, 2020 · 5 comments · Fixed by #1983
Assignees
Labels
bug Incorrect behavior in the current implementation that needs fixing wasmtime Issues about wasmtime that don't fall into another label

Comments

@alexcrichton
Copy link
Member

Originally reported at bytecodealliance/wasmtime-py#36 I'm able to reproduce this pretty easily with just this wasm file. Downloading that locally I can reproduce on Windows with:

$ gunzip yosys.wasm.gz
$ cargo run --release -- run yosys.wasm -- -V
...
    Finished release [optimized] target(s) in 3.84s
     Running `target\release\wasmtime.exe run yosys.wasm -- -V`
error: process didn't exit successfully: `target\release\wasmtime.exe run yosys.wasm -- -V` (exit code: 0xc0000028)

Something funny is going on here though because --opt-level 0 does not reproduce the issue (--opt-level 1 does though).

I'm trying to run this right now through wasm-reduce from Binaryen but so far it's only shaved off 3MB, which isn't a whole lot given the size of this binary...

@peterhuene I'm having trouble just opening things up in a debugger, would you be able to help look into this?

@alexcrichton alexcrichton added the wasmtime Issues about wasmtime that don't fall into another label label Jul 2, 2020
@peterhuene
Copy link
Member

I'll take a look.

@peterhuene peterhuene self-assigned this Jul 2, 2020
@peterhuene
Copy link
Member

I've been digging into this most of today with not much to report yet.

The unwind information seems correct, as well as the stack aligning done in the prologues.

I haven't been able to form a minimal repro yet either.

The unwind is coming from the trap returned from WASI's proc_exit.

I'll keep at it.

@peterhuene
Copy link
Member

peterhuene commented Jul 3, 2020

It appears the system unwinder can't get past the frame for 12746 for some yet-to-be-determined reason.

The Wasm frames that show up in the debugger prior to the STATUS_BAD_STACK:

  • 12469 (calls wasi_snapshot_preview1.proc_exit)
  • 12504
  • 12746 (JIT'd function is 96059 bytes!)

The unwind stops there.

Prologue JIT for 12746:

0000023CCCEAB200 48 8B 01             mov         rax,qword ptr [rcx]  
0000023CCCEAB203 48 8B 00             mov         rax,qword ptr [rax]  
0000023CCCEAB206 48 81 C0 68 06 00 00 add         rax,668h  
0000023CCCEAB20D 48 39 E0             cmp         rax,rsp  
0000023CCCEAB210 72 02                jb          0000023CCCEAB214  
0000023CCCEAB212 0F 0B                ud2  
0000023CCCEAB214 40 55                push        rbp  
0000023CCCEAB216 48 89 E5             mov         rbp,rsp  
0000023CCCEAB219 40 53                push        rbx  
0000023CCCEAB21B 40 56                push        rsi  
0000023CCCEAB21D 40 57                push        rdi  
0000023CCCEAB21F 41 54                push        r12  
0000023CCCEAB221 41 55                push        r13  
0000023CCCEAB223 41 56                push        r14  
0000023CCCEAB225 41 57                push        r15  
0000023CCCEAB227 48 81 EC 68 06 00 00 sub         rsp,668h  
0000023CCCEAB22E 44 0F 11 BC 24 50 06 00 00 movups      xmmword ptr [rsp+650h],xmm15

Function 12746 has the following encoded unwind information:

01 37 0d 55 37 f8 00 00
2e 01 cd 00 27 f0 25 e0
23 d0 21 c0 1f 70 1d 60
1b 30 19 03 16 50 00 00

Decoded:

Version: 1
Prologue size: 55 bytes
Unwind codes: 13
Frame pointer register: RBP
Frame pointer offset: 80
Codes:
  - Prologue offset: 55 
    Op: SaveXmm128 (note: operation is two unwind codes in length)
    Reg: XMM15
    Stack offset: 0
  - Prologue offset:  46
    Op: LargeStackAlloc (note: operation is two unwind codes in length)
    Size: 668 bytes
  - Prologue offset: 39
    Op: PushNonvolatileRegister
    Reg: R15
  - Prologue offset: 37
    Op: PushNonvolatileRegister
    Reg: R14
  - Prologue offset: 35
    Op: PushNonvolatileRegister
    Reg: R13
  - Prologue offset: 33
    Op: PushNonvolatileRegister
    Reg: R12
  - Prologue offset: 31
    Op: PushNonvolatileRegister
    Reg: RDI
  - Prologue offset: 29
    Op: PushNonvolatileRegister
    Reg: RSI
  - Prologue offset: 27
    Op: PushNonvolatileRegister
    Reg: RBX
  - Prologue offset: 25
    Op: SetFramePointer
  - Prologue offset: 22
    Op: PushNonvolatileRegister
    Reg: RBP
Alignment padding: 2 bytes

This is correct for the generated prologue. I've verified that the correct offset to the unwind information was registered with Windows for this function.

The stack allocation for the frame is correctly aligned: 1712 bytes, starting with call-pushed return address.

Note: there appears to be a bug in the stack check code as it's not taking into account the space needed for GPR saves (FPR saves are part of the stack pointer adjustment). That bug is unrelated and would only cause a missed stack overflow trap in the unlikely case that the stack allocation size would fit, but the GPR saves push it over the limit).

Still digging.

@peterhuene
Copy link
Member

peterhuene commented Jul 3, 2020

I've determined that the unwind information does not accurately describe the prologue after all. As soon as I instruction step over establishing the frame pointer (e.g. mov rbp,rsp), stack walking fails.

The problem boils down to the way the unwind information is describing the frame pointer for frames with FPR saves. Cranelift establishes a frame pointer like one would in the x86 days: it points at the push of the previous frame pointer and thus marks the start (highest stack address) of the local frame.

However, traditional frame pointers are rarely needed on Windows x64. A "frame pointer" is really only established to mark a part of the frame as "statically-sized". Positive offsets from this "frame pointer" would actually be in this static region of the local frame and not immediately the return address and caller's frame. This is really important for functions that adjust the stack pointer in a way that can't be described in static unwind information (e.g. a function that calls alloca).

The bug is a result of my confusion regarding the Windows unwind information nomenclature of "frame pointer". What the unwind information really intends to describe is the "base pointer to the static part of the local frame". As such, Windows really is expecting an actual offset from RSP when establishing the frame pointer (e.g. lea rbp, [rsp-0x50] in this case) which we're obviously not doing.

The fix should be to stop describing a "frame pointer" in the unwind information. The frame pointer information in the unwind information will be changed to 0 and the SetFramePointer unwind code will not be emitted. The offsets for the FPR saves will be changed to positive offsets from a post-adjusted RSP.

@peterhuene
Copy link
Member

I have a fix available, but I need to add a test case where we have a trap with frames that have FPR saves so that we don't regress this.

@peterhuene peterhuene added the bug Incorrect behavior in the current implementation that needs fixing label Jul 6, 2020
peterhuene added a commit to peterhuene/wasmtime that referenced this issue Jul 6, 2020
This commit removes the "set frame pointer" unwind code and frame
pointer information from Windows x64 unwind information.

In Windows x64 unwind information, a "frame pointer" is actually the
*base address* of the static part of the local frame and would be at some
negative offset to RSP upon establishing the frame pointer.

Currently Cranelift uses a "traditional" notion of a frame pointer, one
that is the highest address in the local frame (i.e. pointing at the
previous frame pointer on the stack).

Windows x64 unwind doesn't describe such frame pointers and only needs
one described if the frame contains a dynamic stack allocation.

Fixes bytecodealliance#1967.
peterhuene added a commit to peterhuene/wasmtime that referenced this issue Jul 6, 2020
This commit removes the "set frame pointer" unwind code and frame
pointer information from Windows x64 unwind information.

In Windows x64 unwind information, a "frame pointer" is actually the
*base address* of the static part of the local frame and would be at some
negative offset to RSP upon establishing the frame pointer.

Currently Cranelift uses a "traditional" notion of a frame pointer, one
that is the highest address in the local frame (i.e. pointing at the
previous frame pointer on the stack).

Windows x64 unwind doesn't describe such frame pointers and only needs
one described if the frame contains a dynamic stack allocation.

Fixes bytecodealliance#1967.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior in the current implementation that needs fixing wasmtime Issues about wasmtime that don't fall into another label
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants