Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors when parsing DWARF from Wasm objects #265

Closed
cfallin opened this issue Feb 11, 2023 · 20 comments · Fixed by #267
Closed

Errors when parsing DWARF from Wasm objects #265

cfallin opened this issue Feb 11, 2023 · 20 comments · Fixed by #267

Comments

@cfallin
Copy link

cfallin commented Feb 11, 2023

I'm attempting to use addr2line to get line-number info from the DWARF embedded in a wasm32 module (produced by wasi-sdk 19's clang from a large C++ program). I've incorporated a fairly straightforward integration into my Wasm-parsing frontend that gathers all appropriate .debug_* sections, puts them in a gimli::Dwarf, and tries to build an addr2line::Context from that. No parsing or data-munging at all is happening on my side.

This seems to generate valid line-number tuples for a C hello-world program, but a large C++ program (the SpiderMonkey JS engine compiled to Wasm) built with the same clang results in

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UnexpectedEof(ReaderOffsetId(4310779619))', src/ir/debug.rs:59:57

That offset is definitely out-of-range (the input file is 277MiB).

At the top level, first, is addr2line known or expected to work with the DWARF that is embedded in wasm32? If so, I'd be happy to try to get the above module to you for reduction / debugging!

@philipc
Copy link
Contributor

philipc commented Feb 11, 2023

To be clear, the unwrap causing the panic is not addr2line. It's expected that addr2line should work with any DWARF, so I'm interested in a reproduction case.

@cfallin cfallin changed the title Panics when parsing DWARF from Wasm objects Errors when parsing DWARF from Wasm objects Feb 11, 2023
@cfallin
Copy link
Author

cfallin commented Feb 11, 2023

Right, sorry, my code is panic'ing on an Err returned from addr2line. I've updated the title, sorry about that!

To repro, you can build the addr2line branch of my waffle project (here), then with fail.wasm:

$ target/debug/waffle-util print-ir fail.wasm
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UnexpectedEof(ReaderOffsetId(4373366499))', src/ir/debug.rs:59:57

and that should hit the panic on the unexpected-EOF error from inside addr2line/gimli.

(It's 60MiB gzipped; any others, please be nice to my S3 bandwidth and download only if needed!)

Thanks for looking into this!

@philipc
Copy link
Contributor

philipc commented Feb 12, 2023

How is fail.wasm created? A preliminary look indicates it is missing a .debug_addr section.

@philipc
Copy link
Contributor

philipc commented Feb 12, 2023

Okay fail.wasm looks fine (gimli's dwarfdump reads it okay, and so does addr2line with object/wasm feature).

The problem is that waffle isn't loading the .debug_loc and .debug_ranges sections. Fixing that is a bit awkward to do because gimli::LocationLists and gimli::RangeLists only have private members, so there's no way to load only one section at a time.

@cfallin
Copy link
Author

cfallin commented Feb 12, 2023

The problem is that waffle isn't loading the .debug_loc and .debug_ranges sections. Fixing that is a bit awkward to do because gimli::LocationLists and gimli::RangeLists only have private members, so there's no way to load only one section at a time.

Ah, fantastic, thank you for finding that and sorry for burning your time with such a basic error!

Followup question, but I'm happy to close this issue and/or open a new one if you'd prefer: it seems the returned line-number info now has a bunch of ranges, all starting at PC 0; does each range correspond to one function, in order, or...? (I naively would have expected either offset-in-file or offset-in-code-section) From the same fail.wasm I now get (the lines are {base:x} - {len:x} and the u32::MAX's are another weird bit here):

tuples:
 ffffffff - f: loc0
 e - e: loc1
 1c - 1: loc2
 1d - d: loc3
 2a - 10: loc1
 3a - 2: loc4
 3c - 4: loc5
 40 - e: loc4
 4e - e: loc6
 5c - 12: loc4
 ffffffff - f: loc0
 e - e: loc1
 1c - 1: loc2
 1d - d: loc3
 2a - 10: loc1

@mstange
Copy link
Contributor

mstange commented Feb 12, 2023

We've seen trouble in the past with line programs starting at zero for wasm, in rust-lang/rust#79410

@cfallin
Copy link
Author

cfallin commented Feb 13, 2023

Interesting, thanks! FWIW, this Wasm module was produced by clang 15.0.7, as packaged in wasi-sdk 19, building a large C++ program; but maybe it's an underlying LLVM issue of some sort?

@philipc
Copy link
Contributor

philipc commented Feb 13, 2023

These ffffffff entries look like tombstones. llvm-dwarfdump doesn't display them but gimli's dwarfdump does. We probably need a gimli patch to skip sequences that start with them.

@philipc
Copy link
Contributor

philipc commented Feb 13, 2023

This should be fixed by gimli-rs/gimli#642

@cfallin
Copy link
Author

cfallin commented Feb 13, 2023

With an override to use that PR's branch for gimli, I can confirm that I'm getting sensible line-number tuples out of the Wasm module above -- thanks a bunch!

@cfallin
Copy link
Author

cfallin commented Feb 14, 2023

Well, I maybe spoke prematurely: I'm getting tuples that look superficially reasonable, but the offsets seem nonsensical in context -- sourcelocs that are from unrelated files and clearly can't be right. I suspect I'm still not understanding something about how Wasm and DWARF interact, or am misinterpreting an offset somehow. Anyone else with experience in this please feel free to point me toward resources that could help!

@philipc
Copy link
Contributor

philipc commented Feb 14, 2023

Can you give an example? I'm happy to look into it.

@cfallin
Copy link
Author

cfallin commented Feb 14, 2023

Sure -- latest main on cfallin/waffle above will now print sourcelocs in the IR dump with waffle-util print-ir. Run that on fail.wasm and, at least if one has been staring at this IR for the past week (sigh), things won't seem right. E.g., func879 is InterpretInner; I would expect at least some srclocs to be from Interpreter.cpp (file1052), where that function is defined, but none are. Somehow things seem offset?

@philipc
Copy link
Contributor

philipc commented Feb 14, 2023

I'm not seeing the locs for func879 in the waffle-util print-ir output:

  func879 "InterpretInner(JSContext*, js::RunState&, InterpretContext&, unsigned char*, bool)": sig6 = # i32, i32, i32, i32, i32 -> i32
  # raw bytes (length 52153)

Looking at the dwarf with llvm-dwarfdump, the function in .debug_info is:

0x005ee960:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0005c7c2)
                DW_AT_high_pc   (0x0006937b)
                DW_AT_frame_base        (DW_OP_WASM_location 0x0 0x5, DW_OP_stack_value)
                DW_AT_GNU_all_call_sites        (true)
                DW_AT_linkage_name      ("_ZL14InterpretInnerP9JSContextRN2js8RunStateER16InterpretContextPhb")
                DW_AT_name      ("InterpretInner")
                DW_AT_decl_file ("/Users/cfallin/work/gecko-dev/js/src/vm/Interpreter.cpp")
                DW_AT_decl_line (2059)
                DW_AT_type      (0x0051675b "bool")

We can find that address in .debug_line:

0x000000000005c7c2   2062      0      6   0             0  is_stmt
0x000000000005c7ec    530     36    185   0             0  is_stmt prologue_end
0x000000000005c7ef      0     36    185   0             0 
0x000000000005c7f3   2213      8      6   0             0  is_stmt
0x000000000005c7f6     89     14    343   0             0  is_stmt
0x000000000005c800      0     14    343   0             0 
0x000000000005c80f    739     41     35   0             0  is_stmt
0x000000000005c815   2213     19      6   0             0  is_stmt
0x000000000005c81d   2213      7      6   0             0 
0x000000000005c81f      0      7      6   0             0 
0x000000000005c82a    739     41     35   0             0  is_stmt
0x000000000005c83b      0     41     35   0             0 
0x000000000005c83d    111     14     35   0             0  is_stmt
0x000000000005c845      0     14     35   0             0 
0x000000000005c847    132     18     35   0             0  is_stmt
0x000000000005c84a    569      7    392   0             0  is_stmt
0x000000000005c852      0      7    392   0             0 

where file 6 is:

file_names[  6]:
           name: "js/src/vm/Interpreter.cpp"
      dir_index: 2
       mod_time: 0x00000000
         length: 0x00000000

If I run addr2line on it:

$ ../addr2line/target/debug/examples/addr2line -e fail.wasm 
5c7c2
/Users/cfallin/work/gecko-dev/js/src/vm/Interpreter.cpp:2062
5c7f3
/Users/cfallin/work/gecko-dev/js/src/vm/Interpreter.cpp:2213

So it looks like the DWARF contains the correct info and addr2line can find it.

@cfallin
Copy link
Author

cfallin commented Feb 15, 2023

Ah, sorry -- I had been passing optimization options to waffle-util that expanded function bodies (they are lazily parsed); latest main now unconditionally expands when running print-ir. I'm currently seeing no IR indicating it's from Interpreter.cpp despite plenty of code that should be.

I suspect I'm misinterpreting offsets somehow; I'll continue to experiment with this...

@philipc
Copy link
Contributor

philipc commented Feb 15, 2023

I patched it to display the DWARF address for each loc too, and they all have low values. So I would guess you're looking up the locations using some sort of relative offset, instead of the address.

@cfallin
Copy link
Author

cfallin commented Feb 15, 2023

I've resolved this successfully now -- thank you so much for your patience! (For anyone stumbling on this now or later: (i) wasmparser's offsets are always absolute wrt the start-of-file, and (ii) separately, there can be more than one code-section in a Wasm module, so trying to save the offset of "the" code section to compensate for supposedly relative offsets will certainly not work.)

Is it likely there will be new releases of gimli and addr2line soon to pick up the PR from above? I suppose this issue can be closed once a patched gimli is used on main here...

@philipc
Copy link
Contributor

philipc commented Feb 15, 2023

Did you push those changes? It still doesn't look right to me.

I'll do a patch release of gimli, so you'll be able to pick up that change without needing an addr2line release (I want to do some more changes before another release of it).

@cfallin
Copy link
Author

cfallin commented Feb 15, 2023

I have, I think; waffle commit c999f995b. It passes my smell-test at least, to the point that I see sourcelocs in interpreter-related sources in InterpreterInner(). I'll test further with some smaller inputs as well. I'm curious, what issues or discrepancies are you seeing?

@philipc
Copy link
Contributor

philipc commented Feb 15, 2023

See cfallin/waffle#1

abrown added a commit to abrown/wasmtime that referenced this issue Oct 11, 2023
When troubleshooting deadlocks in WebAssembly modules, it is important
to understand which `wait` instructions are still pending a `notify`.
It would be nice to have some kind of `--warn-deadlock-after=1s` flag
available that would poll the parking lot for `wait`s hanging past the
time limit, but I realized the real value would be to tie the `wait`
instruction (through CLIF) to the original source code, if debug
information were available. This did not seem to be entirely feasible,
since CLIF loses the original Wasm source context (is this true?) and I
was not confident that we would be able to use `addr2line` to map from
Wasm instructions to source (e.g., see @cfallin's
[issue](gimli-rs/addr2line#265)).

Instead, this change simply logs each valid `wait` and `notify`
execution, leaving it to the user to figure out which one is hanging
(should not be too difficult) and how to map this back to their source
code (more difficult).
github-merge-queue bot pushed a commit to bytecodealliance/wasmtime that referenced this issue Oct 11, 2023
When troubleshooting deadlocks in WebAssembly modules, it is important
to understand which `wait` instructions are still pending a `notify`.
It would be nice to have some kind of `--warn-deadlock-after=1s` flag
available that would poll the parking lot for `wait`s hanging past the
time limit, but I realized the real value would be to tie the `wait`
instruction (through CLIF) to the original source code, if debug
information were available. This did not seem to be entirely feasible,
since CLIF loses the original Wasm source context (is this true?) and I
was not confident that we would be able to use `addr2line` to map from
Wasm instructions to source (e.g., see @cfallin's
[issue](gimli-rs/addr2line#265)).

Instead, this change simply logs each valid `wait` and `notify`
execution, leaving it to the user to figure out which one is hanging
(should not be too difficult) and how to map this back to their source
code (more difficult).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants