Reimplement Wasmtime's DWARF transform and debugging support #5537

fitzgen · 2023-01-05T23:29:08Z

We should reimplement the DWARF_wasm to DWARF_native
transformation pass that implements the GDB/LLDB debugging support in Wasmtime
by separating DWARF translation from DWARF traversal. We could do this by
defining a generic DWARF transformation pass that takes a generic visitor
implementation, walks the read-only input DWARF, calls the corresponding visitor
method for each DIE/attribute/value/line-table entry/etc... in the DWARF to
produce a new DWARF entity, and writes that new DWARF entity into the output
DWARF that is being built up. We would then implement a DWARF_wasm to
DWARF_native visitor.

I think this approach would be much easier to implement, maintain, and ensure
correctness of than our current open-coded transformation.

Assuming this interface works out well and we prove it out, it could be worth
upstreaming the generic transformation pass and visitor trait into gimli
itself (cc @philipc).

Potential hiccups could be that, for our purposes here, the visitor might not be
exactly a simple map over the input DWARF (or "functor-ish") in that one
DWARF_wasm entity might become multiple DWARF_native
entities (making it more "monad-ish", apologies if I'm just muddying the waters
with this nomenclature). One example is that what might be a location list entry
in Wasm could become multiple location list entries in native code due to
register allocation, live range splitting, and spilling.

Testing

Our testing story for debugging support is very poor at the moment and the
debugging support is correspondingly buggy. As part of this reimplementation, we
should take the opportunity to improve our approach to testing.

I think we can do something like this, in a loop:

generate a random C program with C-Smith
compile the program twice:
1. to wasm32-wasi
2. to the host target
attach gdb and/or lldb to
1. wasmtime running the wasm version
2. the native binary
single step N times (or until main exits) and at each point assert that:
- the native and wasm programs are paused at the same location
- the same variables are in scope
- the variables in scope have the same values (at least for non-pointer
  scalars, we can tune the C-Smith flags we use to generate test programs as
  necessary)

I think this should give us fairly high confidence in the correctness of the new
DWARF transform.

Unfortunately, this won't fit into OSS-Fuzz's paradigm super well. It involves a
lot of wrangling external processes. I think we can do N iterations under normal
cargo test with a fixed corpus of seeds, so that running cargo test twice
runs the same set of test programs each time. And then in CI perhaps we can have
a job that runs more iterations, or a nightly CI job that does a bunch of
iterations, or something like that. To some degree, we can kick this can down
the road and figure things out once we have the test infrastructure set up (even
just running it manually whenever we touch this code would be a huge improvement
over our current debugging testing strategy).

cc @cfallin as this is something we have talked about together in the past.

The text was updated successfully, but these errors were encountered:

cfallin · 2023-01-17T19:39:32Z

Admittedly a slightly wild idea, if we wanted to try to fuzz: if the Wasmtime function-call API had a mode in which we could "single-step call" (maybe an async fn that yields at each new srcloc), and a way to introspect via DWARF the Wasm-level state at each step (so, a built-in programmatic gdb), and if we similarly instrumented an interpreter (wasmi?) to let us single-step and introspect state, then we could fuzz in-process as rapidly as we do differential execution tests today.

I wonder how we might be able to leverage existing Rust debugger libraries; I see Headcrab, and e.g. its DWARF support may be usable for this on the native-code side.

This is probably at least three months of developer time but we'd have best-in-class assurances of debuggability and the correctness of the provided program state if we had something like this, and a "debugger API" on Wasmtime could be the basis for a lot of other useful tooling too.

I'm not entirely sure what causes this but Wasmtime shouldn't panic with invalid DWARF. In general (bytecodealliance#5537) Wasmtime's support for DWARF needs to be rewritten, but in the meantime let's play whack-a-mole with panics and try to paper over issues. Closes bytecodealliance#8884 Closes bytecodealliance#8904

I'm not entirely sure what causes this but Wasmtime shouldn't panic with invalid DWARF. In general (#5537) Wasmtime's support for DWARF needs to be rewritten, but in the meantime let's play whack-a-mole with panics and try to paper over issues. Closes #8884 Closes #8904

I'm not entirely sure what causes this but Wasmtime shouldn't panic with invalid DWARF. In general (bytecodealliance#5537) Wasmtime's support for DWARF needs to be rewritten, but in the meantime let's play whack-a-mole with panics and try to paper over issues. Closes bytecodealliance#8884 Closes bytecodealliance#8904

* Add custom-pages-sizes to CLI flags (#8917) * Update dependency on `wit-bindgen` (#8911) * Update dependency on `wit-bindgen` Updating to the latest released version. * Add vets * Fix build of test-programs * Fix panic with invalid DWARF file indices (#8913) I'm not entirely sure what causes this but Wasmtime shouldn't panic with invalid DWARF. In general (#5537) Wasmtime's support for DWARF needs to be rewritten, but in the meantime let's play whack-a-mole with panics and try to paper over issues. Closes #8884 Closes #8904 * Wasmtime: Pop GC LIFO roots even when there is no GC heap (#8899) * Wasmtime: Pop GC LIFO roots even when there is no GC heap We can create and root `i31ref`s without ever allocating a GC heap for the store, so we can't guard popping LIFO roots on the presence of a GC heap or else we risk unbounded growth of the LIFO root set. * Fix build with the gc feature disabled --------- Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

philipc · 2024-07-21T05:50:12Z

We could do this by defining a generic DWARF transformation pass that takes a generic visitor implementation, walks the read-only input DWARF, calls the corresponding visitor method for each DIE/attribute/value/line-table entry/etc... in the DWARF to produce a new DWARF entity, and writes that new DWARF entity into the output DWARF that is being built up. We would then implement a DWARFwasm to DWARFnative visitor.

I may not be understanding exactly what you mean here, but I think that a simple visitor is not enough. If code instructions can be reordered, then the line table instructions must also be reordered. Without knowing much about wasmtime and without any prior knowledge of your proposal here, this is how I would have expected this work:

read the WASM
read the DWARF line table, and convert it into a parsed form (e.g. at a similar level to what LLVM IR uses)
create links between the WASM and the parsed line table
compile the WASM to native while preserving these links
generate a completely new line table using the native code and its links (no need to use the original line table here)

How would your proposal handle this?

I think .debug_info is a lot simpler to handle, and visitor could work for it. One concern would be about location lists for variables. These may need to be handled in a similar way to the line table.

I'm interested in doing gimli work to support whatever is needed here.

alexcrichton · 2024-07-22T15:21:23Z

You're right that instructions can be reordered, and currently the way things work is that Cranelift instructions are tagged with where they came from in wasm and then the original DWARF is used to determine where that wasm offset corresponds to. How exactly this all gets transformed in DWARF I'm not 100% sure as I'm not that familiar with the code.

One thing I can possibly add though is that for variable locations we do have to translate custom DWARF things like global-relative or local-relative operands into native DWARF instead. That means that for various expressions they're rewritten and/or appended to or things like that. I think there's also some bits and bobs here and there about translating DWARF for the 32-bit wasm architecture to the host 64-bit architecture, but I think those are more minor.

fitzgen · 2024-07-22T19:43:34Z

@philipc regarding line tables, I still think that, from a user perspective, the desired interface is handing an impl Fn(Pc) -> Pc function (or visitor or whatever) to a generic line-tables rewriting function. Perhaps, as you point out, that cannot be implemented in a single pass, and might require materializing some intermediate representation that then needs to be re-sorted or whatever before emitting the actual encoded tables. I think that is fine.

Below is a sketch of what this might look like, but I'm sure I'm overlooking or missing something -- it's been a while since I was deep in DWARF stuff!

pub trait LineProgramRewriter {
    type Error: From<gimli::read::Result>;

    /// Rewrite a PC address in the line program.
    ///
    /// By default, it is unmodified.
    fn address(&mut self, address: u64) -> Result<u64, Self::Error> {
        Ok(address)
    }

    /// Rewrite a directory in the line program.
    ///
    /// By default, it is unmodified.
    fn directory(
        &mut self,
        dir: gimli::read::LineString,
    ) -> Result<gimli::read::LineString, Self::Error> {
        Ok(dir)
    }

    /// Rewrite a source line number in the line program.
    ///
    /// By default, it is unmodified.
    fn line(&mut self, line: u64) -> Result<u64, Self::Error> {
        Ok(line)
    }

    // Etc...
}

pub fn rewrite_line_program<Reader, Offset, Rewriter>(
    input: gimli::read::IncompleteLineProgram<Reader, Offset>,
    rewriter: &mut Rewriter,
) -> Result<gimli::write::LineProgram, Rewriter::Error>
where
    Rewriter: LineProgramRewriter,
{
    let mut rewritten_rows = vec![];

    let (program, sequences) = input.sequences()?;
    for seq in sequences {
        let mut rows = program.resume_from(seq);
        while let Some(row) = rows.next_row()? {
            // Apply each method of the `rewriter` to each part of the row.
            let new_row = rewrite_one_row(row, rewriter)?;
            rewritten_rows.push(new_row);
        }
    }

    // Sort the rows such that they are in an order that we can correctly
    // encode.
    rewritten_rows.sort();

    // Encode the rewritten rows into a new line program.
    let mut program = gimli::write::LineProgram::new(...);
    for row in rewritten_rows {
        // Encode each row, decide whether to start a new sequence or continue
        // the existing sequence, etc...
    }

    Ok(program)
}

I think the biggest difference from what you laid out in "this is how I would have expected this work" is that I want to factor out the wasmtime-specific bits around the DWARF_Wasm-to-DWARF_native mapping from the generic, possibly-generally-useful bits of "I have some kind of new mapping, and I want to apply it to this input DWARF to get a new rewritten DWARF".

The other thing is that, again as you mentioned, line tables are only one piece, and we would want to do this for every single section.

Is that all making sense? Does it seem reasonable?

philipc · 2024-07-23T00:31:05Z

I can understand your motivation, and it makes sense to me at a high level. The rest of this reply is just getting into the details of how it would work for line programs.

I don't think that applying a transformation to each row in the original line program is the right way to do it in general. In addition to the reordering, it would easily be possible that you need less rows, or maybe more rows (I'm not as sure about that one). So I still think that we should be generating a new line program roughly how I outlined.

In order to make the rewriting generic, we can still factor out parts of that into a trait. So the wasmtime-specific inputs that it would need at a minimum are the native address range for each sequence (one per function), and a mapping from native address to Wasm address (the reverse of what is in your LineProgramRewriter::address). The tags on the Cranelift instructions can trivially provide that.

This should probably be rewriting the range lists for inlined functions at the same time, but I haven't thought much about that.

fitzgen · 2024-07-23T16:55:34Z

In addition to the reordering, it would easily be possible that you need less rows, or maybe more rows (I'm not as sure about that one).

Yeah it actually isn't clear to me whether, in the general case, we would want to

start with the old rows and translate them into new rows using the user's mapping,
iterate over the user's mapping and reconstruct the new rows from scratch, or
some combination of the two.

I suspect there are subtleties that will impact the final debug info's fidelity involving whether the user's mapping is surjective or injective, but it isn't totally clear in my mind at this point. I guess I'm just thinking about how lossy the user's mapping is and how the direction of translation interacts with that.

I suspect that each direction is probably lossy in some way, which makes me think that we probably want to do some variant of (3) because debuggers query the DWARF in both directions: when setting breakpoints, they go from source location to set of PCs, and when intercepting an exception/fault/etc they go from PC to source location.

fitzgen · 2024-07-23T16:57:29Z

Oh also: I am more concerned about losing information than inserting duplicate rows, because I think it should be pretty easy to clean up duplicate rows during the phase where we take the IR and actually encode it as DWARF, sort of like a peephole pass.

philipc · 2024-07-27T04:53:29Z

I had a look at what LLVM BOLT does (which is a similar use case that should be supported by any generic transformation support), and it works by writing new line sequences for each function (using information from the new instructions and the original line program), rather than rewriting the original line program.

When disassembling functions, it attaches the original line row to each instruction, and then when emitting the function, it potentially emits a new line row for each instruction, which is generated using the information from the original row that is attached to that instruction.

BOLT also has the ability to copy an original line sequence for functions that it did not modify.

I had a quick look at BOLT's DIE handling, and that process is a rewrite of the original DIEs. It works by building a representation of the DIEs in memory, and mutates that.

fitzgen · 2024-07-27T16:22:14Z

That’s really informative to know, thanks for investigating that! Sounds like we should probably follow their lead.

philipc · 2024-08-15T09:42:01Z

I've been looking at the DWARF transform in Wasmtime some more. It appears that while there are some problems with the implementation details, there's nothing fundamentally wrong with the approach it is taking. In particular, at a high level the line program transformation is the same as BOLT. So I don't see the need for a complete rewrite. I think it would be better to evolve what's there already. The end result of that should still be a generic transformation pass in gimli.

As a background, my interest in this to improve the write API in gimli. Also, I have a gimli PR (gimli-rs/gimli#735) to add a ReaderAddress trait. One of the goals for that is to remove the convert_address callback in the write API. However, walrus is using that callback for its DWARF transformation, so I don't want to remove it without a solution.

Hence I'm interested in making a start on Wasmtime's .debug_line transformation. I think gimli can add transformation support for it without needing to support all of .debug_info as well. The goal of that transformation would be to support exactly what Wasmtime is doing today in a way that can be used for both Wasmtime and walrus.

Your testing ideas are interesting, but I'm not sure how much I will be able to do on that.

fitzgen added the wasmtime:debugging Issues related to debugging of JIT'ed code label Jan 5, 2023

bjorn3 mentioned this issue Feb 7, 2023

Support for Wasm Coredump #5732

Open

fitzgen mentioned this issue May 11, 2023

Read, modify, write? gimli-rs/gimli#654

Open

alexcrichton mentioned this issue May 19, 2024

Cranelift: assertion left == right failed: the memory base pointer may be incorrect due to sharing memory #8652

Closed

alexcrichton mentioned this issue Jul 8, 2024

Fix panic with invalid DWARF file indices #8913

Merged

thesuhas mentioned this issue Aug 1, 2024

Milestones before Bytecode Alliance/wasm-tools merger thesuhas/orca#126

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reimplement Wasmtime's DWARF transform and debugging support #5537

Reimplement Wasmtime's DWARF transform and debugging support #5537

fitzgen commented Jan 5, 2023

cfallin commented Jan 17, 2023

philipc commented Jul 21, 2024

alexcrichton commented Jul 22, 2024

fitzgen commented Jul 22, 2024

philipc commented Jul 23, 2024

fitzgen commented Jul 23, 2024

fitzgen commented Jul 23, 2024 •

edited

Loading

philipc commented Jul 27, 2024

fitzgen commented Jul 27, 2024

philipc commented Aug 15, 2024

Reimplement Wasmtime's DWARF transform and debugging support #5537

Reimplement Wasmtime's DWARF transform and debugging support #5537

Comments

fitzgen commented Jan 5, 2023

Testing

cfallin commented Jan 17, 2023

philipc commented Jul 21, 2024

alexcrichton commented Jul 22, 2024

fitzgen commented Jul 22, 2024

philipc commented Jul 23, 2024

fitzgen commented Jul 23, 2024

fitzgen commented Jul 23, 2024 • edited Loading

philipc commented Jul 27, 2024

fitzgen commented Jul 27, 2024

philipc commented Aug 15, 2024

fitzgen commented Jul 23, 2024 •

edited

Loading