Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift CLIF Fuzzer generate blocks and branches #3094

Merged
merged 3 commits into from
Sep 3, 2021

Conversation

afonso360
Copy link
Contributor

@afonso360 afonso360 commented Jul 18, 2021

Hey,

Continuing the implementation of the CLIF Fuzzer (tracked in #3050), this PR implements block generation and generating basic branch instructions (jump, brnz, brz, br_icmp).

In this implementation we generate all blocks and signatures up front, and then while generating instructions pick random target blocks to jump to.

This PR is based on top of #3062 which should be merged soon, but I think we can get some issues sorted out while waiting for that to be merged.

@afonso360
Copy link
Contributor Author

afonso360 commented Jul 18, 2021

This currently quickly crashes in the verify fuzzer with a out of memory error.

This crash is reproducible with:

 echo "JyEEAf8B/ydxAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACcAAAAAQgAAAAAAAAAAAAAAvARAAAIAAAClAAAnAAA=" | base64 -d > fuzz-input
 cargo fuzz run cranelift-fuzzgen-verify ./fuzz-input

Here's what the FunctionBuilder looks like before calling seal_all_blocks (which is where this crashes):

function u0:0(i32, i64, i8, b1, i8) system_v {
block0(v0: i32, v1: i64, v2: i8, v3: b1, v4: i8):
    br_icmp eq v2, v2, block1(v3, v3, v3, v3, v3, v3, v3, v3, v3, v3, v3)
    jump block1(v3, v3, v3, v3, v3, v3, v3, v3, v3, v3, v3)

block1(v5: b1, v6: b1, v7: b1, v8: b1, v9: b1, v10: b1, v11: b1, v12: b1, v13: b1, v14: b1, v15: b1, v16: i8):
    br_icmp eq v16, v16, block1(v15, v15, v15, v15, v15, v15, v15, v15, v15, v15, v15)
    jump block1(v15, v15, v15, v15, v15, v15, v15, v15, v15, v15, v15)

block2(v17: i8):
    br_icmp eq v17, v17, block5
    jump block2

block3(v18: b1, v19: i8):
    br_icmp eq v19, v19, block1(v18, v18, v18, v18, v18, v18, v18, v18, v18, v18, v18)
    jump block1(v18, v18, v18, v18, v18, v18, v18, v18, v18, v18, v18)

block4(v20: b1, v21: i8):
    br_icmp ugt v21, v21, block1(v20, v20, v20, v20, v20, v20, v20, v20, v20, v20, v20)
    jump block3

block5(v22: b1, v23: i8, v24: i8):
    br_icmp ule v23, v24, block1(v22, v22, v22, v22, v22, v22, v22, v22, v22, v22, v22)
    jump block1(v22, v22, v22, v22, v22, v22, v22, v22, v22, v22, v22)
}

This input doesn't look unreasonable to me. None of the blocks have a return, but do we need to guarantee a return is present?

I just noticed now that 2 of the jump's provide no args to their target blocks, but we definitely pass them in in the instruction inserter. Is this because we picked a non local variable?

Here is the output of cargo-fuzz:

     Running `/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify -artifact_prefix=/mnt/c/Users/Afonso/CLionProjects/wasmtime/fuzz/artifacts/cranelift-fuzzgen-verify/ ar
tifacts/cranelift-fuzzgen-verify/oom-a8aef5b7b1e8d020aae51e6d3dedc9418b04dd99`
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1365534722
INFO: Loaded 1 modules   (221787 inline 8-bit counters): 221787 [0x56195415718a, 0x56195418d3e5),
INFO: Loaded 1 PC tables (221787 PCs): 221787 [0x56195418d3e8,0x5619544ef998),
/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify: Running 1 inputs 1 time(s) each.
Running: artifacts/cranelift-fuzzgen-verify/oom-a8aef5b7b1e8d020aae51e6d3dedc9418b04dd99
==2822== ERROR: libFuzzer: out-of-memory (used: 2093Mb; limit: 2048Mb)
   To change the out-of-memory limit use -rss_limit_mb=<N>

Live Heap Allocations: 2037467893 bytes in 54 chunks; quarantined: 32 bytes in 1 chunks; 8736 other chunks; total chunks: 8791; showing top 95% (at most 8 unique contexts)
2013265920 byte(s) (98%) in 1 allocation(s)
    #0 0x5619523a36a9 in realloc /rustc/llvm/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0x56195247a6e4 in alloc::raw_vec::finish_grow::h45ef09ff8efa3011 (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x8aa6e4)
    #2 0x5619522ca86f in alloc::raw_vec::RawVec$LT$T$C$A$GT$::reserve::do_reserve_and_handle::h10ad7ae98da66681 (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x6f
a86f)
    #3 0x5619524aae90 in cranelift_frontend::ssa::SSABuilder::use_var_nonlocal::hfb5d4209dbff3efc (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x8dae90)
    #4 0x5619524b0cba in cranelift_frontend::ssa::SSABuilder::run_state_machine::h4be05d0025ad2762 (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x8e0cba)
    #5 0x5619524ac4dd in cranelift_frontend::ssa::SSABuilder::seal_one_block::h0168f5ef0239babb (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x8dc4dd)
    #6 0x56195249e736 in cranelift_frontend::frontend::FunctionBuilder::seal_all_blocks::h80281c0fe53cdd5a (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x8ce736)
    #7 0x56195244447c in cranelift_fuzzgen::function_generator::FunctionGenerator::generate::h960858ee4bfd5fd6 (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x874
47c)
    #8 0x5619524451be in _$LT$cranelift_fuzzgen..TestCase$u20$as$u20$arbitrary..Arbitrary$GT$::arbitrary::h8ffee5eb12788765 (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen
-verify+0x8751be)
    #9 0x5619523ec5c7 in rust_fuzzer_test_input (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x81c5c7)
    #10 0x561953b286d0 in __rust_try (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x1f586d0)
    #11 0x561953b27e5f in LLVMFuzzerTestOneInput (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x1f57e5f)
    #12 0x561953b2b5fc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x1f5b5fc)
    #13 0x561953b3ba29 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x1f6ba29)
    #14 0x561953b45222 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x1f7522
2)
    #15 0x561952327786 in main (/mnt/c/Users/Afonso/CLionProjects/wasmtime/target/x86_64-unknown-linux-gnu/release/cranelift-fuzzgen-verify+0x757786)
    #16 0x7f7187be50b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16

SUMMARY: libFuzzer: out-of-memory
────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with exit status: 71

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. fuzzing Issues related to our fuzzing infrastructure labels Jul 18, 2021
@github-actions
Copy link

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "fuzzing"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@cfallin
Copy link
Member

cfallin commented Jul 26, 2021

but do we need to guarantee a return is present?

Nope, a function that does not terminate (loops infinitely) is legal CLIF, though of course that should hit the timeout mechanism you've added to the interpreter.

Live Heap Allocations: 2037467893 bytes in 54 chunks; quarantined: 32 bytes in 1 chunks; 8736 other chunks; total chunks: 8791; showing top 95% (at most 8 unique contexts)

This looks like uncontrolled Vec growth in the SSA-builder frontend; either the predecessor-block mechanism or the variable tracking. Possibly the way in which blocks are sealed is incorrect, though I haven't looked closely at the details here yet. Are you able to dig a bit more and e.g. add some tracing logging to the algorithm to see what's going wrong?

@afonso360
Copy link
Contributor Author

afonso360 commented Aug 1, 2021

Hey,

Had a busy week and couldn't really look at this until today. I think this is what is going on, but I tried to build a minimized test demonstrating this behaviour and couldn't reproduce it.

It looks like there are 2 blocks that are important here block5 and block2.

The infinite loop happens after calling begin_predecessors_lookup for block5, which from my understanding is trying to look for a definition for variable 3 in block2, but we don't use varable 3 in block2.

Block: block2 - signature: []
use_var_nonlocal: var: Variable(2)	ty: types::I8	block: block2
	use_var_nonlocal: case: Unsealed(v17)

Block: block5 - signature: []
use_var_nonlocal: var: Variable(3)	ty: types::B1	block: block5
	use_var_nonlocal: case: Unsealed(v22)
use_var_nonlocal: var: Variable(2)	ty: types::I8	block: block5
	use_var_nonlocal: case: Unsealed(v23)
use_var_nonlocal: var: Variable(4)	ty: types::I8	block: block5
	use_var_nonlocal: case: Unsealed(v24)

...

begin_predecessors_lookup: sentinel: v22 - dest_block: block5
	begin_predecessors_lookup: calls: [FinishPredecessorsLookup(v22, block5), UseVar(block2)]

run_state_machine: processing call: UseVar(block2) for var: Variable(3)
	run_state_machine: call queue: [FinishPredecessorsLookup(v22, block5)]
	var_defs: var: Variable(3) - ssa_block: block2 - blockindex: 2 - var_defs - SecondaryMap { elems: [Some(v3), Some(v15), None, Some(v18), Some(v20), Some(v22)], default: None, unused: PhantomData } - var_defs[ssa_block]: None
use_var_nonlocal: var: Variable(3)	ty: types::B1	block: block2
	use_var_nonlocal: case: SealedOnePredecessor(block2)

run_state_machine: processing call: UseVar(block2) for var: Variable(3)
	run_state_machine: call queue: [FinishPredecessorsLookup(v22, block5), FinishSealedOnePredecessor(block2)]
	var_defs: var: Variable(3) - ssa_block: block2 - blockindex: 2 - var_defs - SecondaryMap { elems: [Some(v3), Some(v15), None, Some(v18), Some(v20), Some(v22)], default: None, unused: PhantomData } - var_defs[ssa_block]: None
use_var_nonlocal: var: Variable(3)	ty: types::B1	block: block2
	use_var_nonlocal: case: SealedOnePredecessor(block2)

This then repeats forever increasing the FinishSealedOnePredecessor(block2) cases in the call queue, which is what causes the uncontrollable vec growth.

I've pushed the log calls into a separate commit.

Edit: Had some further progress here.

We are indeed looking for variable 3 in block2 which should be ok, but that's not what is causing the issue. The issue is happening because block2 only has one predecessor which is block2, so we keep recursing through its predecessors causing the infinite loop.

Looking at the code, block2 is indeed the only predecessor to block2, and I assume that we want to mark a block as its own predecessor if it jumps to itself.

So, should we stop the lookup in use_var_nonlocal if we are trying to lookup ourselves again? (seems reasonable to me)

This also brings up another issue, which is: block2 and block5 are unreachable from block0, we never actually jump to either of them, so we are never going to find the definition of variable 3 in their predecessors.

If I understand the paper correctly, they deal with this in Algorithm 3 by DCE'ing these blocks?

@cfallin
Copy link
Member

cfallin commented Aug 6, 2021

Looking at the code, block2 is indeed the only predecessor to block2, and I assume that we want to mark a block as its own predecessor if it jumps to itself.

So, should we stop the lookup in use_var_nonlocal if we are trying to lookup ourselves again? (seems reasonable to me)

Ah -- yes, we need to handle the self-predecessor case specially here, I think. This likely never came up before because the Wasm translator will not generate code of that form.

This also brings up another issue, which is: block2 and block5 are unreachable from block0, we never actually jump to either of them, so we are never going to find the definition of variable 3 in their predecessors.

If I understand the paper correctly, they deal with this in Algorithm 3 by DCE'ing these blocks?

Yes, it looks like they replace any phi whose only input is itself with undef; this makes sense as such a phi could only occur in an unreachable infinite loop (otherwise the loop entry would have to feed into the phi too).

Unreachable code can create all sorts of interesting edge cases in analyses. As one design point, in the regalloc2 fuzzer I took care to only generate reachable blocks in CFGs by construction, and require that for all input code; but I don't think it makes as much sense to create such a mandate in a general-purpose compiler, as it imposes too much on the IR producer. The upshot is that fuzzing like this will certainly expose our unreachable-code-related bugs, so thank you :-)

afonso360 added a commit to afonso360/wasmtime that referenced this pull request Aug 27, 2021
afonso360 added a commit to afonso360/wasmtime that referenced this pull request Aug 27, 2021
afonso360 added a commit to afonso360/wasmtime that referenced this pull request Aug 27, 2021
afonso360 added a commit to afonso360/wasmtime that referenced this pull request Sep 1, 2021
Perform a search over block predecessors trying to find loops of
unreachable predecessors. We do this by iterating on predecessors and
marking them as visited, stopping if we find a previously visited block
or if we find a block with multiple predecessors.

This issue was found by the CLIF fuzzer in bytecodealliance#3094.
@afonso360
Copy link
Contributor Author

afonso360 commented Sep 3, 2021

@cfallin I can't request a review on GH, but I think with #3258 merged we can start a review on this.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for the work in tracking down the SSA-builder issues to enable this, too. I'm interested to see what sorts of new fuzzbugs this might find...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants