Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Added cargo-fuzz tests for Rlp crate #6145

Closed
wants to merge 19 commits into from
Closed
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions util/rlp/fuzz/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@

target
corpus
artifacts
36 changes: 36 additions & 0 deletions util/rlp/fuzz/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@

[package]
name = "rlp-fuzz"
version = "0.0.1"
authors = ["Parity Technologies <admin@parity.io>"]
publish = false
build = "build.rs"

[package.metadata]
cargo-fuzz = true

[features]
nightly = []

[dependencies]
rand = "0.3"

[build-dependencies]
rustc_version = "0.2.1"

[dependencies.rlp]
path = ".."
[dependencies.libfuzzer-sys]
git = "https://github.com/rust-fuzz/libfuzzer-sys.git"

# Prevent this from interfering with workspaces
[workspace]
members = ["."]

[[bin]]
name = "untrusted"
path = "fuzz_targets/untrusted.rs"

[[bin]]
name = "append_raw"
path = "fuzz_targets/append_raw.rs"
38 changes: 38 additions & 0 deletions util/rlp/fuzz/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
#Fuzzing crates with cargo-fuzz

##Installation
Install cargo-fuzz using cargo:
- `cargo install cargo-fuzz`

##Initializing fuzz tests
Change into the target crate's base directory, e.g.:
- `cd parity/util/rlp`

Initialize the fuzz directory:
- `cargo fuzz init`
- `cargo fuzz init -t named_first_test`

Fuzz tests stored under `./fuzz/fuzz_targets/*.rs`

##Adding tests
Add a test named `anyname` under `fuzz/fuzz_targets/anyname.rs`:
- `cargo fuzz add anyname`

This also adds an entry to `fuzz/Cargo.toml` for creating a binary target
with the test's name, e.g.
```
[[bin]]
name = "anyname"
path = "fuzz_targets/anyname.rs"
```

##Editing tests
Make changes and import any necessary libraries to test target features
- `vim ./fuzz/fuzz_targets/untrusted.rs`

##Running tests
While still in the crate's root directory, run the following:
- `cargo fuzz run <test-name>`
- `cargo fuzz run -j <num-jobs> <test-name>` for parallell jobs

Examples and detailed information on libfuzzer output can be found [here](https://rust-fuzz.github.io/book/cargo-fuzz/tutorial.html)
19 changes: 19 additions & 0 deletions util/rlp/fuzz/build.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
extern crate rustc_version;
use rustc_version::{version, version_meta, Channel, Version};

fn main() {
assert!(version().unwrap().major >= 1);

match version_meta().unwrap().channel {
Channel::Stable => {
}
Channel::Beta => {
}
Channel::Nightly => {
println!("cargo:rustc-cfg=feature=\"nightly\"")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of all the RUSTC_IS_*? can't you just if let Channel::Nightly = version_meta().unwrap() { /* set feature flag */ }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, I took this code from an example on using rustc_version (not super familiar with conditional compilation).

I'll remove the code from the other cases, and only set the feature flag for nightly

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if let Channel::Nighlty = version_meta().unwrap().channel {
  println!("cargo:rustc-cfg=feature=\"nightly\"")
}

}
Channel::Dev => {
}
}

}
19 changes: 19 additions & 0 deletions util/rlp/fuzz/fuzz_targets/append_raw.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#![no_main]
#![cfg(feature = "nightly")]
#[macro_use] extern crate libfuzzer_sys;
extern crate rlp;
extern crate rand;

use rlp::RlpStream;
use rand::Rng;

fuzz_target!(|data: &[u8]| {
let mut rls = RlpStream::new();

// Generate random item count to further fuzz append_raw's parsing
let mut rng = rand::thread_rng();
let rand_item_count = rng.gen_range::<usize>(1 as usize, 4096 as usize);

let _ = rls.append_raw(data, rand_item_count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append_raw doesn't do any checking of the data passed AFAIK. but it does check that item_count is within the number of remaining items in the list (which in this case, would be 1).

what do you expect to find by fuzz-testing this? RlpStream is typically not exposed to external input, whereas UntrustedRlp is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to get full fuzz-coverage of the classes in the rlp crate, and I hadn't considered where the data may be coming from.

I agree it makes more sense to fully fuzz the UntrustedRlp struct, and am focusing more on that now that the foundational stuff is fixed.

Also still becoming more familiar with the code base, another expectation/goal of writing fuzz tests for all the code.

If for some reason or another, a developer decides to use a normally internal function/struct, it would be beneficial to know if that feature is buggy.

});

11 changes: 11 additions & 0 deletions util/rlp/fuzz/fuzz_targets/untrusted.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#![no_main]
#![cfg(feature = "nightly")]
#[macro_use] extern crate libfuzzer_sys;
extern crate rlp;

use rlp::UntrustedRlp;

fuzz_target!(|data: &[u8]| {
let _ = UntrustedRlp::new(data);
Copy link
Contributor

@rphmeier rphmeier Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating UntrustedRlp just wraps the data given. probably this fuzz test should try and iterate over all lists recursively. just note that finding invalid RLP will be expected: we want to try and find uncaught errors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, what do you mean by all the lists?

Copy link
Contributor

@rphmeier rphmeier Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively. This might give the fuzzer more to work with in terms of exploring code paths

Copy link
Author

@onicslabs onicslabs Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added another fuzz test that creates a list (prepending 0xc + data_len to the fuzz data).

probably this fuzz test should try and iterate over all lists recursively.

RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively.

So now I'm thinking of creating lists of various types from the fuzz data, and recursively calling UntrustedRlp::new(fuzz-list-data) on those lists.

What about taking variable length slices from the raw fuzz data, then using those as list items, prepended by their length?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UntrustedRlp has an iter() method yielding an iterator over other UntrustedRlp elements within it. You could write a function

fn iter_recursive(rlp: UntrustedRlp) {
    for x in rlp.iter() {
        iter_recursive(x)
    }
}

this would essentially walk the whole RLP structure recursively

Copy link
Author

@onicslabs onicslabs Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds like it requires building a minimally well-formed list in the RLP structure. is this what you are talking about building from the fuzz data?

i don't fully understand your thinking on this. i get the code coverage, just trying to figure out guiding the fuzzer without constraining it too much.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after reading through parity/util/rlp/src/impls.rs, think i get what you were talking about with recursive lists. Make a nested list, where each item is a nested list of each type and subtype en/decoded by the UntrustedRLP structure.

rlp-root
|- hash-list
 -|-hash64
 -|-hash128
 -|-hash160
 -|-etc

Do you think throwing fuzz data into these list items will get the right code/crash coverage (using your recursive function to traverse the RLP structure)?

Copy link
Contributor

@rphmeier rphmeier Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the interesting thing about RLP is that actually, each item is either one of a few things:

  • A list of items, prefixed by its length
  • Special byte for empty data (treated as empty list)
  • A list of bytes, prefixed by its length

This is why it's called "recursive length prefix".
Take a look at the spec for more details.
All of our strongly-typed decoding is implemented on top of those primitives.

What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly). This will allow the fuzzer to attempt to explore almost all code paths in the decoder, possibly finding some well-formed RLP that leads to panic. Most of the data it generates will end up being invalid RLP, but it may find some valid RLPs which are not handled correctly.

Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.

Copy link
Author

@onicslabs onicslabs Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly)

Currently, I'm using the gen_untrusted_corpus.rs tool to build some basic, well-structured RLP streams. I'm still working on generating all the data types UntrustedRlp can normally handle, and then I'll build some RLP streams with mostly good, but a little messed up data. Hopefully, this will give the fuzzer more to mutate over, and we'll get deeper coverage.

Thanks for the link to the spec, I'll go over it again to get more ideas for what to throw at this thing. I'm understanding better how RLP really works now. Hopefully this leads to some really exhaustive fuzz code.

Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.

Absolutely agree. I'm trying to add more to the fuzzer corpus, so that the fuzzer's mutation algorithms can be more efficient in breaking the RLP decoder. Like sending in a PNG of a horse with a trojan encoded inside :)

});