Added cargo-fuzz tests for Rlp crate #6145

onicslabs · 2017-07-25T14:58:30Z

Set up some fuzz test harnesses for the UntrustedRlp struct
and RlpStream's append_raw function.

This change is

Set up some fuzz test harnesses for the UntrustedRlp struct and RlpStream's append_raw function.

parity-cla-bot · 2017-07-25T14:58:31Z

It looks like @onicslabs hasn'signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement at the following URL: https://cla.parity.io

Once you've signed, plesae reply to this thread with [clabot:check] to prove it.

Many thanks,

Parity Technologies CLA Bot

rphmeier · 2017-07-25T15:00:14Z

util/rlp/fuzz/Cargo.toml

+[package]
+name = "rlp-fuzz"
+version = "0.0.1"
+authors = ["Automatically generated"]


can you set this to Parity Technologies <admin@parity.io>?

onicslabs · 2017-07-25T15:07:46Z

[clabot:check]

parity-cla-bot · 2017-07-25T15:07:47Z

It looks like @onicslabs signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

rphmeier · 2017-07-25T15:10:10Z

A README.md in the rlp-fuzz crate explaining how to run it (unless it's just cargo test -p rlp-fuzz?) would be great too. Do you recommend us to run this as part of the CI, or our nightly tests?

onicslabs · 2017-07-25T15:11:33Z

This addition won't pass under a stable build, since cargo-fuzz requires nightly to compile correctly. Should I remove this pull request?

Is it possible to exclude fuzz directories from stable compilation, then check fuzz directories compile correctly under a nightly build?

onicslabs · 2017-07-25T15:14:11Z

I'd be happy to do a small readme.md

I'll include a basic cargo-fuzz installation guide, and run through steps for the entire fuzz testing cycle.

rphmeier · 2017-07-25T15:14:58Z

@onicslabs You could replace the crate with a stub when a cfg(feature = "nightly") is not true. You can set this feature in a build script that checks using the rustc_version crate.

Only attempt to compile fuzz tests using a nightly compiler

onicslabs · 2017-07-25T16:15:20Z

Just added conditional compilation for the two fuzzing tests I wrote. @rphmeier are these changes similar to what you were talking about?

Long-term, cargo-fuzz should integrate a feature/option to automatically add conditional compilation.

May start working on this, and see what the cargo-fuzz community thinks about adding a feature like that.

rphmeier · 2017-07-25T16:33:41Z

util/rlp/fuzz/fuzz_targets/append_raw.rs

@@ -0,0 +1,18 @@
+#![no_main]
+#[macro_use] extern crate libfuzzer_sys;
+#[cfg(feature = "nightly"])


shouldn't it be #![cfg(feature = "nightly")] to be applied to the whole source file? to me this looks like it will only import the rlp crate on nightly, but still try to do everything else on stable.
same for untrusted.rs

Wrap the entire source files for fuzz tests in the nightly feature. Meaning tests will only be included in compilation when using a nightly compiler.

Include entire source in config macro

Re-checked macro syntax, and changed to match correct syntax.

rphmeier · 2017-07-25T18:13:07Z

util/rlp/fuzz/fuzz_targets/append_raw.rs

@@ -0,0 +1,19 @@
+#![no_main]
+#[macro_use] extern crate libfuzzer_sys;
+#[cfg(feature = "nightly")]


there is a distinction between #[attr] and #![attr]. One is applied to the following item, and the other is applied to the enclosing item.

there is a distinction between #[attr] and #![attr]. One is applied to the following item, and the other is applied to the enclosing item.

does this mean that #![cfg(feature = "nightly")] at the top of the file will include the entire file as a nightly feature?

I'm having trouble getting features to work nicely with cargo-fuzz. For example, to enable the feature, one must pass --features="nightly" to cargo build, but cargo-build is nested within the cargo fuzz run <test-name> command.

I haven't been able to find documentation on how to enable a feature from the build script, or through cargo-fuzz options. The entire project compiles fine with this feature inclusion, but now the fuzz tests will not run with cargo fuzz run untrusted.

Currently I am trying to require the nighlty feature to build the binary targets with required-feature option under parity/util/rlp/fuzz/Cargo.toml. This stops the binaries from being built, but it is unclear how to enable these features when using a nightly compiler.

figured out enabling features in build scripts:
println!("cargo:rustc-cfg=feature=\"nightly\"")

does this mean that #![cfg(feature = "nightly")] at the top of the file will include the entire file as a nightly feature?

yep

Included `nightly` feature for entire fuzz test files, excluding their compilation during stable builds.

rphmeier · 2017-07-25T19:22:07Z

util/rlp/fuzz/build.rs

+        }
+        Channel::Nightly => {
+            println!("cargo:rustc-cfg=RUSTC_IS_NIGHTLY");
+            println!("cargo:rustc-cfg=feature=\"nightly\"")


what's the purpose of all the RUSTC_IS_*? can't you just if let Channel::Nightly = version_meta().unwrap() { /* set feature flag */ }

you're right, I took this code from an example on using rustc_version (not super familiar with conditional compilation).

I'll remove the code from the other cases, and only set the feature flag for nightly

Some code setting rustc config lines were unnecessary and removed. At the moment, only the `nightly` feature is enabled when running fuzz tests with `cargo fuzz run <test>` and a nightly compiler.

rphmeier · 2017-07-25T20:49:07Z

util/rlp/fuzz/fuzz_targets/append_raw.rs

+    let mut rng = rand::thread_rng();
+    let rand_item_count = rng.gen_range::<usize>(1 as usize, 4096 as usize);
+
+    let _ = rls.append_raw(data, rand_item_count);


append_raw doesn't do any checking of the data passed AFAIK. but it does check that item_count is within the number of remaining items in the list (which in this case, would be 1).

what do you expect to find by fuzz-testing this? RlpStream is typically not exposed to external input, whereas UntrustedRlp is.

trying to get full fuzz-coverage of the classes in the rlp crate, and I hadn't considered where the data may be coming from.

I agree it makes more sense to fully fuzz the UntrustedRlp struct, and am focusing more on that now that the foundational stuff is fixed.

Also still becoming more familiar with the code base, another expectation/goal of writing fuzz tests for all the code.

If for some reason or another, a developer decides to use a normally internal function/struct, it would be beneficial to know if that feature is buggy.

rphmeier · 2017-07-25T20:50:12Z

util/rlp/fuzz/fuzz_targets/untrusted.rs

+use rlp::UntrustedRlp;
+
+fuzz_target!(|data: &[u8]| {
+    let _ = UntrustedRlp::new(data);


creating UntrustedRlp just wraps the data given. probably this fuzz test should try and iterate over all lists recursively. just note that finding invalid RLP will be expected: we want to try and find uncaught errors.

ok, what do you mean by all the lists?

RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively. This might give the fuzzer more to work with in terms of exploring code paths

I added another fuzz test that creates a list (prepending 0xc + data_len to the fuzz data).

probably this fuzz test should try and iterate over all lists recursively.

RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively.

So now I'm thinking of creating lists of various types from the fuzz data, and recursively calling UntrustedRlp::new(fuzz-list-data) on those lists.

What about taking variable length slices from the raw fuzz data, then using those as list items, prepended by their length?

UntrustedRlp has an iter() method yielding an iterator over other UntrustedRlp elements within it. You could write a function

fn iter_recursive(rlp: UntrustedRlp) { for x in rlp.iter() { iter_recursive(x) } }

this would essentially walk the whole RLP structure recursively

this sounds like it requires building a minimally well-formed list in the RLP structure. is this what you are talking about building from the fuzz data?

i don't fully understand your thinking on this. i get the code coverage, just trying to figure out guiding the fuzzer without constraining it too much.

after reading through parity/util/rlp/src/impls.rs, think i get what you were talking about with recursive lists. Make a nested list, where each item is a nested list of each type and subtype en/decoded by the UntrustedRLP structure.

rlp-root |- hash-list -|-hash64 -|-hash128 -|-hash160 -|-etc

Do you think throwing fuzz data into these list items will get the right code/crash coverage (using your recursive function to traverse the RLP structure)?

So the interesting thing about RLP is that actually, each item is either one of a few things:

A list of items, prefixed by its length

Special byte for empty data (treated as empty list)

A list of bytes, prefixed by its length

This is why it's called "recursive length prefix".
Take a look at the spec for more details.
All of our strongly-typed decoding is implemented on top of those primitives.

What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly). This will allow the fuzzer to attempt to explore almost all code paths in the decoder, possibly finding some well-formed RLP that leads to panic. Most of the data it generates will end up being invalid RLP, but it may find some valid RLPs which are not handled correctly.

Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.

What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly)

Currently, I'm using the gen_untrusted_corpus.rs tool to build some basic, well-structured RLP streams. I'm still working on generating all the data types UntrustedRlp can normally handle, and then I'll build some RLP streams with mostly good, but a little messed up data. Hopefully, this will give the fuzzer more to mutate over, and we'll get deeper coverage.

Thanks for the link to the spec, I'll go over it again to get more ideas for what to throw at this thing. I'm understanding better how RLP really works now. Hopefully this leads to some really exhaustive fuzz code.

Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.

Absolutely agree. I'm trying to add more to the fuzzer corpus, so that the fuzzer's mutation algorithms can be more efficient in breaking the RLP decoder. Like sending in a PNG of a horse with a trojan encoded inside :)

For UntrustedRlp to be processed as a list, one needs to prepend the data with the byte `0xc-` where `-` is the number of items in the list. Currently using the length of the fuzz data stream as the number of items in the UntrustedRlp list.

Added an algorithm `iter_recursive()` to iterate recursively over the entire RLP structure. Also created a tool `gen_untrusted_corpus.rs` for generating valid RLP structured data to make the fuzzing process more efficient.

rphmeier · 2017-07-28T09:07:48Z

util/rlp/fuzz/gen_untrusted_corpus.rs

+
+    // Read in base path to fuzzing corpus directory from `RLPCORPUS` environment var
+    let corp = env::var("RLPCORPUS").unwrap();
+    let fp = format!("{}{}", corp, "/untrusted_data/well-formed-list-uint");


should use PathBuf and appending methods for this

Separated writing the RLP stream to its own function, and build the file path using PathBuf.

…zz-tests

debris

Besides few grumbles, this pr lacks integration with our build system. It should be added to our workspace and compiled under nightly feature. Also .gitlab-ci.yml needs to be updated.

debris · 2017-08-14T07:41:10Z

util/rlp/fuzz/build.rs

+        Channel::Beta => {
+        }
+        Channel::Nightly => {
+            println!("cargo:rustc-cfg=feature=\"nightly\"")


if let Channel::Nighlty = version_meta().unwrap().channel { println!("cargo:rustc-cfg=feature=\"nightly\"") }

debris · 2017-08-14T07:43:44Z

util/rlp/fuzz/gen_untrusted_corpus.rs

+
+fn write_corpus(corpdir: &str, rlp: &RlpStream) {
+    // Read in base path to fuzzing corpus directory from `RLPCORPUS` environment var
+    let corp = env::var("RLPCORPUS").unwrap();


what is fuzzing corpus? does it need to be persistent? also:

It will crash on everyone's machine, cause RLPCORPUS is no defined

I see no reason for tests to permanently spoil my filesystem. can you use tempdir?

what is fuzzing corpus? does it need to be persistent?

the fuzzing corpus starts as an initial set of valid input data, and cargo-fuzz mutates the initial data into a larger, potentially crash-causing set of input data. It needs to be somewhat persistent, but you could generate the corpus from scratch each time. The simplest way is to just run rm -rf fuzz/corpus/* after running the fuzz tests.

generating from scratch is computationally more expensive (depends on size of corpus), and some amount of state is lost in code penetration. cargo-fuzz also has a command cmin to minify the corpus.

It will crash on everyone's machine, cause RLPCORPUS is no defined

True, I wasn't sure on the preferred way to do this. I'll change the path to a constant in the test, and use relative path so its all maintained in the fuzz/corpus/<test-name> directory.

I see no reason for tests to permanently spoil my filesystem. can you use tempdir?

cargo-fuzz creates corpus directories and files by default. libFuzzer also accepts a dictionary file which has patterns for creating valid input. Both the corpus and dictionary patterns are useful in getting passed validation checks, and get deeper code coverage.

~~What about adding tempdir as an option, for testers that don't want to have a persistent fuzzing corpus?~~
generate-fuzz-corpus.rs only seeds the default folder cargo-fuzz uses for test corpus files, and this folder is persistent by default. cargo-fuzz cmin and cargo-fuzz tmin will minify the corpus and test directories respectivley. Modifying this behavior would need to be done by either modifying cargo-fuzz or running a bash script to delete corpus files on cargo-fuzz shutdown.

~~Tempdir~~ Some form of corpus-cleanup definitely seems to be the right way to go for automated build tests, since they just need basic fuzzing coverage. Not sure if just adding a line of bash or incorporating into CI is best way to go. cargo-fuzz can also be limited on the number of runs it attempts, which would be good for minimizing the load fuzz testing puts on the build servers.

gavofyork · 2017-09-26T10:55:12Z

@debris / @onicslabs looking stale... any intention to sort this?

onicslabs · 2017-09-29T04:12:59Z

I'm willing to correct any remaining grumbles. The most recent two commits were meant to resolve code cleanup around requiring a nightly build for fuzz tests, and config variables being defined with environment variables.

Apologies for not catching the CI integration task. Wasn't sure if this was something internal or not.

onicslabs · 2017-09-29T04:44:42Z

@debris I'm looking through gitlab-ci.yml, would a new build entry for something like test-fuzz-nightly work for CI integration?

Where the test build switches to nightly, then runs a script that checks for/runs fuzz tests for a configured time or number of iterations.

tomusdrw · 2017-12-21T08:59:36Z

@onicslabs any progress? @debris ping

onicslabs · 2017-12-26T20:23:51Z

@tomusdrw was still waiting on the preferred method of CI testing.

I can add a line that will only run fuzzing for 100 rounds

Just not sure how to run them only when changes are made to the crate they are fuzzing. Any advice?

tomusdrw · 2017-12-27T08:17:36Z

@onicslabs I'm fine with running fuzzing every time during nightly builds, not sure how to achieve that though. @paritytech/ci ?

debris · 2018-01-31T11:20:44Z

It's stale, so I'm closing it

Added cargo-fuzz tests for Rlp crate

1bb6244

Set up some fuzz test harnesses for the UntrustedRlp struct and RlpStream's append_raw function.