Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Added cargo-fuzz tests for Rlp crate #6145

Closed
wants to merge 19 commits into from

Conversation

onicslabs
Copy link

@onicslabs onicslabs commented Jul 25, 2017

Set up some fuzz test harnesses for the UntrustedRlp struct
and RlpStream's append_raw function.


This change is Reviewable

Set up some fuzz test harnesses for the UntrustedRlp struct
and RlpStream's append_raw function.
@parity-cla-bot
Copy link

It looks like @onicslabs hasn'signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement at the following URL: https://cla.parity.io

Once you've signed, plesae reply to this thread with [clabot:check] to prove it.

Many thanks,

Parity Technologies CLA Bot

[package]
name = "rlp-fuzz"
version = "0.0.1"
authors = ["Automatically generated"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you set this to Parity Technologies <admin@parity.io>?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do :)

@onicslabs
Copy link
Author

[clabot:check]

@parity-cla-bot
Copy link

It looks like @onicslabs signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

@rphmeier rphmeier added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Jul 25, 2017
@rphmeier
Copy link
Contributor

A README.md in the rlp-fuzz crate explaining how to run it (unless it's just cargo test -p rlp-fuzz?) would be great too. Do you recommend us to run this as part of the CI, or our nightly tests?

@onicslabs
Copy link
Author

This addition won't pass under a stable build, since cargo-fuzz requires nightly to compile correctly. Should I remove this pull request?

Is it possible to exclude fuzz directories from stable compilation, then check fuzz directories compile correctly under a nightly build?

@onicslabs
Copy link
Author

I'd be happy to do a small readme.md

I'll include a basic cargo-fuzz installation guide, and run through steps for the entire fuzz testing cycle.

@rphmeier
Copy link
Contributor

@onicslabs You could replace the crate with a stub when a cfg(feature = "nightly") is not true. You can set this feature in a build script that checks using the rustc_version crate.

@onicslabs
Copy link
Author

onicslabs commented Jul 25, 2017

Just added conditional compilation for the two fuzzing tests I wrote. @rphmeier are these changes similar to what you were talking about?

Long-term, cargo-fuzz should integrate a feature/option to automatically add conditional compilation.

May start working on this, and see what the cargo-fuzz community thinks about adding a feature like that.

@@ -0,0 +1,18 @@
#![no_main]
#[macro_use] extern crate libfuzzer_sys;
#[cfg(feature = "nightly"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be #![cfg(feature = "nightly")] to be applied to the whole source file? to me this looks like it will only import the rlp crate on nightly, but still try to do everything else on stable.
same for untrusted.rs

Wrap the entire source files for fuzz tests in the nightly
feature. Meaning tests will only be included in compilation
when using a nightly compiler.
Include entire source in config macro
Re-checked macro syntax, and changed to match correct syntax.
@@ -0,0 +1,19 @@
#![no_main]
#[macro_use] extern crate libfuzzer_sys;
#[cfg(feature = "nightly")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a distinction between #[attr] and #![attr]. One is applied to the following item, and the other is applied to the enclosing item.

Copy link
Author

@onicslabs onicslabs Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a distinction between #[attr] and #![attr]. One is applied to the following item, and the other is applied to the enclosing item.

does this mean that #![cfg(feature = "nightly")] at the top of the file will include the entire file as a nightly feature?

I'm having trouble getting features to work nicely with cargo-fuzz. For example, to enable the feature, one must pass --features="nightly" to cargo build, but cargo-build is nested within the cargo fuzz run <test-name> command.

I haven't been able to find documentation on how to enable a feature from the build script, or through cargo-fuzz options. The entire project compiles fine with this feature inclusion, but now the fuzz tests will not run with cargo fuzz run untrusted.

Currently I am trying to require the nighlty feature to build the binary targets with required-feature option under parity/util/rlp/fuzz/Cargo.toml. This stops the binaries from being built, but it is unclear how to enable these features when using a nightly compiler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

figured out enabling features in build scripts:
println!("cargo:rustc-cfg=feature=\"nightly\"")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that #![cfg(feature = "nightly")] at the top of the file will include the entire file as a nightly feature?

yep

Included `nightly` feature for entire fuzz test files, excluding
their compilation during stable builds.
}
Channel::Nightly => {
println!("cargo:rustc-cfg=RUSTC_IS_NIGHTLY");
println!("cargo:rustc-cfg=feature=\"nightly\"")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of all the RUSTC_IS_*? can't you just if let Channel::Nightly = version_meta().unwrap() { /* set feature flag */ }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, I took this code from an example on using rustc_version (not super familiar with conditional compilation).

I'll remove the code from the other cases, and only set the feature flag for nightly

Some code setting rustc config lines were unnecessary and removed.
At the moment, only the `nightly` feature is enabled when running
fuzz tests with `cargo fuzz run <test>` and a nightly compiler.
let mut rng = rand::thread_rng();
let rand_item_count = rng.gen_range::<usize>(1 as usize, 4096 as usize);

let _ = rls.append_raw(data, rand_item_count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append_raw doesn't do any checking of the data passed AFAIK. but it does check that item_count is within the number of remaining items in the list (which in this case, would be 1).

what do you expect to find by fuzz-testing this? RlpStream is typically not exposed to external input, whereas UntrustedRlp is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to get full fuzz-coverage of the classes in the rlp crate, and I hadn't considered where the data may be coming from.

I agree it makes more sense to fully fuzz the UntrustedRlp struct, and am focusing more on that now that the foundational stuff is fixed.

Also still becoming more familiar with the code base, another expectation/goal of writing fuzz tests for all the code.

If for some reason or another, a developer decides to use a normally internal function/struct, it would be beneficial to know if that feature is buggy.

use rlp::UntrustedRlp;

fuzz_target!(|data: &[u8]| {
let _ = UntrustedRlp::new(data);
Copy link
Contributor

@rphmeier rphmeier Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating UntrustedRlp just wraps the data given. probably this fuzz test should try and iterate over all lists recursively. just note that finding invalid RLP will be expected: we want to try and find uncaught errors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, what do you mean by all the lists?

Copy link
Contributor

@rphmeier rphmeier Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively. This might give the fuzzer more to work with in terms of exploring code paths

Copy link
Author

@onicslabs onicslabs Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added another fuzz test that creates a list (prepending 0xc + data_len to the fuzz data).

probably this fuzz test should try and iterate over all lists recursively.

RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively.

So now I'm thinking of creating lists of various types from the fuzz data, and recursively calling UntrustedRlp::new(fuzz-list-data) on those lists.

What about taking variable length slices from the raw fuzz data, then using those as list items, prepended by their length?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UntrustedRlp has an iter() method yielding an iterator over other UntrustedRlp elements within it. You could write a function

fn iter_recursive(rlp: UntrustedRlp) {
    for x in rlp.iter() {
        iter_recursive(x)
    }
}

this would essentially walk the whole RLP structure recursively

Copy link
Author

@onicslabs onicslabs Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds like it requires building a minimally well-formed list in the RLP structure. is this what you are talking about building from the fuzz data?

i don't fully understand your thinking on this. i get the code coverage, just trying to figure out guiding the fuzzer without constraining it too much.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after reading through parity/util/rlp/src/impls.rs, think i get what you were talking about with recursive lists. Make a nested list, where each item is a nested list of each type and subtype en/decoded by the UntrustedRLP structure.

rlp-root
|- hash-list
 -|-hash64
 -|-hash128
 -|-hash160
 -|-etc

Do you think throwing fuzz data into these list items will get the right code/crash coverage (using your recursive function to traverse the RLP structure)?

Copy link
Contributor

@rphmeier rphmeier Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the interesting thing about RLP is that actually, each item is either one of a few things:

  • A list of items, prefixed by its length
  • Special byte for empty data (treated as empty list)
  • A list of bytes, prefixed by its length

This is why it's called "recursive length prefix".
Take a look at the spec for more details.
All of our strongly-typed decoding is implemented on top of those primitives.

What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly). This will allow the fuzzer to attempt to explore almost all code paths in the decoder, possibly finding some well-formed RLP that leads to panic. Most of the data it generates will end up being invalid RLP, but it may find some valid RLPs which are not handled correctly.

Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.

Copy link
Author

@onicslabs onicslabs Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly)

Currently, I'm using the gen_untrusted_corpus.rs tool to build some basic, well-structured RLP streams. I'm still working on generating all the data types UntrustedRlp can normally handle, and then I'll build some RLP streams with mostly good, but a little messed up data. Hopefully, this will give the fuzzer more to mutate over, and we'll get deeper coverage.

Thanks for the link to the spec, I'll go over it again to get more ideas for what to throw at this thing. I'm understanding better how RLP really works now. Hopefully this leads to some really exhaustive fuzz code.

Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.

Absolutely agree. I'm trying to add more to the fuzzer corpus, so that the fuzzer's mutation algorithms can be more efficient in breaking the RLP decoder. Like sending in a PNG of a horse with a trojan encoded inside :)

For UntrustedRlp to be processed as a list, one needs to prepend
the data with the byte `0xc-` where `-` is the number of items
in the list. Currently using the length of the fuzz data stream
as the number of items in the UntrustedRlp list.
Added an algorithm `iter_recursive()` to iterate recursively over
the entire RLP structure. Also created a tool `gen_untrusted_corpus.rs`
for generating valid RLP structured data to make the fuzzing process
more efficient.

// Read in base path to fuzzing corpus directory from `RLPCORPUS` environment var
let corp = env::var("RLPCORPUS").unwrap();
let fp = format!("{}{}", corp, "/untrusted_data/well-formed-list-uint");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use PathBuf and appending methods for this

Separated writing the RLP stream to its own function, and build
the file path using PathBuf.
Copy link
Collaborator

@debris debris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides few grumbles, this pr lacks integration with our build system. It should be added to our workspace and compiled under nightly feature. Also .gitlab-ci.yml needs to be updated.

Channel::Beta => {
}
Channel::Nightly => {
println!("cargo:rustc-cfg=feature=\"nightly\"")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if let Channel::Nighlty = version_meta().unwrap().channel {
  println!("cargo:rustc-cfg=feature=\"nightly\"")
}


fn write_corpus(corpdir: &str, rlp: &RlpStream) {
// Read in base path to fuzzing corpus directory from `RLPCORPUS` environment var
let corp = env::var("RLPCORPUS").unwrap();
Copy link
Collaborator

@debris debris Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is fuzzing corpus? does it need to be persistent? also:

  1. It will crash on everyone's machine, cause RLPCORPUS is no defined
  2. I see no reason for tests to permanently spoil my filesystem. can you use tempdir?

Copy link
Author

@onicslabs onicslabs Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is fuzzing corpus? does it need to be persistent?

the fuzzing corpus starts as an initial set of valid input data, and cargo-fuzz mutates the initial data into a larger, potentially crash-causing set of input data. It needs to be somewhat persistent, but you could generate the corpus from scratch each time. The simplest way is to just run rm -rf fuzz/corpus/* after running the fuzz tests.

generating from scratch is computationally more expensive (depends on size of corpus), and some amount of state is lost in code penetration. cargo-fuzz also has a command cmin to minify the corpus.

It will crash on everyone's machine, cause RLPCORPUS is no defined

True, I wasn't sure on the preferred way to do this. I'll change the path to a constant in the test, and use relative path so its all maintained in the fuzz/corpus/<test-name> directory.

I see no reason for tests to permanently spoil my filesystem. can you use tempdir?

cargo-fuzz creates corpus directories and files by default. libFuzzer also accepts a dictionary file which has patterns for creating valid input. Both the corpus and dictionary patterns are useful in getting passed validation checks, and get deeper code coverage.

What about adding tempdir as an option, for testers that don't want to have a persistent fuzzing corpus?
generate-fuzz-corpus.rs only seeds the default folder cargo-fuzz uses for test corpus files, and this folder is persistent by default. cargo-fuzz cmin and cargo-fuzz tmin will minify the corpus and test directories respectivley. Modifying this behavior would need to be done by either modifying cargo-fuzz or running a bash script to delete corpus files on cargo-fuzz shutdown.

Tempdir Some form of corpus-cleanup definitely seems to be the right way to go for automated build tests, since they just need basic fuzzing coverage. Not sure if just adding a line of bash or incorporating into CI is best way to go. cargo-fuzz can also be limited on the number of runs it attempts, which would be good for minimizing the load fuzz testing puts on the build servers.

@debris debris added A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. and removed A0-pleasereview 🤓 Pull request needs code review. labels Aug 14, 2017
@gavofyork
Copy link
Contributor

@debris / @onicslabs looking stale... any intention to sort this?

@gavofyork gavofyork added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. and removed A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. labels Sep 26, 2017
@onicslabs
Copy link
Author

onicslabs commented Sep 29, 2017

I'm willing to correct any remaining grumbles. The most recent two commits were meant to resolve code cleanup around requiring a nightly build for fuzz tests, and config variables being defined with environment variables.

Apologies for not catching the CI integration task. Wasn't sure if this was something internal or not.

@onicslabs
Copy link
Author

@debris I'm looking through gitlab-ci.yml, would a new build entry for something like test-fuzz-nightly work for CI integration?

Where the test build switches to nightly, then runs a script that checks for/runs fuzz tests for a configured time or number of iterations.

@5chdn 5chdn added this to the 1.9 milestone Oct 11, 2017
@5chdn 5chdn added A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. and removed A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. labels Oct 11, 2017
@tomusdrw
Copy link
Collaborator

tomusdrw commented Dec 21, 2017

@onicslabs any progress? @debris ping

@tomusdrw tomusdrw added A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. and removed A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. labels Dec 21, 2017
@onicslabs
Copy link
Author

onicslabs commented Dec 26, 2017

@tomusdrw was still waiting on the preferred method of CI testing.

I can add a line that will only run fuzzing for 100 rounds

Just not sure how to run them only when changes are made to the crate they are fuzzing. Any advice?

@tomusdrw
Copy link
Collaborator

@onicslabs I'm fine with running fuzzing every time during nightly builds, not sure how to achieve that though. @paritytech/ci ?

@5chdn 5chdn added the A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. label Dec 30, 2017
@5chdn 5chdn modified the milestones: 1.9, 1.10 Dec 30, 2017
@5chdn 5chdn removed the A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. label Jan 9, 2018
@5chdn 5chdn modified the milestones: 1.10, 1.11 Jan 23, 2018
@debris
Copy link
Collaborator

debris commented Jan 31, 2018

It's stale, so I'm closing it

@debris debris closed this Jan 31, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants