-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Conversation
Set up some fuzz test harnesses for the UntrustedRlp struct and RlpStream's append_raw function.
It looks like @onicslabs hasn'signed our Contributor License Agreement, yet.
You can read and sign our full Contributor License Agreement at the following URL: https://cla.parity.io Once you've signed, plesae reply to this thread with Many thanks, Parity Technologies CLA Bot |
util/rlp/fuzz/Cargo.toml
Outdated
[package] | ||
name = "rlp-fuzz" | ||
version = "0.0.1" | ||
authors = ["Automatically generated"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you set this to Parity Technologies <admin@parity.io>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do :)
[clabot:check] |
It looks like @onicslabs signed our Contributor License Agreement. 👍 Many thanks, Parity Technologies CLA Bot |
A |
This addition won't pass under a stable build, since cargo-fuzz requires nightly to compile correctly. Should I remove this pull request? Is it possible to exclude fuzz directories from stable compilation, then check fuzz directories compile correctly under a nightly build? |
I'd be happy to do a small readme.md I'll include a basic cargo-fuzz installation guide, and run through steps for the entire fuzz testing cycle. |
@onicslabs You could replace the crate with a stub when a |
Only attempt to compile fuzz tests using a nightly compiler
Just added conditional compilation for the two fuzzing tests I wrote. @rphmeier are these changes similar to what you were talking about? Long-term, cargo-fuzz should integrate a feature/option to automatically add conditional compilation. May start working on this, and see what the cargo-fuzz community thinks about adding a feature like that. |
@@ -0,0 +1,18 @@ | |||
#![no_main] | |||
#[macro_use] extern crate libfuzzer_sys; | |||
#[cfg(feature = "nightly"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be #![cfg(feature = "nightly")]
to be applied to the whole source file? to me this looks like it will only import the rlp
crate on nightly, but still try to do everything else on stable.
same for untrusted.rs
Wrap the entire source files for fuzz tests in the nightly feature. Meaning tests will only be included in compilation when using a nightly compiler.
Include entire source in config macro
Re-checked macro syntax, and changed to match correct syntax.
@@ -0,0 +1,19 @@ | |||
#![no_main] | |||
#[macro_use] extern crate libfuzzer_sys; | |||
#[cfg(feature = "nightly")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a distinction between #[attr]
and #![attr]
. One is applied to the following item, and the other is applied to the enclosing item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a distinction between #[attr] and #![attr]. One is applied to the following item, and the other is applied to the enclosing item.
does this mean that #![cfg(feature = "nightly")]
at the top of the file will include the entire file as a nightly feature?
I'm having trouble getting features to work nicely with cargo-fuzz. For example, to enable the feature, one must pass --features="nightly"
to cargo build
, but cargo-build is nested within the cargo fuzz run <test-name>
command.
I haven't been able to find documentation on how to enable a feature from the build script, or through cargo-fuzz options. The entire project compiles fine with this feature inclusion, but now the fuzz tests will not run with cargo fuzz run untrusted
.
Currently I am trying to require the nighlty feature to build the binary targets with required-feature
option under parity/util/rlp/fuzz/Cargo.toml
. This stops the binaries from being built, but it is unclear how to enable these features when using a nightly compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
figured out enabling features in build scripts:
println!("cargo:rustc-cfg=feature=\"nightly\"")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that #![cfg(feature = "nightly")] at the top of the file will include the entire file as a nightly feature?
yep
Included `nightly` feature for entire fuzz test files, excluding their compilation during stable builds.
util/rlp/fuzz/build.rs
Outdated
} | ||
Channel::Nightly => { | ||
println!("cargo:rustc-cfg=RUSTC_IS_NIGHTLY"); | ||
println!("cargo:rustc-cfg=feature=\"nightly\"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of all the RUSTC_IS_*
? can't you just if let Channel::Nightly = version_meta().unwrap() { /* set feature flag */ }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, I took this code from an example on using rustc_version (not super familiar with conditional compilation).
I'll remove the code from the other cases, and only set the feature flag for nightly
Some code setting rustc config lines were unnecessary and removed. At the moment, only the `nightly` feature is enabled when running fuzz tests with `cargo fuzz run <test>` and a nightly compiler.
let mut rng = rand::thread_rng(); | ||
let rand_item_count = rng.gen_range::<usize>(1 as usize, 4096 as usize); | ||
|
||
let _ = rls.append_raw(data, rand_item_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
append_raw
doesn't do any checking of the data passed AFAIK. but it does check that item_count
is within the number of remaining items in the list (which in this case, would be 1).
what do you expect to find by fuzz-testing this? RlpStream
is typically not exposed to external input, whereas UntrustedRlp
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying to get full fuzz-coverage of the classes in the rlp crate, and I hadn't considered where the data may be coming from.
I agree it makes more sense to fully fuzz the UntrustedRlp struct, and am focusing more on that now that the foundational stuff is fixed.
Also still becoming more familiar with the code base, another expectation/goal of writing fuzz tests for all the code.
If for some reason or another, a developer decides to use a normally internal function/struct, it would be beneficial to know if that feature is buggy.
use rlp::UntrustedRlp; | ||
|
||
fuzz_target!(|data: &[u8]| { | ||
let _ = UntrustedRlp::new(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating UntrustedRlp
just wraps the data given. probably this fuzz test should try and iterate over all lists recursively. just note that finding invalid RLP will be expected: we want to try and find uncaught errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, what do you mean by all the lists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively. This might give the fuzzer more to work with in terms of exploring code paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added another fuzz test that creates a list (prepending 0xc + data_len
to the fuzz data).
probably this fuzz test should try and iterate over all lists recursively.
RLP is a data format consisting mostly of nested lists, so you could try iterating over all of them recursively.
So now I'm thinking of creating lists of various types from the fuzz data, and recursively calling UntrustedRlp::new(fuzz-list-data)
on those lists.
What about taking variable length slices from the raw fuzz data, then using those as list items, prepended by their length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UntrustedRlp
has an iter()
method yielding an iterator over other UntrustedRlp
elements within it. You could write a function
fn iter_recursive(rlp: UntrustedRlp) {
for x in rlp.iter() {
iter_recursive(x)
}
}
this would essentially walk the whole RLP structure recursively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds like it requires building a minimally well-formed list in the RLP structure. is this what you are talking about building from the fuzz data?
i don't fully understand your thinking on this. i get the code coverage, just trying to figure out guiding the fuzzer without constraining it too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after reading through parity/util/rlp/src/impls.rs
, think i get what you were talking about with recursive lists. Make a nested list, where each item is a nested list of each type and subtype en/decoded by the UntrustedRLP structure.
rlp-root
|- hash-list
-|-hash64
-|-hash128
-|-hash160
-|-etc
Do you think throwing fuzz data into these list items will get the right code/crash coverage (using your recursive function to traverse the RLP structure)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the interesting thing about RLP is that actually, each item is either one of a few things:
- A list of items, prefixed by its length
- Special byte for empty data (treated as empty list)
- A list of bytes, prefixed by its length
This is why it's called "recursive length prefix".
Take a look at the spec for more details.
All of our strongly-typed decoding is implemented on top of those primitives.
What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly). This will allow the fuzzer to attempt to explore almost all code paths in the decoder, possibly finding some well-formed RLP that leads to panic. Most of the data it generates will end up being invalid RLP, but it may find some valid RLPs which are not handled correctly.
Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm suggesting is that we have the fuzzer try and generate data which we treat as a bunch of nested lists of bytes (and try to walk accordingly)
Currently, I'm using the gen_untrusted_corpus.rs
tool to build some basic, well-structured RLP streams. I'm still working on generating all the data types UntrustedRlp
can normally handle, and then I'll build some RLP streams with mostly good, but a little messed up data. Hopefully, this will give the fuzzer more to mutate over, and we'll get deeper coverage.
Thanks for the link to the spec, I'll go over it again to get more ideas for what to throw at this thing. I'm understanding better how RLP really works now. Hopefully this leads to some really exhaustive fuzz code.
Isn't this the usual strategy with fuzz testing? e.g. if you have a PNG decoder, you try to have the fuzzer generate data that breaks the decoder, not to make the fuzzer generate a picture of a horse.
Absolutely agree. I'm trying to add more to the fuzzer corpus, so that the fuzzer's mutation algorithms can be more efficient in breaking the RLP decoder. Like sending in a PNG of a horse with a trojan encoded inside :)
For UntrustedRlp to be processed as a list, one needs to prepend the data with the byte `0xc-` where `-` is the number of items in the list. Currently using the length of the fuzz data stream as the number of items in the UntrustedRlp list.
Added an algorithm `iter_recursive()` to iterate recursively over the entire RLP structure. Also created a tool `gen_untrusted_corpus.rs` for generating valid RLP structured data to make the fuzzing process more efficient.
|
||
// Read in base path to fuzzing corpus directory from `RLPCORPUS` environment var | ||
let corp = env::var("RLPCORPUS").unwrap(); | ||
let fp = format!("{}{}", corp, "/untrusted_data/well-formed-list-uint"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should use PathBuf
and appending methods for this
Separated writing the RLP stream to its own function, and build the file path using PathBuf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides few grumbles, this pr lacks integration with our build system. It should be added to our workspace and compiled under nightly
feature. Also .gitlab-ci.yml
needs to be updated.
util/rlp/fuzz/build.rs
Outdated
Channel::Beta => { | ||
} | ||
Channel::Nightly => { | ||
println!("cargo:rustc-cfg=feature=\"nightly\"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let Channel::Nighlty = version_meta().unwrap().channel {
println!("cargo:rustc-cfg=feature=\"nightly\"")
}
|
||
fn write_corpus(corpdir: &str, rlp: &RlpStream) { | ||
// Read in base path to fuzzing corpus directory from `RLPCORPUS` environment var | ||
let corp = env::var("RLPCORPUS").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is fuzzing corpus? does it need to be persistent? also:
- It will crash on everyone's machine, cause
RLPCORPUS
is no defined - I see no reason for tests to permanently spoil my filesystem. can you use tempdir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is fuzzing corpus? does it need to be persistent?
the fuzzing corpus starts as an initial set of valid input data, and cargo-fuzz mutates the initial data into a larger, potentially crash-causing set of input data. It needs to be somewhat persistent, but you could generate the corpus from scratch each time. The simplest way is to just run rm -rf fuzz/corpus/*
after running the fuzz tests.
generating from scratch is computationally more expensive (depends on size of corpus), and some amount of state is lost in code penetration. cargo-fuzz also has a command cmin
to minify the corpus.
It will crash on everyone's machine, cause RLPCORPUS is no defined
True, I wasn't sure on the preferred way to do this. I'll change the path to a constant in the test, and use relative path so its all maintained in the fuzz/corpus/<test-name>
directory.
I see no reason for tests to permanently spoil my filesystem. can you use tempdir?
cargo-fuzz
creates corpus directories and files by default. libFuzzer
also accepts a dictionary file which has patterns for creating valid input. Both the corpus and dictionary patterns are useful in getting passed validation checks, and get deeper code coverage.
What about adding tempdir as an option, for testers that don't want to have a persistent fuzzing corpus?
generate-fuzz-corpus.rs
only seeds the default folder cargo-fuzz
uses for test corpus files, and this folder is persistent by default. cargo-fuzz cmin
and cargo-fuzz tmin
will minify the corpus and test directories respectivley. Modifying this behavior would need to be done by either modifying cargo-fuzz
or running a bash script to delete corpus files on cargo-fuzz
shutdown.
Tempdir Some form of corpus-cleanup definitely seems to be the right way to go for automated build tests, since they just need basic fuzzing coverage. Not sure if just adding a line of bash or incorporating into CI is best way to go. cargo-fuzz
can also be limited on the number of runs it attempts, which would be good for minimizing the load fuzz testing puts on the build servers.
@debris / @onicslabs looking stale... any intention to sort this? |
I'm willing to correct any remaining grumbles. The most recent two commits were meant to resolve code cleanup around requiring a nightly build for fuzz tests, and config variables being defined with environment variables. Apologies for not catching the CI integration task. Wasn't sure if this was something internal or not. |
@debris I'm looking through Where the test build switches to nightly, then runs a script that checks for/runs fuzz tests for a configured time or number of iterations. |
@onicslabs any progress? @debris ping |
@tomusdrw was still waiting on the preferred method of CI testing. I can add a line that will only run fuzzing for 100 rounds Just not sure how to run them only when changes are made to the crate they are fuzzing. Any advice? |
@onicslabs I'm fine with running fuzzing every time during nightly builds, not sure how to achieve that though. @paritytech/ci ? |
It's stale, so I'm closing it |
Set up some fuzz test harnesses for the UntrustedRlp struct
and RlpStream's append_raw function.
This change is![Reviewable](https://github.com/camo/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)