Skip to content

Commit

Permalink
Avoid returning errors where possible
Browse files Browse the repository at this point in the history
We were previously unsure whether it was better to return an error and exit the
fuzz target early, or to append "fake" data (like all zeros) when an `Arbitrary`
implementation requested more data than the fuzzer gave us. We originally chose
to exit early, and this commit is the first step in reversing that decision.

We didn't have any data to motivate either choice when we originally made the
decision to exit early. What we had was an intuition that we shouldn't
repeatedly waste time on test cases that are very similar to each other because
they are all padded with the same fake data at the end. If we exited early, our
thinking went, we would yield our time back to the fuzzer, letting it more
efficiently explore the state space.

In practice, it has worked out okay, but not great. What we've been missing out
on are chance mutations that cause us to explore new code paths. A random
mutation made by the fuzzer that would otherwise take us to a new code path
happens to have an invalid UTF-8 code point or not quite enough data, and then
we return an error and exit early. If we avoided returning errors, then that
random mutation would have led us to new code paths. A corollary is that test
case reduction is more efficient as well, since the reduced input bytes are also
more likely to succeed in generating a valid instance of the `Arbitrary` type.

In summary, exiting early yields time back to the fuzzer, giving it more chances
to try new mutations, while avoiding early exits is more forgiving to random
mutations, making them more likely to succeed at finding new code paths. This is
a trade off, and we can't have both in the limit. My local, not-super-rigorous
experiments are telling me that we made the wrong trade off originally, and that
being forgiving with mutations to find new code paths easier is the better
choice.

Next breaking release that we make, I think we can remove the fallibility from
the `Arbitrary` trait and all the `Unstructured` methods. The only tricky one is
`Unstructured::get_bytes`, but I think we can either loosen its contract so that
it doesn't always return a slice of length `size`, or remove the method
completely in favor of `Unstructured::fill_buffer`.
  • Loading branch information
fitzgen committed Aug 21, 2020
1 parent 10d68cf commit 1d2f653
Show file tree
Hide file tree
Showing 2 changed files with 28 additions and 10 deletions.
23 changes: 18 additions & 5 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1013,10 +1013,22 @@ where
impl Arbitrary for String {
fn arbitrary(u: &mut Unstructured<'_>) -> Result<Self> {
let size = u.arbitrary_len::<u8>()?;
let bytes = u.get_bytes(size)?;
str::from_utf8(bytes)
.map_err(|_| Error::IncorrectFormat)
.map(Into::into)
match str::from_utf8(&u.data[..size]) {
Ok(s) => {
u.data = &u.data[size..];
Ok(s.into())
}
Err(e) => {
let i = e.valid_up_to();
let (valid, rest) = u.data.split_at(i);
let s = unsafe {
debug_assert!(str::from_utf8(valid).is_ok());
str::from_utf8_unchecked(valid)
};
u.data = rest;
Ok(s.into())
}
}
}

fn arbitrary_take_rest(u: Unstructured<'_>) -> Result<Self> {
Expand Down Expand Up @@ -1300,7 +1312,8 @@ mod test {
assert_eq!(z, [1, 2]);
rb.fill_buffer(&mut z).unwrap();
assert_eq!(z, [3, 4]);
assert!(rb.fill_buffer(&mut z).is_err());
rb.fill_buffer(&mut z).unwrap();
assert_eq!(z, [0, 0]);
}

#[test]
Expand Down
15 changes: 10 additions & 5 deletions src/unstructured.rs
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ use std::{mem, ops};
/// # }
/// ```
pub struct Unstructured<'a> {
data: &'a [u8],
pub(crate) data: &'a [u8],
}

impl<'a> Unstructured<'a> {
Expand Down Expand Up @@ -222,7 +222,7 @@ impl<'a> Unstructured<'a> {

fn arbitrary_byte_size(&mut self) -> Result<usize> {
if self.data.len() == 0 {
Err(Error::NotEnoughData)
Ok(0)
} else if self.data.len() == 1 {
self.data = &[];
Ok(0)
Expand Down Expand Up @@ -385,11 +385,16 @@ impl<'a> Unstructured<'a> {
/// let mut buf = [0; 2];
/// assert!(u.fill_buffer(&mut buf).is_ok());
/// assert!(u.fill_buffer(&mut buf).is_ok());
/// assert!(u.fill_buffer(&mut buf).is_err());
/// ```
pub fn fill_buffer(&mut self, buffer: &mut [u8]) -> Result<()> {
let bytes = self.get_bytes(buffer.len())?;
buffer.copy_from_slice(bytes);
let n = std::cmp::min(buffer.len(), self.data.len());
for i in 0..n {
buffer[i] = self.data[i];
}
for i in self.data.len()..buffer.len() {
buffer[i] = 0;
}
self.data = &self.data[n..];
Ok(())
}

Expand Down

0 comments on commit 1d2f653

Please sign in to comment.