
fix incremental reads #64

Closed

Conversation

michaelkirk

Hello!

Without the fix in the second commit, the included test hangs in an infinite loop. It captures an actual use case of mine; see Context below for more info.

The contract of drain_to's write_bytes parameter states:

/// Semantics of write_bytes:
/// Should dump as many of the provided bytes as possible to whatever sink until no bytes are left or an error is encountered
/// Return how many bytes have actually been dumped to the sink.

So I think that implies that we might not be able to write all the bytes (e.g. if the output buffer is full).
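To make that concrete, here's a standalone sketch (not the crate's code; the Error alias is just a stand-in) of a write_bytes callback whose fixed-size sink fills up mid-drain and therefore reports a partial write:

// Hypothetical stand-in for the crate's error type, only to keep the sketch self-contained.
type Error = std::io::Error;

fn main() {
    // A fixed-size sink, e.g. the caller only wants 8 bytes right now.
    let mut sink = [0u8; 8];
    let mut filled = 0usize;

    // Shape of a write_bytes callback under the quoted contract:
    // take as many bytes as fit and report how many were actually taken.
    let mut write_bytes = |buf: &[u8]| -> (usize, Result<(), Error>) {
        let take = buf.len().min(sink.len() - filled);
        sink[filled..filled + take].copy_from_slice(&buf[..take]);
        filled += take;
        // `take` may be smaller than `buf.len()` once the sink is full.
        (take, Ok(()))
    };

    // The first call is partial; once the sink is full, further calls take nothing.
    assert_eq!(write_bytes(&[1u8; 16]).0, 8);
    assert_eq!(write_bytes(&[1u8; 16]).0, 0);
}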

Context

In case it's helpful, here's my use case: https://github.com/michaelkirk/geomedea

I have some data structures serialized to a file which is then zstd compressed.

I'm incrementally parsing these data structures back from the file, sipping off the zstd decompression stream as I go.

I'm currently using the zstd crate via async_compression, which is working great, but I'm interested in this pure Rust solution, ultimately hoping to use it across targets that include the browser via WASM.
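For a rough idea of what that sipping looks like, here's a hypothetical helper (not the actual geomedea code) that pulls a single value off whatever Read the decompressor exposes:

use std::io::Read;

/// Hypothetical helper (not the actual geomedea code): read one
/// little-endian u64 by sipping exactly 8 bytes off whatever `Read`
/// the decompressor exposes.
fn read_u64_le(reader: &mut impl Read) -> std::io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}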


I'm very new here, and not very familiar with the internals of zstd, so I'd appreciate a critical review.

@michaelkirk
Author

michaelkirk commented Jul 10, 2024

I did an audit of the other consumers of drain_to in case we need similar changes.

I think we should probably also change this one:

    pub fn drain_to_window_size_writer(&mut self, mut sink: impl Write) -> Result<usize, Error> {
        match self.can_drain_to_window_size() {
            None => Ok(0),
            Some(can_drain) => {
-               self.drain_to(can_drain, |buf| write_all_bytes(&mut sink, buf))?;
-               Ok(can_drain)
+               self.drain_to(can_drain, |buf| write_all_bytes(&mut sink, buf))
            }
        }
    }

... but I haven't hit that case yet, and so I'm not sure how to test it.
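If someone does want to exercise it, one option might be a sink with a hard capacity, so write_all_bytes hits a short write and the return value actually differs from can_drain. A sketch of such a sink (hypothetical test helper, not part of the crate or this PR):

use std::io::{self, Write};

/// Hypothetical test helper: a writer with a hard capacity that
/// returns Ok(0) once it is full, like a fixed-size buffer does.
struct CappedWriter {
    buf: Vec<u8>,
    cap: usize,
}

impl Write for CappedWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Accept only what fits; once full this returns Ok(0), which is
        // exactly the case where returning `can_drain` would over-report
        // how much was actually drained.
        let take = data.len().min(self.cap - self.buf.len());
        self.buf.extend_from_slice(&data[..take]);
        Ok(take)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}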

I think the Read implementation, which uses drain_to, is already OK, because we pre-compute amount in a way consistent with the internals of drain_to:

impl Read for DecodeBuffer {
    fn read(&mut self, target: &mut [u8]) -> Result<usize, Error> {
        let max_amount = self.can_drain_to_window_size().unwrap_or(0);
+       // this calculation mirrors what's in drain_to
        let amount = max_amount.min(target.len());

        let mut written = 0;
-       self.drain_to(amount, |buf| {
+       let amount_drained = self.drain_to(amount, |buf| {
            target[written..][..buf.len()].copy_from_slice(buf);
            written += buf.len();
            (buf.len(), Ok(()))
        })?;
+       // maybe we should add an assert though?
+       debug_assert_eq!(amount_drained, amount);

        Ok(amount)
    }
}

Same goes for this read_all implementation:

   pub fn read_all(&mut self, target: &mut [u8]) -> Result<usize, Error> {
+       // this calculation mirrors what's in drain_to
        let amount = self.buffer.len().min(target.len());

        let mut written = 0;
        self.drain_to(amount, |buf| {
            target[written..][..buf.len()].copy_from_slice(buf);
            written += buf.len();
            (buf.len(), Ok(()))
        })?;
        Ok(amount)
    }

@KillingSpark
Owner

Hi! Thanks for the detailed report and the test :)

I'll definitely have a look later

@@ -341,6 +339,7 @@ fn write_all_bytes(mut sink: impl Write, buf: &[u8]) -> (usize, Result<(), Error>) {
     let mut written = 0;
     while written < buf.len() {
         match sink.write(&buf[written..]) {
+            Ok(0) => return (written, Ok(())),
Author

@michaelkirk michaelkirk Jul 11, 2024


Here is the crux of the fix.

In my case I'm reading a single u64, so sink can only hold 8 bytes. It's quickly filled, and then subsequent rounds of the while loop will write 0 bytes.

Previously that would lead to an infinite loop.

I'd expect a similar problem for anyone using collect_to_writer with a writer whose capacity is smaller than buf.
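To make the failure mode concrete, here's a standalone sketch of the loop shape (using std's Write impl for &mut [u8] as the sink, not the crate's code):

use std::io::Write;

fn main() {
    // An 8-byte sink, like the buffer for a single u64.
    // `impl Write for &mut [u8]` fills the slice and then returns Ok(0).
    let mut storage = [0u8; 8];
    let mut sink: &mut [u8] = &mut storage;

    let payload = [0xABu8; 32]; // more bytes than the sink can hold
    let mut written = 0;

    while written < payload.len() {
        match sink.write(&payload[written..]) {
            // Without an arm like this, the loop spins forever once the
            // sink is full, because every further write returns Ok(0).
            Ok(0) => break,
            Ok(n) => written += n,
            Err(e) => panic!("write failed: {}", e),
        }
    }

    assert_eq!(written, 8);
}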

@KillingSpark
Owner

That's definitely a bug worth fixing. Is the change to the API of the read_all function necessary? I'd like to fix this without a breaking API change, so current users can get the bugfix without updating their dependency version.

If the API change is important I'm open to it too, but I'd separate the two changes for the reason above.

@michaelkirk
Author

Is the change to the API of the read_all function necessary?

There are no changes to read_all. I think you're talking about the change to fn drain_to. The diff context is confusing:

(screenshot of the collapsed diff view, 2024-07-12)

Try expanding the diff to clarify.

@KillingSpark
Owner

Umpf, I'm sorry. I hate the collapsed diffs; I fall for them a lot.

Then this LGTM. If you could fix the test for no_std (or just disable it in the no_std env, that would be fine by me), I'd merge this :)
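For example, something along these lines might work (a sketch assuming the test can sit behind a std cargo feature; the exact feature name is an assumption):

// Sketch only: one way to keep the new test out of no_std builds, assuming
// the std-dependent test code can live behind a `std` cargo feature
// (the feature name is an assumption, not necessarily what the crate uses).
#[cfg(feature = "std")]
#[test]
fn incremental_read_with_small_sink_terminates() {
    // ... body of the test added in this PR ...
}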

@KillingSpark
Owner

I went ahead and fixed the test and credited you in the Changelog.md.

Thanks again for raising this issue to my attention. I'll release a new version soon.
