
fix incremental reads #64

Closed

Conversation

michaelkirk

Hello!

Without the fix in the second commit, the included test hangs in an infinite loop. It captures an actual use case of mine; see Context below for more info.

The contract of drain_to's write_bytes parameter states:

/// Semantics of write_bytes:
/// Should dump as many of the provided bytes as possible to whatever sink until no bytes are left or an error is encountered
/// Return how many bytes have actually been dumped to the sink.

So I think that implies that we might not be able to write all the bytes (e.g. if the output buffer is full).
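To make that concrete, here's a standalone sketch (not the crate's code; the Error alias is just a stand-in) of a write_bytes callback whose fixed-size sink fills up mid-drain and therefore reports a partial write:

// Hypothetical stand-in for the crate's error type, only to keep the sketch self-contained.
type Error = std::io::Error;

fn main() {
    // A fixed-size sink, e.g. the caller only wants 8 bytes right now.
    let mut sink = [0u8; 8];
    let mut filled = 0usize;

    // Shape of a write_bytes callback under the quoted contract:
    // take as many bytes as fit and report how many were actually taken.
    let mut write_bytes = |buf: &[u8]| -> (usize, Result<(), Error>) {
        let take = buf.len().min(sink.len() - filled);
        sink[filled..filled + take].copy_from_slice(&buf[..take]);
        filled += take;
        // `take` may be smaller than `buf.len()` once the sink is full.
        (take, Ok(()))
    };

    // The first call is partial; once the sink is full, further calls take nothing.
    assert_eq!(write_bytes(&[1u8; 16]).0, 8);
    assert_eq!(write_bytes(&[1u8; 16]).0, 0);
}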

Context

In case it's helpful, here's my use case: https://github.com/michaelkirk/geomedea

I have some data structures serialized to a file which is then zstd compressed.

I'm incrementally parsing these data structures back from the file, sipping off the zstd decompression stream as I go.

I'm currently using the zstd crate via async_compression, which is working great, but I'm interested in this pure Rust solution, ultimately hoping to use it across targets that include the browser via WASM.
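For a rough idea of what that sipping looks like, here's a hypothetical helper (not the actual geomedea code) that pulls a single value off whatever Read the decompressor exposes:

use std::io::Read;

/// Hypothetical helper (not the actual geomedea code): read one
/// little-endian u64 by sipping exactly 8 bytes off whatever `Read`
/// the decompressor exposes.
fn read_u64_le(reader: &mut impl Read) -> std::io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}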


I'm very new here, and not very familiar with the internals of zstd, so I'd appreciate a critical review.

@michaelkirk
Author

michaelkirk commented Jul 10, 2024

I did an audit of the other consumers of drain_to in case we need similar changes.

I think we should probably also change this one:

    pub fn drain_to_window_size_writer(&mut self, mut sink: impl Write) -> Result<usize, Error> {
        match self.can_drain_to_window_size() {
            None => Ok(0),
            Some(can_drain) => {
-               self.drain_to(can_drain, |buf| write_all_bytes(&mut sink, buf))?;
-               Ok(can_drain)
+               self.drain_to(can_drain, |buf| write_all_bytes(&mut sink, buf))
            }
        }
    }

... but I haven't hit that case yet, and so I'm not sure how to test it.
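If someone does want to exercise it, one option might be a sink with a hard capacity, so write_all_bytes hits a short write and the return value actually differs from can_drain. A sketch of such a sink (hypothetical test helper, not part of the crate or this PR):

use std::io::{self, Write};

/// Hypothetical test helper: a writer with a hard capacity that
/// returns Ok(0) once it is full, like a fixed-size buffer does.
struct CappedWriter {
    buf: Vec<u8>,
    cap: usize,
}

impl Write for CappedWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Accept only what fits; once full this returns Ok(0), which is
        // exactly the case where returning `can_drain` would over-report
        // how much was actually drained.
        let take = data.len().min(self.cap - self.buf.len());
        self.buf.extend_from_slice(&data[..take]);
        Ok(take)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}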

I think the Read implementation, which uses drain_to, is already OK, because we pre-compute amount in a way consistent with the internals of drain_to:

impl Read for DecodeBuffer {
    fn read(&mut self, target: &mut [u8]) -> Result<usize, Error> {
        let max_amount = self.can_drain_to_window_size().unwrap_or(0);
+       // this calculation mirrors what's in drain_to
        let amount = max_amount.min(target.len());

        let mut written = 0;
-       self.drain_to(amount, |buf| {
+       let amount_drained = self.drain_to(amount, |buf| {
            target[written..][..buf.len()].copy_from_slice(buf);
            written += buf.len();
            (buf.len(), Ok(()))
        })?;
+       // maybe we should add an assert though?
+       debug_assert_eq!(amount_drained, amount);

        Ok(amount)
    }
}

Same goes for this read_all implementation:

   pub fn read_all(&mut self, target: &mut [u8]) -> Result<usize, Error> {
+       // this calculation mirrors what's in drain_to
        let amount = self.buffer.len().min(target.len());

        let mut written = 0;
        self.drain_to(amount, |buf| {
            target[written..][..buf.len()].copy_from_slice(buf);
            written += buf.len();
            (buf.len(), Ok(()))
        })?;
        Ok(amount)
    }

@KillingSpark
Owner

Hi! Thanks for the detailed report and the test :)

I'll definitely have a look later

@@ -341,6 +339,7 @@ fn write_all_bytes(mut sink: impl Write, buf: &[u8]) -> (usize, Result<(), Error>) {
     let mut written = 0;
     while written < buf.len() {
         match sink.write(&buf[written..]) {
+            Ok(0) => return (written, Ok(())),
Author

@michaelkirk michaelkirk Jul 11, 2024


Here is the crux of the fix.

In my case I'm reading a single u64, so sink can only hold 8 bytes. It's quickly filled, and then subsequent rounds of the while loop will write 0 bytes.

Previously that would lead to an infinite loop.

I'd expect a similar problem for anyone using collect_to_writer with a writer whose capacity is smaller than buf.
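To make the failure mode concrete, here's a standalone sketch of the loop shape (using std's Write impl for &mut [u8] as the sink, not the crate's code):

use std::io::Write;

fn main() {
    // An 8-byte sink, like the buffer for a single u64.
    // `impl Write for &mut [u8]` fills the slice and then returns Ok(0).
    let mut storage = [0u8; 8];
    let mut sink: &mut [u8] = &mut storage;

    let payload = [0xABu8; 32]; // more bytes than the sink can hold
    let mut written = 0;

    while written < payload.len() {
        match sink.write(&payload[written..]) {
            // Without an arm like this, the loop spins forever once the
            // sink is full, because every further write returns Ok(0).
            Ok(0) => break,
            Ok(n) => written += n,
            Err(e) => panic!("write failed: {}", e),
        }
    }

    assert_eq!(written, 8);
}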

@KillingSpark
Owner

That's definitely a bug worth fixing. Is the change to the API of the read_all function necessary? I'd like to fix this without a breaking API change, so current users can get the bugfix without updating their dependency version.

If the API change is important I'm open to it too, but I'd separate the two changes for the reason above.

@michaelkirk
Author

Is the change to the API of the read_all function necessary?

There are no changes to read_all. I think you're talking about the change to fn drain_to. The diff context is confusing:

(screenshot of the collapsed diff view, 2024-07-12)

Try expanding the diff to clarify.

@KillingSpark
Owner

Umpf, I'm sorry. I hate the collapsed diffs; I fall for them a lot.

Then this LGTM. If you could fix the test for no_std (or just disable it in the no_std env, that would be fine by me), I'd merge this :)
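For example, something along these lines might work (a sketch assuming the test can sit behind a std cargo feature; the exact feature name is an assumption):

// Sketch only: one way to keep the new test out of no_std builds, assuming
// the std-dependent test code can live behind a `std` cargo feature
// (the feature name is an assumption, not necessarily what the crate uses).
#[cfg(feature = "std")]
#[test]
fn incremental_read_with_small_sink_terminates() {
    // ... body of the test added in this PR ...
}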

@KillingSpark
Owner

I went ahead and fixed the test and credited you in the Changelog.md.

Thanks again for raising this issue to my attention. I'll release a new version soon.
