Consider adding lifetime to byte buffer reference passed into load_pdf_from_bytes #63

Lampyrida · 2022-12-10T22:12:37Z

Hi there!

I spent some time tracking down a bug that I tracked down to the following issue:

The load_pdf_from_bytes function wraps FPDF_LoadMemDocument64. FPDF_LoadMemDocument64 requires the byte buffer passed into it to remain valid for as long as the document is kept open. However, this is not enforced in any way by the load_pdf_from_bytes function. I'm wondering if this is something we can address by introducing a lifetime associated with the buffer that ties it to the document.

The text was updated successfully, but these errors were encountered:

ajrcarey · 2022-12-11T13:51:03Z

Hi @Lampyrida,

Good to hear from you and thank you for reporting this bug. I'm very sorry about this - this is entirely an oversight on my part.

In addition to the lifetime mismatch, I'm concerned about what might happen if the data in the byte buffer was mutated "behind Pdfium's back", so to speak. (This would require unsafe code, but it's not impossible.) The safest way of guaranteeing both lifetime validity and data immutability is for PdfDocument to take ownership of the byte buffer itself, rather than taking a reference. This means the Pdfium::load_pdf_from_bytes() function signature would change to:

pub fn load_pdf_from_bytes(&self, bytes: Vec<u8>, password: Option<&str>)

This would be less convenient when working with byte buffers loaded via the include!() macro, however.

Let me think about this over the next day or two. I'd welcome any further thoughts you have.

Lampyrida · 2022-12-11T21:56:50Z

That's all good! I appreciate all the effort you have put into this library in the first place.

You bring up a great point regarding the risk of mutations also. Having the PdfDocument take ownership of the the byte buffer would certainly be ideal for my current use case.

I'm still somewhat new to Rust so not 100% sure this would work, but perhaps we can support both ownership and references by having PdfDocument store a Cow<'a, [u8]>? That could then be combined with two Pdfium::load_pdf_from_bytes() variants: one that takes ownership of the byte buffer, and internally creates a Cow::Owned, and another that takes a reference with appropriate lifetime and stores a Cow::Borrowed. The latter would still be vulnerable to unexpected mutations though.

ajrcarey · 2022-12-13T21:57:25Z

Yes, a Cow would do the trick. Given the byte buffer could potentially be megabytes in size, I think it's good to be explicit about when copies and allocations are made; for that reason, my instinct is that Cow is not quite the right solution here. It's largely subjective, I suppose, but I would prefer to be explicit about the memory ownership rather than implicit.

I have added a static lifetime requirement to the byte slice passed into load_pdf_from_bytes(), and added a new load_pdf_from_bytes_owned() function that takes an owned Vec. This makes both cases completely explicit and ensures the caller has to think about how they copy or allocate the byte buffer. I think this is the right way to handle it.

I have pushed a commit that makes these small changes. If you could check to make sure they work for you, that'd be great. You will need to add pdfium-render as a git dependency in your Cargo.toml.

Lampyrida · 2022-12-17T21:28:55Z

That sounds very reasonable @ajrcarey. I just tried out the load_pdf_from_bytes_owned function and it works like a charm! Thanks a ton for the very quick turnaround for this! Really appreciate it.

ajrcarey · 2022-12-18T13:57:11Z

Very good. Having thought about it a bit more I'm going to rename the functions slightly to make their use clearer. load_pdf_from_bytes() will remain in place for now but it will be deprecated and removed in 0.9.0 as part of #63, at which point you'll have to shift over to either load_pdf_from_byte_slice() or load_pdf_from_byte_vec(). 0.9.0 is still months away so you've got plenty of time.

These changes will be released as part of 0.7.26. Thanks again for raising this issue and helping me fix this bug.

ajrcarey · 2022-12-18T15:00:22Z

Updated README.md and examples/README.md. Packaged for release as part of 0.7.26.

ajrcarey pushed a commit that referenced this issue Dec 13, 2022

Progressing #63

fe8d521

ajrcarey mentioned this issue Dec 18, 2022

Remove deprecated items as part of release 0.9.0. #36

Open

ajrcarey closed this as completed in 3a51813 Dec 18, 2022

ajrcarey pushed a commit that referenced this issue Dec 18, 2022

Resolves #63

b3ae56b

ajrcarey self-assigned this Mar 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider adding lifetime to byte buffer reference passed into load_pdf_from_bytes #63

Consider adding lifetime to byte buffer reference passed into load_pdf_from_bytes #63

Lampyrida commented Dec 10, 2022

ajrcarey commented Dec 11, 2022

Lampyrida commented Dec 11, 2022

ajrcarey commented Dec 13, 2022

Lampyrida commented Dec 17, 2022

ajrcarey commented Dec 18, 2022

ajrcarey commented Dec 18, 2022

Consider adding lifetime to byte buffer reference passed into load_pdf_from_bytes #63

Consider adding lifetime to byte buffer reference passed into load_pdf_from_bytes #63

Comments

Lampyrida commented Dec 10, 2022

ajrcarey commented Dec 11, 2022

Lampyrida commented Dec 11, 2022

ajrcarey commented Dec 13, 2022

Lampyrida commented Dec 17, 2022

ajrcarey commented Dec 18, 2022

ajrcarey commented Dec 18, 2022