Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding lifetime to byte buffer reference passed into load_pdf_from_bytes #63

Closed
Lampyrida opened this issue Dec 10, 2022 · 6 comments
Assignees

Comments

@Lampyrida
Copy link

Hi there!

I spent some time tracking down a bug that I tracked down to the following issue:

The load_pdf_from_bytes function wraps FPDF_LoadMemDocument64. FPDF_LoadMemDocument64 requires the byte buffer passed into it to remain valid for as long as the document is kept open. However, this is not enforced in any way by the load_pdf_from_bytes function. I'm wondering if this is something we can address by introducing a lifetime associated with the buffer that ties it to the document.

@ajrcarey
Copy link
Owner

Hi @Lampyrida,

Good to hear from you and thank you for reporting this bug. I'm very sorry about this - this is entirely an oversight on my part.

In addition to the lifetime mismatch, I'm concerned about what might happen if the data in the byte buffer was mutated "behind Pdfium's back", so to speak. (This would require unsafe code, but it's not impossible.) The safest way of guaranteeing both lifetime validity and data immutability is for PdfDocument to take ownership of the byte buffer itself, rather than taking a reference. This means the Pdfium::load_pdf_from_bytes() function signature would change to:

pub fn load_pdf_from_bytes(&self, bytes: Vec<u8>, password: Option<&str>)

This would be less convenient when working with byte buffers loaded via the include!() macro, however.

Let me think about this over the next day or two. I'd welcome any further thoughts you have.

@Lampyrida
Copy link
Author

That's all good! I appreciate all the effort you have put into this library in the first place.

You bring up a great point regarding the risk of mutations also. Having the PdfDocument take ownership of the the byte buffer would certainly be ideal for my current use case.

I'm still somewhat new to Rust so not 100% sure this would work, but perhaps we can support both ownership and references by having PdfDocument store a Cow<'a, [u8]>? That could then be combined with two Pdfium::load_pdf_from_bytes() variants: one that takes ownership of the byte buffer, and internally creates a Cow::Owned, and another that takes a reference with appropriate lifetime and stores a Cow::Borrowed. The latter would still be vulnerable to unexpected mutations though.

ajrcarey pushed a commit that referenced this issue Dec 13, 2022
@ajrcarey
Copy link
Owner

Yes, a Cow would do the trick. Given the byte buffer could potentially be megabytes in size, I think it's good to be explicit about when copies and allocations are made; for that reason, my instinct is that Cow is not quite the right solution here. It's largely subjective, I suppose, but I would prefer to be explicit about the memory ownership rather than implicit.

I have added a static lifetime requirement to the byte slice passed into load_pdf_from_bytes(), and added a new load_pdf_from_bytes_owned() function that takes an owned Vec. This makes both cases completely explicit and ensures the caller has to think about how they copy or allocate the byte buffer. I think this is the right way to handle it.

I have pushed a commit that makes these small changes. If you could check to make sure they work for you, that'd be great. You will need to add pdfium-render as a git dependency in your Cargo.toml.

@Lampyrida
Copy link
Author

That sounds very reasonable @ajrcarey. I just tried out the load_pdf_from_bytes_owned function and it works like a charm! Thanks a ton for the very quick turnaround for this! Really appreciate it.

@ajrcarey
Copy link
Owner

Very good. Having thought about it a bit more I'm going to rename the functions slightly to make their use clearer. load_pdf_from_bytes() will remain in place for now but it will be deprecated and removed in 0.9.0 as part of #63, at which point you'll have to shift over to either load_pdf_from_byte_slice() or load_pdf_from_byte_vec(). 0.9.0 is still months away so you've got plenty of time.

These changes will be released as part of 0.7.26. Thanks again for raising this issue and helping me fix this bug.

@ajrcarey
Copy link
Owner

Updated README.md and examples/README.md. Packaged for release as part of 0.7.26.

ajrcarey pushed a commit that referenced this issue Dec 18, 2022
@ajrcarey ajrcarey self-assigned this Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants