Remove PdfPages::delete_page_at_index() in favour of PdfPage::delete(). #67

ajrcarey · 2023-02-04T19:40:19Z

If a reference to a PdfPage is retrieved using PdfPages::get(), then that page is deleted from the document using PdfPages::delete_page_at_index(), we still possess a (now invalid) PdfPage reference. This is effectively a use-after-free violation. Avoid this by removing the PdfPages::delete_page_at_index() function and instead use a PdfPage::delete() function that consumes mut self. This ensures the caller cannot hold an invalid reference to a page after it is deleted.

The text was updated successfully, but these errors were encountered:

ajrcarey · 2023-02-04T19:42:36Z

Added PdfPages::delete_page_at_index() to deprecated items list as part of #36.

ajrcarey · 2023-02-05T14:27:47Z

Added PdfPages::delete_page_range() to deprecated items list as part of #36.

ajrcarey · 2023-02-05T20:30:57Z

Actually, if PdfDocument::pages() were to return a &PdfPages / &mut PdfPages reference (rather than the owned PdfPages instance it returns at the moment), and if PdfPages::get() were to return a &PdfPage / &mut PdfPage reference (rather than the owned PdfPage instance it returns at the moment), then it might be safe to reinstate these functions, as Rust would be able to manage the lifetimes safely. This could be done as part of #47. For now, though, deprecating the functions is the safe move.

ajrcarey · 2023-02-05T20:58:34Z

Deprecated functions in PdfPages. Added PdfPage::delete() function, taking advantage of new PdfPageIndexCache introduced in #60. Removed references to deprecated functions from examples/concat.rs and examples/copy_objects.rs. Removed reference to deprecated function from PdfPageIndexCache unit tests. Updated documentation. Package for inclusion in release 0.7.30.

mara004 · 2023-03-23T15:54:44Z

As PDFium only provides a document-level page deleter, how did you implement this?
How can you be sure to get the right index for a PdfPage?

see also ajrcarey/pdfium-render#67

ajrcarey · 2023-03-23T16:54:53Z

Page indices of each FPDF_PAGE in each open FPDF_DOCUMENT are cached as the page references are instantiated by Pdfium, either when a caller retrieves a page or creates a new page. This cache of page indices is used from within PdfPage::delete(). The implementation is at https://github.com/ajrcarey/pdfium-render/blob/master/src/page_index_cache.rs.

mara004 · 2023-03-23T19:25:55Z

I see. We can't do this in pypdfium2, as we also expose the raw pdfium API, so the caller might do operations that change page indices without helper classes knowing.
In general, I would find it safer if pdfium could just provide something like your get_index_for_page() natively to reduce the risk of a page index tracker getting out of sync.

ajrcarey · 2023-03-23T19:48:32Z

I suspect it would be difficult to manage in a garbage-collected language because the point at which references are dropped is likely non-deterministic, making it difficult to know when to update any cache. Rust's memory model proves advantageous in this case.

mara004 · 2023-03-23T20:19:40Z

Sorry, I don't understand yet. What has this got to do with object references?
Apart from that, attaching code to be run on garbage collection of an object is possible in Python.

ajrcarey · 2023-03-23T20:36:06Z

When a caller calls FPDF_ClosePage() (or whatever the function is called), any reference to that FPDF_PAGE object that is still held by any other caller is no longer valid (a "dangling" reference). Garbage collection is not an ideal way to determine when to tidy up these dangling references, because there will be a non-deterministic delay between invalidating the reference and dropping any held copies.

However, like you say, you expose the raw Pdfium API, so the point is somewhat moot as I imagine you probably don't have a way of intercepting when callers are creating and releasing FPDF_* object references.

mara004 · 2023-03-24T14:07:31Z

Yes, the non-deterministic nature of garbage collection is a problem, see this section of pypdfium2's readme.

However, I think we're kind of drifting away from the initial point. What I meant was just that a page object -> index mapping can be slightly dangerous if not implemented natively in PDFium. If there is any change to page order the mapping fails to track, then page.delete() might remove a different page than intended. It will be OK if your code is correct and you do not expose the raw API, though. (Except that it also adds additional, hidden complexity/expense on top of pdfium's API.)

In pypdfium2 this is something we cannot do, as callers are encouraged to use helpers in conjunction with the raw API, and the raw API may potentially be used to do arbitrary modifications to page order.

There's not much point in attempting to intercept raw API calls, because (a) we want to make the raw API available to the caller as-is (it would be confusing if calling a raw function would implicitly trigger something) (b) new APIs that change page order might be added to pdfium, so a page cache would imply the same risks, and (c) the purpose does not justify the complexity.

ajrcarey self-assigned this Feb 4, 2023

ajrcarey mentioned this issue Feb 4, 2023

Remove deprecated items as part of release 0.9.0. #36

Open

ajrcarey mentioned this issue Feb 5, 2023

Standardise lifetime and reference handling for child objects inside PdfDocument. #47

Closed

ajrcarey closed this as completed in 6aa7c09 Feb 5, 2023

mara004 added a commit to pypdfium2-team/pypdfium2 that referenced this issue Mar 23, 2023

document: add task

c519024

see also ajrcarey/pdfium-render#67

mara004 mentioned this issue Mar 25, 2023

Consider adding weakref-based page cache to PdfDocument pypdfium2-team/pypdfium2#185

Closed

mara004 mentioned this issue Apr 11, 2023

How to create a new pdf from a list of PdfPage? pypdfium2-team/pypdfium2#205

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove PdfPages::delete_page_at_index() in favour of PdfPage::delete(). #67

Remove PdfPages::delete_page_at_index() in favour of PdfPage::delete(). #67

ajrcarey commented Feb 4, 2023

ajrcarey commented Feb 4, 2023

ajrcarey commented Feb 5, 2023

ajrcarey commented Feb 5, 2023

ajrcarey commented Feb 5, 2023

mara004 commented Mar 23, 2023

ajrcarey commented Mar 23, 2023

mara004 commented Mar 23, 2023 •

edited

Loading

ajrcarey commented Mar 23, 2023 •

edited

Loading

mara004 commented Mar 23, 2023

ajrcarey commented Mar 23, 2023

mara004 commented Mar 24, 2023 •

edited

Loading

Remove PdfPages::delete_page_at_index() in favour of PdfPage::delete(). #67

Remove PdfPages::delete_page_at_index() in favour of PdfPage::delete(). #67

Comments

ajrcarey commented Feb 4, 2023

ajrcarey commented Feb 4, 2023

ajrcarey commented Feb 5, 2023

ajrcarey commented Feb 5, 2023

ajrcarey commented Feb 5, 2023

mara004 commented Mar 23, 2023

ajrcarey commented Mar 23, 2023

mara004 commented Mar 23, 2023 • edited Loading

ajrcarey commented Mar 23, 2023 • edited Loading

mara004 commented Mar 23, 2023

ajrcarey commented Mar 23, 2023

mara004 commented Mar 24, 2023 • edited Loading

mara004 commented Mar 23, 2023 •

edited

Loading

ajrcarey commented Mar 23, 2023 •

edited

Loading

mara004 commented Mar 24, 2023 •

edited

Loading