PdfPageText::search returns 0 results for text_search example (WASM) #154

Closed
synoet opened this issue Aug 10, 2024 · 4 comments

@synoet

synoet commented Aug 10, 2024

I'm encountering an issue with the PdfPageText::search method when compiling to WASM and linking against recent WASM releases of pdfium-lib (I've tried versions from V5668 to 6276).

Steps to reproduce:

  1. Compile pdfium-render linked against any recent WASM pdfium-lib release.
  2. Copy the logic from the text_search example, with the only modification being that the PDF buffer is passed in through JavaScript.
  3. The search returns 0 results.

When compiled on Linux and linked against [pdfium-binaries aur arch linux], the text_search example returns the correct number of results.

I'm not sure if this is the best place to ask; this might be an issue with the recent WASM pdfium-lib releases rather than with pdfium-render, but I thought I would bring it up in case you have any ideas as to why it's not working as expected.

Any insights or suggestions would be greatly appreciated!

@ajrcarey ajrcarey self-assigned this Aug 11, 2024
@ajrcarey
Owner

Hi @synoet, thank you for reporting the issue. I can reproduce the problem using the following WASM function based on the text_search example:

#[cfg(target_arch = "wasm32")]
#[wasm_bindgen]
pub async fn text_search_test(url: String) {
    // For general comments about pdfium-render and binding to Pdfium, see export.rs.

    let search_term = "French";

    let search_options = PdfSearchOptions::new()
        // Experiment with how the search results change when uncommenting
        // the following search options.

        // .match_whole_word(true)
        // .match_case(true)
        ;

    // Find the position of all occurrences of the search term
    // on the first page of the target document.

    let pdfium = Pdfium::default();

    let document = pdfium.load_pdf_from_fetch(url, None).await.unwrap();

    let page = document.pages().first().unwrap();

    let search_results_bounds = page
        .text()
        .unwrap()
        .search(search_term, &search_options)
        .iter(PdfSearchDirection::SearchForward)
        .enumerate()
        .flat_map(|(index, segments)| {
            segments
                .iter()
                .map(|segment| {
                    log::info!(
                        "Search result {}: `{}` appears at {:#?}",
                        index,
                        segment.text(),
                        segment.bounds()
                    );

                    segment.bounds()
                })
                .collect::<Vec<_>>()
        })
        .collect::<Vec<_>>();

    log::info!("{} search results", search_results_bounds.len());
}

When compiled to WASM and executed from a web page, the output "0 search results" is logged to the JavaScript console. Based on the output of the text_search example, I would expect to see 5 search results.

@ajrcarey
Owner

ajrcarey commented Aug 11, 2024

The problem is a mishandling of the FPDF_WIDESTRING pointer type in the WASM implementation of the FPDFText_FindStart() function. When corrected, the output is:

Search result 0: `French ` appears at PdfRect {
    bottom: PdfPoints {
        value: 329.72476,
    },
    left: PdfPoints {
        value: 361.7285,
    },
    top: PdfPoints {
        value: 337.8637,
    },
    right: PdfPoints {
        value: 394.82318,
    },
}
Search result 1: `French ` appears at PdfRect {
    bottom: PdfPoints {
        value: 288.63016,
    },
    left: PdfPoints {
        value: 249.18015,
    },
    top: PdfPoints {
        value: 296.7691,
    },
    right: PdfPoints {
        value: 282.27484,
    },
}
Search result 2: `French ` appears at PdfRect {
    bottom: PdfPoints {
        value: 268.03287,
    },
    left: PdfPoints {
        value: 269.67044,
    },
    top: PdfPoints {
        value: 276.1718,
    },
    right: PdfPoints {
        value: 302.76514,
    },
}
Search result 3: `French` appears at PdfRect {
    bottom: PdfPoints {
        value: 247.53555,
    },
    left: PdfPoints {
        value: 222.95956,
    },
    top: PdfPoints {
        value: 255.6745,
    },
    right: PdfPoints {
        value: 256.05423,
    },
}
Search result 4: `French ` appears at PdfRect {
    bottom: PdfPoints {
        value: 103.65446,
    },
    left: PdfPoints {
        value: 166.09702,
    },
    top: PdfPoints {
        value: 111.79339,
    },
    right: PdfPoints {
        value: 199.19168,
    },
}
5 search results

This gives the correct number of search results and output that matches the text_search example.

Preparing a patch now.
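
For readers unfamiliar with the FPDF_WIDESTRING type mentioned above: Pdfium's public headers describe it as a pointer to UTF-16LE data ending in a two-byte zero terminator. The sketch below is not the patch itself and does not use pdfium-render's internals; it only illustrates, using the standard library, the byte layout a search term must have before it is handed to a function such as FPDFText_FindStart(). The helper name `to_utf16le_with_nul` is hypothetical.

```rust
/// Hypothetical helper (not part of pdfium-render): encodes a Rust string
/// as UTF-16LE bytes followed by a two-byte zero terminator, the layout
/// Pdfium documents for FPDF_WIDESTRING arguments.
fn to_utf16le_with_nul(s: &str) -> Vec<u8> {
    let mut bytes: Vec<u8> = s
        .encode_utf16() // Rust strings are UTF-8; re-encode as UTF-16 code units
        .flat_map(|unit| unit.to_le_bytes()) // low byte first (little-endian)
        .collect();

    // Append the terminating zero code unit (two bytes).
    bytes.extend_from_slice(&[0, 0]);

    bytes
}

fn main() {
    let encoded = to_utf16le_with_nul("French");

    // Six UTF-16 code units plus the terminator.
    println!("{} bytes: {:?}", encoded.len(), encoded);
}
```

In a WASM build, a buffer like this would also have to be copied into the Pdfium module's own linear memory, and the address inside that memory passed as the pointer. That step is where this kind of pointer mishandling can arise, although the exact cause in this case is only described in the commit referenced below.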

@ajrcarey
Owner

ajrcarey commented Aug 11, 2024

Pushed fix to FPDF_WIDESTRING handling in WASM bindings. Fix will be included in crate release 0.8.24. In the meantime, you can take pdfium-render as a git dependency in your Cargo.toml to test the fix.

ajrcarey pushed a commit that referenced this issue Aug 11, 2024
@synoet
Author

synoet commented Aug 31, 2024

Thank you very much! It works as expected now.

And thank you so much for your work on this project!
