Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom Find Controller? #2339

Closed
Hypercubed opened this issue May 21, 2024 · 51 comments
Closed

Custom Find Controller? #2339

Hypercubed opened this issue May 21, 2024 · 51 comments
Assignees
Labels
20 scheduled for version 20 21 enhancement New feature or request Loving it! Sometimes I have to decline a feature I'd love to implement. Let's give these features some love! Solved

Comments

@Hypercubed
Copy link

Hypercubed commented May 21, 2024

I have a need to add regex search to the find toolbar. I think I have a clear understanding of ngx-extended-pdf-viewer custom components; etc. I can create custom components that I need but the search functionality itself is buried deep in pdfjs; specifically private methods in PDFFindController.

Options:

  1. Add this full functionality to ngx-extended-pdf-viewer and stephanrauh/pdf.js as PRs to both projects.
  2. Add private functionality to stephanrauh/pdf.js and create custom component on my end to expose it.
  3. Modify ngx-extended-pdf-viewer to allow overriding PDFFindController so that I can maintain my own "fork" of just that class.
  4. Modify stephanrauh/pdf.js to make all private methods public; allowing me to override methods in pdfViewer.findController on my end.
  5. Maintain my own fork of stephanrauh/pdf.js.

I feel that # 4 would be the least impactful to your projects; but obviously will add some complexity merging upstream pdfjs changes. I'm willing an able to do the code changes myself... but would like to know if you have any thoughts on the best way to proceed.

Thank you.

@stephanrauh stephanrauh self-assigned this May 22, 2024
@stephanrauh
Copy link
Owner

I assume you want to benefit from improvements of pdf.js, so anything involving a fork sounds painful. I only manage because I merge the new changes every week.

I wonder if you can implement a system to add plugins to the findbar. That's similar to option 4, but instead of overwriting methods you define user hooks to add custom code. If you're ready to send me such a PR, I'll happily merge it. I guess it's a sensible compromise between risking merge hell and adding flexibility to the library.

BTW, you regex search might be interesting to other users, too. What about adding it to a demo page of the showcase?

@stephanrauh stephanrauh added enhancement New feature or request waiting for the user's answer Loving it! Sometimes I have to decline a feature I'd love to implement. Let's give these features some love! labels May 22, 2024
@stephanrauh
Copy link
Owner

Come to think of it, the plugins might bring back fuzzy search and searching multiple terms. I had to abandon these improvements some time ago because I ran into the merge hell. It would be nice to get them back!

@Hypercubed
Copy link
Author

Hypercubed commented May 22, 2024

That's basically what I want... regex search and multiple terms. I basically only need to add one line in /web/pdf_find_controller.js but once I do that I'm forced to "fork" pdf_find_controller.js, pdf_find_utils.js and ui_utils.js. I then need to "overwrite" the instance of findController in each page. Here is my code so far:

import type { PDFPageView } from 'ngx-extended-pdf-viewer';
import { PDFFindController } from './pdfjs_overrides/pdf_find_controller';

export function customFindControllerFactor() {
  const PDFViewerApplication = (window as any).PDFViewerApplication;
  const eventBus = PDFViewerApplication.eventBus;
  const linkService = PDFViewerApplication.pdfLinkService;

  // Create a custom FindController
  const findController = new PDFFindController({ eventBus, linkService } as any);

   // Replace findController on each page
  eventBus.on('pagesloaded', () => {
    findController.setDocument(PDFViewerApplication.pdfDocument);
    const pages: PDFPageView[] = (PDFViewerApplication.pdfViewer as any)._pages;
    pages.forEach(page => {
      (page as any)._textHighlighter.findController = findController;
    });
  });
  return findController;
}

Not very clean or maintainable. If we were to write a plugin system I'm guessing it would be almost the same thing. Something to override the built in PDFFindController.

Edit: And this to overide the instance in the pdfjs application:

  pdfLoaded() {
    const findController = customFindControllerFactor();
    this.pdfViewerApplication.findController = findController as unknown as FindController;
  }

@Hypercubed
Copy link
Author

Hypercubed commented May 23, 2024

I'm thinking of this approach. Both would be changes to stephanrauh/pdf.js that would allow ngx-extended-pdf-viewer (or end users) to "plug" the find feature.

  1. Update all methods in /web/pdf_find_controller.js (in stephanrauh/pdf.js) to be public (replace this.#methods with this._method. If the methods are public it would be easy to create a new class that extends PDFFindController. The developer extending PDFFindController doesn't need to override every method since we can call super._method in the base class.
  2. Update PDFViewerApplication._initializeViewerComponents to look for constructor overrides for PDFFindController in AppOptions (or similar). That would allow end users to replace PDFFindController with the extended class without jumping through the hoops of iterating over all the pages like I did above.

Not quite a plugins... but close. You could resonably do this to other objects as well.

Here is what I would guss my "plugin" would look like:

class CustomPDFFindController extends PDFFindController {
  _convertToRegExpString(query, hasDiacritics) {
    const {  matchRegex } = this._state;
    if (matchRegex) return [false, query];
    return super._convertToRegExpString(query, hasDiacritics);
  }
}

Registered something like:

PDFViewerApplicationOptions.set('FindControllerConstructorOveride', CustomPDFFindController);

@stephanrauh
Copy link
Owner

stephanrauh commented May 23, 2024

Yes, that's an option. You're definitely on the right track. It's just that I'd like to find a minimally invasive approach. And it seems I'm less shy about manipulating prototypes. :)

customQueryConversion(query) {
  return query; // idempotent default implementation
}

#calculateMatch(pageIndex) {
   let query = this.#query;
    if (query.length === 0) {
      return; // Do nothing: the matches should be wiped out already.
    }
    const { caseSensitive, entireWord } = this.#state;
    const pageContent = this._pageContents[pageIndex];
    const hasDiacritics = this._hasDiacritics[pageIndex];

    let isUnicode = false;

    // allow users to add their plugins <<<<<<<<<
    query = customQueryConversion(query);
    // end of the modification <<<<<<<<<<<<<<<
    if (typeof query === "string") {
      [isUnicode, query] = this.#convertToRegExpString(query, hasDiacritics);
    } else {
...

and

(PDFViewerApplication as any).pdfViewer.findController.customQueryConversion = () => { /* put your magic here */ };

How do you like the idea?

@Hypercubed
Copy link
Author

Hypercubed commented May 23, 2024

That might work in some cases... but in my case I need to conditionally skip #convertToRegExpString. I'd either need a way to say use only customQueryConversion in some cases, skiping the #convertToRegExpString or be able to call #convertToRegExpString within customQueryConversion; which is not possible for privates.

@Hypercubed
Copy link
Author

If we wanted that it would have to be something like:


customQueryConversion(query) {
  return false; // idempotent default implementation
}

#calculateMatch(pageIndex) {
  ...

  customQuery = customQueryConversion(query);
  if (customQuery !== false) {
    // customQuery is not false, use this
    query = customQuery;
  } else {
    // otherwise do the normal stuff
    let isUnicode = false;
    if (typeof query === "string") {
    ...
  }

  // then do everything else
  ...

then

(PDFViewerApplication as any).pdfViewer.findController.customQueryConversion = () => { 
  const { matchRegex } = this.state;
  if (!matchRegex) return false;
  /* put your magic here */
};

@stephanrauh
Copy link
Owner

That shouldn't be a problem - customQueryConversion can return more than one value:

customQueryConversion(query) {
  return {
     skipDefaultBehavior: false;
     query; // idempotent default implementation
  }
}

#calculateMatch(pageIndex) {
   ...
    // allow users to add their plugins <<<<<<<<<
    { query,  skipDefaultBehavior} = customQueryConversion(query);
    if (skipDefaultBehavior) {
       // do nothing;
    }     
    else if (typeof query === "string") { // end of the modification <<<<<<<<<<<<<<<

@stephanrauh
Copy link
Owner

stephanrauh commented May 23, 2024

We've sent our messages simultaneously - and we're thinking along similar lines!

@stephanrauh
Copy link
Owner

stephanrauh commented May 23, 2024

I suppose my approach isn't perfect yet. But I hope we can polish it until it's useful to support most (or every?) use-case. Things we still need to consider (or at least I should consider it, I'm not familiar with your application):

  • users should be able to activate / deactivate custom options in the find bar, so the custom parameters need to be sent
  • the find bar should be configurable. Actually, it already is, but there's not documented way to pass custom parameters to the FindController.

@Hypercubed
Copy link
Author

Hypercubed commented May 23, 2024

I also might have some modifications to #calculateRegExpMatch. Obviously I'll do whatever you think is best for your maintance. I'd be happy if I can just do this. Notice that i'm putting matchRegex into the findController state. That's easy enough to do when I dispatch "find".

const originalCalculateMatch = findController._calculateMatch;  // only works if _calculateMatch is public
findController._calculateMatch = function (query) {
  const { matchRegex } = this.state;
  if (!matchRegex) return originalCalculateMatch(query);
  /* put your magic here */
};

const originalCalculateRegExpMatch = findController._calculateRegExpMatch;
findController._calculateRegExpMatch = function (query) {
  const { matchRegex } = this.state;
  if (!matchRegex) return originalCalculateRegExpMatch(query);
  /* put your magic here */
};

Then the only change would be the search and replace #calculateMatch -> _calculateMatch and same for #calculateRegExpMatch.

Might need a few .bind() here and there...

@Hypercubed
Copy link
Author

FYI.. overriding the find bar is working great. Adding the "matchRegex" buttons is not too difficult:

image

@Hypercubed
Copy link
Author

Hypercubed commented May 23, 2024

My override of calculateMatch is not right... but you get the idea.

Edit:

This is what I need:

    const originalConvertToRegExpString = findController._convertToRegExpString.bind(findController);  // only works if _convertToRegExpString is public
    findController._convertToRegExpString = function (query: any) {
      const { matchRegex } = this.state;
      if (matchRegex) return [false, query];
      return originalConvertToRegExpString(query);
    };

@Hypercubed
Copy link
Author

@stephanrauh Let me know if the PR to pdfjs works for now. I tested locally and allowed me to override the convertToRegExpString method as shown. If that works, and once it's merged, I can try to add a demo to ngx-extended-pdf-viewer.

@stephanrauh
Copy link
Owner

Yes, everything seems to work fine. Thanks!

@stephanrauh stephanrauh added Solved 20 scheduled for version 20 and removed waiting for the user's answer labels May 24, 2024
@Hypercubed
Copy link
Author

Do we also need to apply these changes to the non-bleeding edge version?

@stephanrauh
Copy link
Owner

I already did. I just didn't push my changes yet. Currently, I'm merging the latest changes of pdf.js to the bleeding edge, and like so often, that's a bit painful. :)

stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue May 24, 2024
stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue May 24, 2024
stephanrauh added a commit that referenced this issue May 24, 2024
…ntroller public to allow users to add custom implementations; #2343 fixed the box-sizing model of the annotation layer (was broken by #2282)
@stephanrauh
Copy link
Owner

Your improvement has landed with version 20.1.0. Now I'm looking forward to your demo.

Enjoy!
Stephan

@rmdorsey
Copy link

Regarding the find by regex feature; Is there a way to prevent the viewer from doing an auto jump to the page where the search result is?

@stephanrauh
Copy link
Owner

Currently not - but I think it should be easy to implement this feature.

@rmdorsey
Copy link

rmdorsey commented Aug 6, 2024

Not yet. At the moment, I just feel I ought to improve the find controller and the programmatic API and open it for customer hooks. But I don't know yet if that's possible, how to do it, and if it's a good idea in the long run. I'm 90% sure I can make you guys happy, but there's no guarantee, nor is there a timetable.

Hi Stephan, thanks again for all you're doing. Just checking in to see if there's an update regarding this feature request.
Thanks,
Matt

@stephanrauh
Copy link
Owner

Not really, but I've finally managed to publish the huge refactoring called version 21. This, in turn, means I can now afford time to your ticket.

@stephanrauh
Copy link
Owner

You might be interested in the changes I've added in version 21.2.0. It's not precisely what you're looking for, but it already gives you new options.

@Hypercubed
Copy link
Author

@stephanrauh That's very cool. Is the customFindController a seperate instance of PDFjs's FindController or something maintained in ngx-extended-pdf-viewer?

@stephanrauh
Copy link
Owner

It's a separate instance. Basically, I simply duplicated the line findController=new PdfFindController().

stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue Aug 14, 2024
stephanrauh added a commit to stephanrauh/extended-pdf-viewer-showcase that referenced this issue Aug 14, 2024
@stephanrauh
Copy link
Owner

@Hypercubed @rmdorsey I believe I've found a nice solution. Would you like to define your own custom find controller, without being restricted by the pdf.js implementation? As it turns out that's possible. The API of PDFFindController is suprisingly small. Basically, it only reacts to a few events and it doesn't offer any public API.

So it's easy to write your own find controller. I assume most people prefer to extend or modify the original find controller, and this is also possible. My prototype stephanrauh/extended-pdf-viewer-showcase@aebdf6b shows the idea.

How do you like it?

stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue Aug 15, 2024
@Hypercubed
Copy link
Author

Mostly just curious. I'm still doing the overide of the methods we exposed and using the existing UI.

@stephanrauh
Copy link
Owner

Well, that's still possible, and there's nothing wrong with that. What I like about my idea is that it opens the way to a find controller that's written in TypeScript and integrates seamlessly into Angular.

stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue Aug 17, 2024
stephanrauh added a commit to stephanrauh/extended-pdf-viewer-showcase that referenced this issue Aug 17, 2024
…df-viewer#2507 improved the demo - now it delays the second search until the primary find controller has found a result
stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue Aug 17, 2024
@stephanrauh
Copy link
Owner

Version 21.3.0 has landed, offering you the option to define your own custom controller.

I'm pretty sure the document needs a lot of improvement. The current version is here: https://pdfviewer.net/extended-pdf-viewer/custom-find.

Enjoy!
Stephan

@rmdorsey
Copy link

Hi Stephan, thanks for this!

I'm trying to use it now by creating a new angular app and bringing in the newest version of the player.

I've set things up like you have here in your showcase repo: https://github.com/stephanrauh/extended-pdf-viewer-showcase/tree/main/src/app/extended-pdf-viewer/custom-find

Your Angular skills are very advanced... I'm wondering if you can give me any pointers regarding these errors I'm seeing...

image

@stephanrauh
Copy link
Owner

I doubt my Angular skills are advanced - for example, I've never used NgRX store till now :)

I suspect the difference between our projects is this setting in the tsconfig.json file:

    "esModuleInterop": true,

stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue Aug 24, 2024
…f-viewer#2524 the PDF viewer didn't scroll find matches reliably into view
stephanrauh added a commit to stephanrauh/pdf.js that referenced this issue Aug 24, 2024
…f-viewer#2524 the PDF viewer didn't scroll find matches reliably into view
stephanrauh added a commit to stephanrauh/extended-pdf-viewer-showcase that referenced this issue Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
20 scheduled for version 20 21 enhancement New feature or request Loving it! Sometimes I have to decline a feature I'd love to implement. Let's give these features some love! Solved
Projects
None yet
Development

No branches or pull requests

3 participants