full text search doesn't work #636

Closed
Ourithi opened this issue May 25, 2021 · 9 comments
Comments

@Ourithi

Ourithi commented May 25, 2021

Hello,

I am trying to run full text search with Tesseract OCR on an Apache2 web server running Nextcloud 20.0.10. Full text search works for documents that were already present on the Nextcloud server (the Nextcloud manual, for example), but when I create a new PDF and add it to the server, the new file is not found. The tests and the first index run fine. There is nothing in the Apache error log.
Any ideas?

@thomasgg23

Is the adding part or the OCR part done via a script or via the Nextcloud web interface?

@Ourithi
Author

Ourithi commented May 25, 2021

Is the adding part or the OCR part done via a script or via the Nextcloud web interface?

It is done via the Nextcloud web interface.

@Ourithi
Author

Ourithi commented May 25, 2021

On a side note, I changed a few things here and there, and now I get the following error when trying to run the index (after having cleared the indexes table):
"call to a member function getContent() on string in ./files_fulltextsearch/lib/Service/FilesService.php:761"

@Ourithi
Author

Ourithi commented May 26, 2021

I fixed the error, but new PDFs still are not found. I think it is related to full text search having problems with PDF files. Does anybody have an idea?

@thomasgg23

I checked our installation and we don't use the Full Text tesseract script.
We convert the files with a local script and ocrmypdf and then run the occ command to index them again. This has worked for several months now.

There is a flow, "Workflow OCR", which also uses ocrmypdf to convert any newly uploaded PDF file. Perhaps you should check this one out.
https://github.com/R0Wi/workflow_ocr
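
For illustration, the approach described above (OCR with ocrmypdf, then reindex through occ) might be sketched roughly as follows. This is not the script from this thread: the occ path, the web server user, the in-place overwrite, and the helper names `ocr_command`, `index_command` and `process` are all assumptions made for the example.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical values; adjust to your own installation.
OCC = "/var/www/nextcloud/occ"
WEB_USER = "www-data"


def ocr_command(pdf: Path) -> list[str]:
    """Build an ocrmypdf call that overwrites the input file with the OCR result.

    --skip-text leaves pages that already contain text untouched.
    """
    return ["ocrmypdf", "--skip-text", str(pdf), str(pdf)]


def index_command() -> list[str]:
    """Build the occ call that runs the full text search index."""
    return ["sudo", "-u", WEB_USER, "php", OCC, "fulltextsearch:index"]


def process(folder: Path, dry_run: bool = True) -> list[list[str]]:
    """OCR every PDF under `folder`, then reindex. Returns the commands used."""
    commands = [ocr_command(p) for p in sorted(folder.rglob("*.pdf"))]
    commands.append(index_command())
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` the function only returns the command lists, so they can be inspected before anything is executed. If files are modified directly inside the Nextcloud data directory, an `occ files:scan` run may also be needed before indexing; that step is left out here because the thread does not say how the files are stored.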

@sinichi19

Check whether you have the Group Folders feature enabled, and try disabling it.
#634

@ArtificialOwl
Member

Does your PDF contain a text layer?

  • if yes, check that the PDF option in Admin Settings/Full Text Search is enabled
  • if no, check the configuration of your tesseract
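
One possible way to answer the text layer question from the command line is to see whether any text can be extracted at all. The sketch below assumes the `pdftotext` tool from poppler-utils is installed; it is not part of the full text search app, and the helper names are made up for this example.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Return whatever text pdftotext can read from the PDF ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def has_text_layer(extracted: str) -> bool:
    """Treat the PDF as having a text layer if any non-whitespace text came out."""
    return bool(extracted.strip())
```

Calling `has_text_layer(extract_text("scan.pdf"))` on a purely scanned document would be expected to return False, which points to the tesseract side of the setup rather than the PDF option.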

@ArtificialOwl
Member

Please paste the result of ./occ fulltextsearch:check

@joshtrichards
Member

Closing due to "needs info" after 3 years. :)

5 participants