full text search doesn't work #636

Ourithi · 2021-05-25T08:52:41Z

Hello,

I am trying to run full text search with Tesseract OCR on an Apache2 web server running Nextcloud 20.0.10. Full text search works for documents that were already present on the Nextcloud server (the Nextcloud manual, for example), but when I create a new PDF and add it to the server, search doesn't work for it. The tests and the first index run fine. There is nothing in the Apache error log.
Any ideas?

thomasgg23 · 2021-05-25T12:26:11Z

Is the adding part or the OCR part done via a script or via the Nextcloud web interface?

Ourithi · 2021-05-25T13:26:10Z

> Is the adding part or the OCR part done via a script or via the Nextcloud web interface?

It is done via the Nextcloud web interface.

Ourithi · 2021-05-25T13:48:41Z

On a side note, I meddled here and there, and now I get the following error when trying to run the index (after clearing the indexes table):
"call to a member function getContent() on string in ./files_fulltextsearch/lib/Service/FilesService.php:761"

Ourithi · 2021-05-26T09:10:43Z

I fixed the error, but it still doesn't work. I think it's related to full text search having problems with PDF files in general. Does anybody have an idea?

thomasgg23 · 2021-05-26T09:36:05Z

I checked our installation, and we don't use the Full Text Search Tesseract script.
We convert the files with a local script and ocrmypdf, and then run the occ command to index them again. This has worked for several months now.

There is a flow, "Workflow OCR", which also uses ocrmypdf to convert any newly uploaded PDF file. Perhaps you should check this one out.
https://github.com/R0Wi/workflow_ocr
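
For anyone who wants to try the "local script plus occ" approach described above, here is a minimal sketch of what such a script could look like. It is not the script from this thread. The directory layout, the function names, and running occ as `php occ` are assumptions for illustration; on many installs occ has to be run as the web server user, and files written outside Nextcloud may need a file scan before they can be indexed.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ocr_command(src: Path, dst: Path) -> List[str]:
    """Command line for ocrmypdf (hypothetical helper).

    --skip-text leaves pages that already have a text layer untouched.
    """
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def build_index_command(occ: Path) -> List[str]:
    """Command line that asks Full Text Search to index (hypothetical helper)."""
    return ["php", str(occ), "fulltextsearch:index"]


def ocr_and_reindex(in_dir: Path, out_dir: Path, occ: Path) -> None:
    """OCR every PDF in in_dir into out_dir, then trigger one index run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(in_dir.glob("*.pdf")):
        # check=True raises if ocrmypdf exits with a non-zero status
        subprocess.run(build_ocr_command(pdf, out_dir / pdf.name), check=True)
    subprocess.run(build_index_command(occ), check=True)
```

Keeping the command construction in small functions makes it easy to print and review the exact commands before running them against a live server.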

sinichi19 · 2021-05-28T01:24:27Z

Check whether the Group Folders feature is enabled; if it is, try disabling it. See:
#634

ArtificialOwl · 2021-08-10T12:30:12Z

Does your PDF contain a text layer?

If yes, check that the PDF option in Admin Settings/Full Text Search is enabled.
If no, check the configuration of your Tesseract.
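
One rough way to answer the text-layer question from the command line is to extract the text and see whether anything comes back. The sketch below assumes `pdftotext` (from poppler-utils) is installed. The helper names and the 20-character threshold are arbitrary choices for illustration, not part of Full Text Search.

```python
import subprocess
from pathlib import Path


def has_text_layer(extracted_text: str, min_chars: int = 20) -> bool:
    """True if the extracted text has at least min_chars non-whitespace characters.

    The threshold is an arbitrary guess; scanned PDFs usually yield almost nothing.
    """
    visible = "".join(extracted_text.split())
    return len(visible) >= min_chars


def extract_text(pdf: Path) -> str:
    """Run pdftotext and return what it finds; "-" sends the output to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def next_step(pdf: Path) -> str:
    """Map the result onto the two checks suggested above."""
    if has_text_layer(extract_text(pdf)):
        return "check the PDF option in Admin Settings/Full Text Search"
    return "check the Tesseract configuration"
```

A PDF that was exported from a word processor should normally land in the first branch, while a scanned document without OCR should land in the second.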

ArtificialOwl · 2021-08-10T12:30:32Z

Please paste the result of ./occ fulltextsearch:check

joshtrichards · 2024-03-07T16:18:01Z

Closing due to "needs info" after 3 years. :)

joshtrichards closed this as completed Mar 7, 2024
