'still to go' numbers are HUGE: the lucene indexer seems to be stuck in a land where even at 1 task, the OCR jobs take 100% CPU #129

GerHobbelt · 2019-11-02T12:40:34Z

The 'still to go' numbers are HUGE: the Lucene indexer seems to be stuck in a state where, even with just 1 task, the OCR jobs take 100% CPU. See if we can get Lucene to work a little harder for us; maybe this is a task priority thing? --> IncrementalBuildIndex() is executed only once every minute. That's not gonna fly when a lot of pages are being textified.

Cave Canem: is Lucene/indexing the real problem? Profiling Qiqqa has been a bit of a nightmare, so the data is as yet inconclusive (DevStudio breakage/extreme delays in processing profiling tracks).
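To illustrate the concern about a fixed once-a-minute indexing pass, here is a minimal sketch of a backlog-sensitive delay. It is written in Python for brevity and is not Qiqqa's code: the function name, the `pages_per_pass` capacity, and all default values are assumptions made up for this example. It also assumes the indexer cadence really is the bottleneck, which the profiling caveat above leaves open.

```python
def next_index_delay(pending_pages, base_delay_s=60.0, min_delay_s=1.0,
                     pages_per_pass=100):
    """Return how many seconds to wait before the next indexing pass.

    All parameters are hypothetical:
      pending_pages  - pages textified but not yet indexed
      base_delay_s   - the fixed interval used when there is no backlog
      min_delay_s    - lower bound so the indexer never spins continuously
      pages_per_pass - assumed number of pages one pass can absorb
    """
    if pending_pages <= 0:
        return base_delay_s
    # Number of passes needed to drain the backlog (ceiling division).
    passes_needed = -(-pending_pages // pages_per_pass)
    # Spread those passes over one base interval, but never below the floor.
    return max(min_delay_s, base_delay_s / passes_needed)


if __name__ == "__main__":
    for backlog in (0, 100, 1000, 1_000_000):
        print(backlog, next_index_delay(backlog))
```

With a small backlog the delay stays at the base interval; as the backlog grows the delay shrinks toward the floor, so the indexer runs more often only when there is work waiting.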

GerHobbelt · 2019-11-02T12:54:48Z

Logging has been augmented in v82pre4 to facilitate the investigation of this issue. Moving forward on this bugger still needs the #95 side effect of robust STDERR+STDOUT streaming.

GerHobbelt · 2019-11-05T19:43:32Z

Related: #127. Two problems with the same (suspected) root cause.

… SINGLE don't deliver due to, for example, encrypted PDF source. This is a temporary hack to ensure Qiqqa doesn't repeat OCR activities ad nauseam (jimmejardine#129, jimmejardine#135, jimmejardine#73, etc.) - the previously added extra sanity checks on OCR text files (zero-sized areas of words, etc.) seem to pay off. At least we've observed quite a few OCR files/pages being retriggered for OCR as Qiqqa uncovers these zero-sized word areas while refreshing for Expeditions - added a few more UI-thread-or-not Assertions.

GerHobbelt created this issue from a note in TODO list (To do) Nov 2, 2019

GerHobbelt added the 🕵investigate (needs further analysis to find the root cause) and 🤔question (further information is requested or this is a support question) labels Nov 2, 2019

GerHobbelt added this to the v82 milestone Nov 2, 2019

GerHobbelt moved this from To do to In progress in TODO list Nov 2, 2019

GerHobbelt added this to To do in v82release Nov 3, 2019

This was referenced Nov 3, 2019

chop long running many-file add-to-library actions into short segments which are flushed to disk: prevent lots of manual redo due to (unwanted) Qiqqa crash #99

Open

QiqqaOCR: internal region selector logic is still b0rked #135

Closed

GerHobbelt moved this from To do to In progress in v82release Nov 7, 2019

GerHobbelt added a commit to GerHobbelt/Evil-PDF-Library-for-Qiqqa that referenced this issue Nov 14, 2019

add PDF documents which exhibit severe OCR issues (jimmejardine/qiqqa…

8a01915

…-open-source#129, jimmejardine/qiqqa-open-source#127, jimmejardine/qiqqa-open-source#135)

GerHobbelt added a commit to GerHobbelt/qiqqa-open-source that referenced this issue Mar 23, 2020

QiqqaOCR: fake words for empty pages or any pages that appear so to t…

b0191cd

…he current OCR engine! (jimmejardine#73 + jimmejardine#129 + jimmejardine#135 )

SimonDedman mentioned this issue Feb 10, 2023

Retain Textify, OCR progress, etc, in status bar? #409

Open
