
tesseract 4.0.0-1.4.4 crashes on Mac OS #694

Closed
maximumspatium opened this issue Feb 25, 2019 · 10 comments

maximumspatium commented Feb 25, 2019

Hello Samuel,

the Audiveris project has used JavaCPP since 2012 to access Tesseract.
Today I decided to switch to the latest Tesseract version, which ended up in the following JVM crash:

#
# A fatal error has been detected by the Java Runtime Environment:
#
sscanf(line, "%" QUOTED_TOKENSIZE "s %" QUOTED_TOKENSIZE "s %f %f", linear_token, essential_token, &ParamDesc[i].Min, &ParamDesc[i].Max) == 4:Error:Assert failed:in file clusttool.cpp, line 73
#  SIGILL (0x4) at pc=0x0000000124e73fd4, pid=15607, tid=0x000000000000b877
#
# JRE version: Java(TM) SE Runtime Environment (8.0_202-b08) (build 1.8.0_202-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.202-b08 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.4.dylib+0x21bfd4]  ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+0x174
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/maxim/Documents/Development/audiveris-ng/hs_err_pid15607.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

After searching the Tesseract issue tracker for a while, I learned that this issue is probably caused by the wrong locale. So I tried running our software as follows:

LC_ALL=C ./gradlew run

It actually worked.

I also tried the basic example here; it crashes as well when invoked without LC_ALL=C.
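Why LC_ALL=C helps can be illustrated with a small C sketch. It assumes a parameter line shaped like the one the failing assertion in clusttool.cpp expects (two tokens followed by two floats); the helper names parse_param_line and parse_under_locale, the 64-byte token buffers and the sample line are hypothetical. sscanf's %f uses the decimal separator of the current LC_NUMERIC locale, so under a locale such as de_DE (comma as separator) parsing stops at the '.', fewer than four fields are converted, and an "== 4" check fails.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper mirroring the failing check in clusttool.cpp:
   two whitespace-separated tokens followed by two floats.
   Returns sscanf's conversion count; 4 means the line was fully parsed. */
int parse_param_line(const char *line, float *min, float *max) {
    char linear_token[64];
    char essential_token[64];
    /* %f honours the decimal separator of the current LC_NUMERIC locale. */
    return sscanf(line, "%63s %63s %f %f",
                  linear_token, essential_token, min, max);
}

/* Parses the line under the given LC_NUMERIC locale, then restores the
   previous locale. Returns -1 if the requested locale is not installed. */
int parse_under_locale(const char *locale_name, const char *line,
                       float *min, float *max) {
    char saved[64];
    const char *current = setlocale(LC_NUMERIC, NULL);
    /* Copy the name: later setlocale() calls may overwrite that buffer. */
    snprintf(saved, sizeof saved, "%s", current ? current : "C");
    if (setlocale(LC_NUMERIC, locale_name) == NULL) {
        return -1;
    }
    int fields = parse_param_line(line, min, max);
    setlocale(LC_NUMERIC, saved);
    return fields;
}
```

Under "C" (which LC_ALL=C forces) all four fields are converted; under "de_DE.UTF-8", if that locale is installed, the count drops below 4.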

This all reminds me of the "Illegal min or max specification" issue back in 2014. Back then we were repeatedly told that setting the right locale is the user's responsibility, that the Tesseract project doesn't maintain third-party Java wrappers, and so on...

That issue was finally fixed in SVN r1064 (now tesseract-ocr/tesseract@3a5f699) by replacing fscanf with the custom, locale-insensitive tfscanf.
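The idea behind such a fix, parsing numbers independently of the process locale, can be sketched in C. This is not Tesseract's tfscanf, which is a custom scanner; as a swapped-in technique, the sketch assumes a POSIX.1-2008 system providing newlocale(), uselocale() and freelocale() (on macOS these may be declared in <xlocale.h>) and switches only the calling thread to the "C" numeric locale around sscanf. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Hypothetical locale-insensitive variant of the clusttool.cpp parse:
   the calling thread temporarily uses the "C" numeric locale, so '.' is
   always the decimal separator, whatever setlocale() was set to.
   Returns the sscanf conversion count, or -1 if the locale can't be created. */
int parse_param_line_c_locale(const char *line, float *min, float *max) {
    char linear_token[64];
    char essential_token[64];
    locale_t c_numeric = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_numeric == (locale_t)0) {
        return -1;
    }
    locale_t previous = uselocale(c_numeric); /* affects this thread only */
    int fields = sscanf(line, "%63s %63s %f %f",
                        linear_token, essential_token, min, max);
    uselocale(previous); /* restore the thread's previous locale */
    freelocale(c_numeric);
    return fields;
}
```

Unlike a process-wide setlocale() call, this leaves the rest of the host application (for example a JVM) running with the user's locale.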

The fix was kept across the 3.x releases, but the introduction of the locale-sensitive sscanf in tesseract-ocr/tesseract@1cc5111 makes all 4.x releases suffer from the same issue as in 2014, at least from Java's point of view!

It looks like our project currently has two inconvenient options:

  • stay stuck with Tesseract 3.x, which is difficult because operating systems and packages evolve quickly
  • run Audiveris from a terminal to be able to switch the locale

I could file an issue on the Tesseract issue tracker, but my bet is that I'll merely get an answer similar to this one: tesseract-ocr/tesseract#1010 (comment)

Do you by any chance have an idea what could be done on the JavaCPP side to mitigate the problem?

Thanks a ton in advance!

P.S.: I'm running Mac OS 11.13 (German locale) and JDK 1.8.0_202...

saudet (Member) commented Feb 26, 2019

Send a patch as a pull request and I'll merge it! Thanks

saudet (Member) commented Apr 3, 2019

I've included the setlocale() function in commit 5bc74d8 as part of the presets to work around this issue more easily and reliably than this: nguyenq/tess4j#106 (comment)
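The effect of such a process-wide workaround can be sketched in plain C: check the decimal separator reported by localeconv() and, if it is not '.', reset LC_NUMERIC to "C" before any Tesseract data is loaded. The helper name is hypothetical, and it touches only LC_NUMERIC, a narrower choice than the setlocale(LC_ALL(), "C") call discussed in this thread. Since setlocale() affects the whole process and is not guaranteed to be thread-safe, it is best called once at start-up.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical start-up helper: makes sure numbers are parsed with '.'
   as the decimal separator, as Tesseract 4.0's sscanf-based loader expects.
   Returns 1 if LC_NUMERIC had to be reset, 0 if it was already fine,
   and -1 if the "C" locale could not be selected. */
int ensure_c_numeric_locale(void) {
    const struct lconv *conv = localeconv();
    if (strcmp(conv->decimal_point, ".") == 0) {
        return 0; /* already using '.' as the decimal separator */
    }
    /* Process-wide change: call early, before other threads start. */
    if (setlocale(LC_NUMERIC, "C") == NULL) {
        return -1;
    }
    return 1;
}
```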

Please give it a try with the snapshots: http://bytedeco.org/builds/

maximumspatium (Author)

Thanks a lot! I'll try it tonight and report the results.

saudet (Member) commented Apr 11, 2019

Workaround included in version 1.5. Thanks for reporting!

saudet closed this as completed Apr 11, 2019

maximumspatium (Author)

Hello Samuel,

I'm sorry to bother you again about this issue, but today I got another Tesseract crash:

sscanf(line, "%" QUOTED_TOKENSIZE "s %" QUOTED_TOKENSIZE "s %f %f", linear_token, essential_token, &ParamDesc[i].Min, &ParamDesc[i].Max) == 4:Error:Assert failed:in file clusttool.cpp, line 73
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x0000000110614ea4, pid=14945, tid=0x0000000000004203
#
# JRE version: Java(TM) SE Runtime Environment (8.0_202-b08) (build 1.8.0_202-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.202-b08 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.4.dylib+0x1faea4]  _ZNK7ERRCODE5errorEPKc16TessErrorLogCodeS1_z+0x174
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/maxim/Documents/Development/Tess4Javacpp/hs_err_pid14945.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Abort trap: 6

Prepending LC_ALL=C to the Maven/Gradle invocation fixes the crash.

This time I switched to Leptonica 1.78.0, JavaCPP 1.5 and Tesseract 4.0.0-1.5, including all the required API changes.

Any ideas what's wrong?

saudet (Member) commented Apr 22, 2019

Did you call setlocale(LC_ALL(), "C")?

maximumspatium (Author)

Not explicitly. Where is it supposed to be called from? In the user code during initialization?

saudet (Member) commented Apr 23, 2019 via email

stweil (Contributor) commented Apr 22, 2022

This bug is fixed in Tesseract 4.1 and later releases, so calling setlocale is no longer necessary.

maximumspatium (Author)

@stweil I tested with Tesseract 4.1.1 and JavaCPP Presets 1.5.6.
The JVM no longer crashes when interacting with libtesseract, so I removed the setlocale calls from my code: Audiveris/audiveris@6fc18ec

Thanks for reporting the fix! I'm also glad to see this bug finally fixed after about 10 years in "Won't fix" status!
