Slow parsing from customized Timenorm #2

Open
etgld opened this issue Jan 30, 2024 · 5 comments
Comments

@etgld
Contributor

etgld commented Jan 30, 2024

Parsing with the customized version of Timenorm (built with Scala/SBT) is reasonably fast, but there seems to be an odd slowdown for some mentions. There's a line in the build log that seems to indicate a problem with the Scala compiler's -release flag, which is only supported on Java 9 and higher. Since it still runs, my best guess is that it's not actually being compiled to JVM bytecode and is being interpreted or something.
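To narrow down which mentions are slow, one option is to time each parse call and collect the outliers. The sketch below is only an illustration: `SlowMentionFinder`, `Timing`, and `findSlow` are hypothetical names, and the `parser` argument is a stand-in for whatever call actually invokes Timenorm in the pipeline (no Timenorm API is assumed here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Timenorm or cTAKES.
public class SlowMentionFinder {

    /** A mention together with how long its parse took. */
    public static final class Timing {
        public final String mention;
        public final long millis;

        Timing(String mention, long millis) {
            this.mention = mention;
            this.millis = millis;
        }
    }

    /**
     * Runs the parser on every mention and returns those whose parse
     * took at least thresholdMillis milliseconds.
     */
    public static <T> List<Timing> findSlow(List<String> mentions,
                                            Function<String, T> parser,
                                            long thresholdMillis) {
        List<Timing> slow = new ArrayList<>();
        for (String mention : mentions) {
            long start = System.nanoTime();
            try {
                parser.apply(mention);
            } catch (RuntimeException e) {
                // A failed parse is still timed; the failure itself is ignored here.
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMillis >= thresholdMillis) {
                slow.add(new Timing(mention, elapsedMillis));
            }
        }
        return slow;
    }
}
```

Passing the real parse call as `parser` and printing the returned list would give a concrete set of problem mentions to inspect. The first few calls may be slow simply because of JVM warm-up, so a second pass over the same mentions is probably worth doing before drawing conclusions.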

@etgld
Contributor Author

etgld commented Feb 1, 2024

The specific message from the mvn -U clean package log is:

[INFO] compiling 7 Scala sources to /home/etg/Repos/chemoTimelinesBaselineSystem/timelines/tweaked-timenorm/target/classes ...
[ERROR] -release is only supported on Java 9 and higher

Maybe it's nothing, maybe it's something.
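Since the error says -release needs Java 9 or higher, a quick sanity check is to confirm which Java version a given environment actually runs. The class below is a minimal sketch (`JavaVersionCheck` and `majorVersion` are hypothetical names) that reads the standard `java.specification.version` system property. It reports the JVM that runs this class, which is not necessarily the one Maven uses for the build; `mvn -version` shows the latter.

```java
// Hypothetical helper for checking the running JVM's major version.
public class JavaVersionCheck {

    /**
     * Extracts the major version from a java.specification.version value.
     * Java 8 and earlier report "1.x" (for example "1.8"); Java 9 and later
     * report the major number directly (for example "11").
     */
    public static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Running on Java " + major
                + " (java.specification.version=" + spec + ")");
        if (major < 9) {
            System.out.println("This JVM is older than Java 9, "
                    + "so the compiler's -release flag is not supported.");
        }
    }
}
```

Running this inside the Docker image would show whether the container is still on an older JVM than expected.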

@seanfinan

You should be able to build and run cTAKES 5.0.0-SNAPSHOT with Java 11. Changing the Java version(s) in your Dockerfile might help.

@etgld
Contributor Author

etgld commented Feb 1, 2024

Thanks for the heads up! Will try to get to that after we're done with the current round of error analysis.

@etgld
Contributor Author

etgld commented Feb 1, 2024

So the good news is that updating the poms and Dockerfile to Java 11, along with using the latest version of Artemis (2.32.0), works, with only some slight differences in results, and it also gets rid of the error with the Scala Java compiler bridge. The bad news is that Timenorm is still slow on some instances, so the issue seems to lie with Timenorm itself.

I think this should be tractable to fix, though. I haven't done an exhaustive look, but the time mentions that trip up Timenorm seem to be either mentions with noise or mentions that are ambiguous across date representation conventions (month/date/year vs. date/month/year). I have custom code outside of Timenorm that deals with some of this (per instructions, I am assuming month/date/year because all the notes are from American institutions), but it doesn't handle all the types of such instances that show up. I will try to look closer at this after we finish this round of error analysis.
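As an illustration of the month/date/year assumption described above, here is a minimal sketch of a pre-normalization step that rewrites purely numeric dates to ISO form before they reach Timenorm. This is not the custom code mentioned in this comment. `UsDateNormalizer` and `toIso` are hypothetical names, and the handling is deliberately narrow: only four-digit years are accepted, and anything that is not a valid month/date/year date is left for the caller to deal with.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes US-style month/date/year ordering.
public class UsDateNormalizer {

    // Matches numeric dates such as 3/4/2021, 03-04-2021, or 3.4.2021.
    // The backreference \2 requires the same separator in both positions.
    private static final Pattern NUMERIC_DATE =
            Pattern.compile("^\\s*(\\d{1,2})([/.-])(\\d{1,2})\\2(\\d{4})\\s*$");

    /**
     * Returns the mention as an ISO-8601 date (yyyy-MM-dd) when it is a
     * valid month/date/year date, and an empty Optional otherwise.
     */
    public static Optional<String> toIso(String mention) {
        Matcher m = NUMERIC_DATE.matcher(mention);
        if (!m.matches()) {
            return Optional.empty();
        }
        int month = Integer.parseInt(m.group(1));
        int day = Integer.parseInt(m.group(3));
        int year = Integer.parseInt(m.group(4));
        try {
            // LocalDate.of rejects impossible dates such as month 13 or Feb 30.
            return Optional.of(LocalDate.of(year, month, day).toString());
        } catch (DateTimeException e) {
            return Optional.empty();
        }
    }
}
```

With this sketch, `toIso("3/4/2021")` yields `2021-03-04`, while `toIso("13/4/2021")` comes back empty because 13 is not a valid month. Empty results are a natural place to log the mention, since those are the cases that cannot be read as month/date/year.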

@etgld
Contributor Author

etgld commented Mar 13, 2024

One thing I just thought of after talking to Jiarui: she's using Timenorm in the Anafora XML writer and is now experiencing slowdowns there that she was not seeing before. It might be a versioning issue with either Timenorm or Scala, since I remember upgrading and/or downgrading versions of Scala, Timenorm, and the compiler bridge depending on the requirements. I made sure everything was compatible according to the documentation of each dependency, but this could still have something to do with it.

I might not be able to get to it today, but I will try to confirm with her whether she made any version updates in the pom like I did a while ago. That might give us a clue.

2 participants