WhatsApp parsing timeout can break parsing of other WA databases #1679

Closed
lfcnassif opened this issue May 11, 2023 · 1 comment

@lfcnassif
Member

lfcnassif commented May 11, 2023

While testing #1651 by processing hundreds of different WA databases together, one of them caused a timeout. After that, parsing of a different WA DB failed with the trace below:

org.apache.tika.exception.TikaException: WAExtractorException Exception
	at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:728)
	at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:257)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
	at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.nio.channels.ClosedByInterruptException
	at iped.parsers.whatsapp.Message.getThumbData(Message.java:221)
	at iped.parsers.whatsapp.ReportGenerator.printMessage(ReportGenerator.java:430)
	at iped.parsers.whatsapp.ReportGenerator.lambda$generateNextChatHtml$0(ReportGenerator.java:147)
	at iped.parsers.whatsapp.ReportGenerator$1.lookup(ReportGenerator.java:633)
	at org.apache.commons.text.StringSubstitutor.resolveVariable(StringSubstitutor.java:1148)
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1514)
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1389)
	at org.apache.commons.text.StringSubstitutor.replace(StringSubstitutor.java:893)
	at iped.parsers.whatsapp.ReportGenerator.printMessageFile(ReportGenerator.java:644)
	at iped.parsers.whatsapp.ReportGenerator.generateNextChatHtml(ReportGenerator.java:125)
	at iped.parsers.whatsapp.WhatsAppParser.createReport(WhatsAppParser.java:281)
	at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:719)
	... 9 more
Caused by: java.nio.channels.ClosedByInterruptException
	at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.endBlocking(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.readInternal(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.read(Unknown Source)
	at iped.parsers.whatsapp.Message.getThumbData(Message.java:218)
	... 20 more

After that, many exceptions like the following are thrown:

org.apache.tika.exception.TikaException: WAExtractorException Exception
	at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:728)
	at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:257)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
	at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.nio.channels.ClosedChannelException
	at iped.parsers.whatsapp.Message.getThumbData(Message.java:221)
	at iped.parsers.whatsapp.ReportGenerator.printMessage(ReportGenerator.java:430)
	at iped.parsers.whatsapp.ReportGenerator.lambda$generateNextChatHtml$0(ReportGenerator.java:147)
	at iped.parsers.whatsapp.ReportGenerator$1.lookup(ReportGenerator.java:633)
	at org.apache.commons.text.StringSubstitutor.resolveVariable(StringSubstitutor.java:1148)
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1514)
	at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1389)
	at org.apache.commons.text.StringSubstitutor.replace(StringSubstitutor.java:893)
	at iped.parsers.whatsapp.ReportGenerator.printMessageFile(ReportGenerator.java:644)
	at iped.parsers.whatsapp.ReportGenerator.generateNextChatHtml(ReportGenerator.java:125)
	at iped.parsers.whatsapp.WhatsAppParser.createReport(WhatsAppParser.java:281)
	at iped.parsers.whatsapp.WhatsAppParser.mergeParsedDBsAndOutputResults(WhatsAppParser.java:719)
	... 9 more
Caused by: java.nio.channels.ClosedChannelException
	at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.read(Unknown Source)
	at iped.parsers.whatsapp.Message.getThumbData(Message.java:218)
	... 20 more

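When a timeout happens, the parsing thread is interrupted. If that thread is reading or writing the thumb cache file at that moment, the interrupt closes the file channel (the ClosedByInterruptException in the first trace). Because the cache file is shared, later reads for other WA databases then fail with ClosedChannelException (the second trace). We should reopen the channel if the exceptions above are thrown.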
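
A minimal sketch of the reopen idea, not the actual IPED code: the class name `ReopeningCacheFile` and its methods are hypothetical, and the real thumb cache layout is not shown in this issue. It assumes all access goes through one synchronized wrapper, so a closed channel can be detected with `isOpen()` before each operation. A ClosedByInterruptException in the interrupted thread is left to propagate, since retrying while the interrupt flag is still set would just close the new channel again.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical wrapper around a shared cache file that reopens its channel if it was closed. */
final class ReopeningCacheFile implements AutoCloseable {
    private final Path path;
    private FileChannel channel;

    ReopeningCacheFile(Path path) throws IOException {
        this.path = path;
        this.channel = open();
    }

    private FileChannel open() throws IOException {
        return FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Reopens the channel if an earlier interrupt (in any thread) closed it. */
    private void ensureOpen() throws IOException {
        if (!channel.isOpen()) {
            channel = open();
        }
    }

    /** Appends data at the end of the file and returns the offset where it was written. */
    synchronized long append(byte[] data) throws IOException {
        ensureOpen();
        long start = channel.size();
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf, start + buf.position());
        }
        return start;
    }

    /** Reads exactly length bytes starting at position. */
    synchronized byte[] read(long position, int length) throws IOException {
        ensureOpen();
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, position + buf.position());
            if (n < 0) {
                throw new EOFException("cache file shorter than expected");
            }
        }
        return buf.array();
    }

    @Override
    public synchronized void close() throws IOException {
        channel.close();
    }
}
```

With a wrapper like this, the database that timed out still fails, but the next database to touch the cache gets a fresh channel instead of a ClosedChannelException.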

@lfcnassif lfcnassif self-assigned this May 11, 2023
@lfcnassif lfcnassif added the bug label May 11, 2023
@lfcnassif
Member Author

lfcnassif commented May 11, 2023

Another, safer approach would be to use a separate thumb cache file for each WA database being parsed, so a timeout in one database cannot close the channel used by the others. For now I'll keep the cache file static and reopen it if it is closed.
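
For comparison, the per-database alternative could look roughly like the sketch below. This is only an illustration: `PerDatabaseCacheFiles`, `cacheFileFor` and the string database identifier are hypothetical names, and temp files are an assumed choice for where the caches would live.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry handing out one cache file per WA database. */
final class PerDatabaseCacheFiles {
    private final Map<String, Path> files = new ConcurrentHashMap<>();

    /** Returns the cache file for the given database id, creating it on first use. */
    Path cacheFileFor(String databaseId) {
        return files.computeIfAbsent(databaseId, id -> {
            try {
                Path p = Files.createTempFile("wa-thumbs-", ".cache");
                p.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
                return p;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Each parse would then open its own channel on its own file, so an interrupt closes only the channel of the database that timed out. The cost is more open files and temp files to clean up.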

@lfcnassif lfcnassif changed the title from "WhatsApp parsing timeout can close the thumb cache file and break parsing of other WA databases" to "WhatsApp parsing timeout can break parsing of other WA databases" May 11, 2023
lfcnassif added a commit to aberenguel/IPED that referenced this issue Jan 17, 2024