Aborting OutOfMemoryError caused by too many results from ItemSearcher called from UFEDChatParser #2038

Closed
lfcnassif opened this issue Dec 30, 2023 · 4 comments · Fixed by #2046 or #2086
@lfcnassif
Member

lfcnassif commented Dec 30, 2023

A user reported an OOME with an 80GB heap. Analyzing a smaller 32GB heap that I asked for, I see many parsing threads using up to 1GB each:
image

Looking at them, most of the heap is used by large ArrayList<Item> objects:
image

It took me a while to find where those large Item lists come from. Looking at the stack traces of those threads, the lists are returned by ItemSearcher.search(query) calls from UFEDChatParser:

ParsingThread-20
  at java.util.Collections$SynchronizedMap.get(Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
  at java.util.Collections$UnmodifiableMap.get(Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
  at iped.engine.task.index.IndexItem.getItem(Lorg/apache/lucene/document/Document;Liped/engine/data/IPEDSource;Z)Liped/data/IItem; (IndexItem.java:940)
  at iped.engine.data.IPEDSource.getItemByLuceneID(I)Liped/data/IItem; (IPEDSource.java:493)
  at iped.engine.data.IPEDSource.getItemByID(I)Liped/data/IItem; (IPEDSource.java:503)
  at iped.engine.search.ItemSearcher$1$1.next()Liped/data/IItemReader; (ItemSearcher.java:62)
  at iped.engine.search.ItemSearcher$1$1.next()Ljava/lang/Object; (ItemSearcher.java:51)
  at iped.engine.search.ItemSearcher.search(Ljava/lang/String;)Ljava/util/List; (ItemSearcher.java:37)
  at iped.parsers.ufed.UFEDChatParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (UFEDChatParser.java:112)
  at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:298)
  at iped.parsers.standard.StandardParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (StandardParser.java:245)
  at iped.engine.io.ParsingReader$BackgroundParsing.run()V (ParsingReader.java:247)
  at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Unknown Source)
  at java.util.concurrent.FutureTask.run()V (Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (Unknown Source)
  at java.lang.Thread.run()V (Unknown Source)

Maybe there is some problem with the query, but it would be better to use the safer ItemSearcher.searchIterable(query) instead of ItemSearcher.search(query) where possible, which returns an Iterable instead of an ArrayList.
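
To illustrate why an Iterable can be safer than a fully built list, here is a minimal, self-contained sketch. It does not use IPED's real classes: `LazySearchSketch`, `loadItem`, and the integer `hitCount` parameter are hypothetical stand-ins, and the two methods only borrow the names `search` and `searchIterable` from the discussion above. The eager variant loads every hit before returning, while the lazy variant loads an item only when the caller asks for the next one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazySearchSketch {

    // Counts how many items were loaded (hypothetical instrumentation for the demo).
    static int loads = 0;

    // Hypothetical stand-in for loading one item by id; not IPED's API.
    static String loadItem(int id) {
        loads++;
        return "item-" + id;
    }

    // Eager variant: materializes every hit in memory before returning.
    static List<String> search(int hitCount) {
        List<String> result = new ArrayList<>();
        for (int id = 0; id < hitCount; id++) {
            result.add(loadItem(id));
        }
        return result;
    }

    // Lazy variant: each item is loaded only when next() is called.
    static Iterable<String> searchIterable(int hitCount) {
        return () -> new Iterator<String>() {
            private int nextId = 0;

            @Override
            public boolean hasNext() {
                return nextId < hitCount;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return loadItem(nextId++);
            }
        };
    }

    public static void main(String[] args) {
        loads = 0;
        search(1000);
        System.out.println("eager loads: " + loads); // prints "eager loads: 1000"

        loads = 0;
        for (String item : searchIterable(1000)) {
            if (item.equals("item-2")) {
                break; // stop early; remaining hits are never loaded
            }
        }
        System.out.println("lazy loads: " + loads); // prints "lazy loads: 3"
    }
}
```

With the lazy variant, only one item needs to be reachable at a time as long as the caller does not collect the items itself, so a query that unexpectedly matches a huge number of items does not translate into a huge list per parsing thread.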

@lfcnassif
Member Author

The user reported that the commit above fixed this OOME issue, so I'll merge it into master soon.

lfcnassif added a commit that referenced this issue Jan 9, 2024
fix #2038: use safer ItemSearcher searchIterable() instead of search()
@wladimirleite
Member

@lfcnassif, while checking an issue related to the HTML generated by UFEDChatParser (reported by another user to @felipecampanini), I observed different behavior between master and 4.1.5.
Analyzing the situation, I think I found a small bug in your fix that can generate an infinite loop in for (IItemReader subitem = subItems.next(); subItems.hasNext();), as subItems.next() is executed only once, right?

@wladimirleite
Member

wladimirleite commented Feb 16, 2024

Clarifying: it is an infinite loop if more than one item is found, and a lost message if the query returns a single item, as hasNext() will return false after the next() call.
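
The two failure modes described above can be reproduced outside IPED with a plain `java.util.Iterator`. The sketch below is only an illustration: `IteratorLoopSketch`, `buggyLoop`, `fixedLoop`, and the `maxSteps` cap (added so the demo terminates) are hypothetical names, and the `while` loop is just one common way to write the iteration, not necessarily the fix that was merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorLoopSketch {

    // Same loop shape as quoted above: next() runs once, in the initializer.
    // maxSteps is a demo-only cap so the otherwise endless loop stops.
    static List<String> buggyLoop(List<String> items, int maxSteps) {
        List<String> visited = new ArrayList<>();
        Iterator<String> subItems = items.iterator();
        int steps = 0;
        for (String subitem = subItems.next(); subItems.hasNext();) {
            visited.add(subitem);
            if (++steps >= maxSteps) {
                break; // without this cap the loop would never end
            }
        }
        return visited;
    }

    // One conventional form: test hasNext(), then advance with next() on every pass.
    static List<String> fixedLoop(List<String> items) {
        List<String> visited = new ArrayList<>();
        Iterator<String> subItems = items.iterator();
        while (subItems.hasNext()) {
            String subitem = subItems.next();
            visited.add(subitem);
        }
        return visited;
    }

    public static void main(String[] args) {
        // Single result: hasNext() is already false, so the body never runs.
        System.out.println(buggyLoop(Arrays.asList("a"), 5));      // prints []
        // Two results: the iterator never advances, "a" repeats until the cap.
        System.out.println(buggyLoop(Arrays.asList("a", "b"), 5)); // prints [a, a, a, a, a]
        // Each element is visited exactly once.
        System.out.println(fixedLoop(Arrays.asList("a", "b")));    // prints [a, b]
    }
}
```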

@lfcnassif
Member Author

Thank you @wladimirleite and sorry for my mistake! subitem = subItems.next(); should also be put after the last semicolon in the for clause. But your solution is much better!
