Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition prevents library initialization ~1/100 times. #10

Open
samsieber opened this issue Nov 19, 2021 · 6 comments
Open

Race condition prevents library initialization ~1/100 times. #10

samsieber opened this issue Nov 19, 2021 · 6 comments

Comments

@samsieber
Copy link
Contributor

We've started using this at $WORK, and we've seen that sometimes initializing the library fails.

new SymSpellBuilder().setUnigramLexicon(unigrams)
                            .setBigramLexicon(bigrams)
                            .setMaxDictionaryEditDistance(2)
                            .createSymSpell();

Throws this exception:

        java.lang.ArrayIndexOutOfBoundsException
                at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
                at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
                at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
                at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
                at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
                at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
                at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
                at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
                at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
                at io.gitlab.rxp90.jsymspell.SymSpellImpl.<init>(SymSpellImpl.java:43)
                at io.gitlab.rxp90.jsymspell.SymSpellBuilder.createSymSpell(SymSpellBuilder.java:64)
                ... 
        Caused by: java.lang.ArrayIndexOutOfBoundsException
                at java.lang.System.arraycopy(Native Method)
                at java.util.ArrayList.addAll(ArrayList.java:586)
                at io.gitlab.rxp90.jsymspell.SymSpellImpl.lambda$null$1(SymSpellImpl.java:45)
                at java.util.HashMap.forEach(HashMap.java:1289)
                at io.gitlab.rxp90.jsymspell.SymSpellImpl.lambda$new$2(SymSpellImpl.java:45)
                at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
                at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)
                at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
                at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
                at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
                at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
                at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
                at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
                at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
                at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

We've wrapped the initialization code in a loop as a stop gap, but it appears the problem is here:

this.unigramLexicon.keySet().parallelStream().forEach((word) -> {
            Map<String, Collection<String>> edits = this.generateEdits(word);
            edits.forEach((string, suggestions) -> {
                ((Collection)this.deletes.computeIfAbsent(string, (ignored) -> {
                    return new ArrayList();
                })).addAll(suggestions);
            });
        });

The this.deletes collection is a ConcurrentHashMap, but the new ArrayList() it returns is not thread-safe. I believe switching the computeIfAbsent to a compute call that also adds the values to the ArrayList will fix it, but I'm not sure.

@indra-rosadi-rally
Copy link
Contributor

I also experienced this issue. Although for me, the bug manifested in a little bit different way. The initialization did not fail, but sometimes the SymSpell.deletes contains null value. So during execution, you may end up getting NPE. I think it is due to expanding the array list with add all. I created #11 to resolve this issue.

I think the load is fast enough so parallelStream is not really needed. Another option is to use synchronized list, but then read operations may take a hit when running the code.

It will be great if @rxp90 can review the PR above.

Thanks.

@indra-rosadi-rally
Copy link
Contributor

@rxp90 - Thanks for merging the #11 PR. Any chance we can get a new version built soon?

@indra-rosadi-rally
Copy link
Contributor

@rxp90 - I saw a new tag is created. For some reason, I still don't see the new version in maven. Any idea when it will get there?

@rxp90
Copy link
Owner

rxp90 commented Aug 22, 2022

I think it's fixed now, it should be available soon

@Nanway
Copy link

Nanway commented Jul 14, 2024

hi @rxp90 just wondering if v1.1 is on maven yet:
image

@rxp90
Copy link
Owner

rxp90 commented Jul 15, 2024

@Nanway It looks like I'll need to update the publishing steps to adapt to the changes made in maven central

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants