I can't get compound word errors to be split properly #56

Open
wnm3 opened this issue Oct 27, 2020 · 2 comments
wnm3 commented Oct 27, 2020

Describe the bug
I have a sentence like:

String resp = "Yes because it's: $~num~ for ~misc~ payment contract $~num~ for planTOTAL $~num~";

but I can't configure the parameters correctly to get "plan total". I have tried the default settings and various combinations of parameters like:

    SpellCheckSettings spellCheckSettings = SpellCheckSettings.builder()
    //      .countThreshold(1).deletionWeight(1).insertionWeight(1)
    //      .replaceWeight(1).maxEditDistance(2).transpositionWeight(1).topK(5)
    //      .prefixLength(10).verbosity(Verbosity.ALL)
            .build();

    DataHolder dataHolder = new InMemoryDataHolder(spellCheckSettings,
            new Murmur3HashFunction());

    StringDistance weightedDamerauLevenshteinDistance = new WeightedDamerauLevenshteinDistance(
            spellCheckSettings.getDeletionWeight(),
            spellCheckSettings.getInsertionWeight(),
            spellCheckSettings.getReplaceWeight(),
            spellCheckSettings.getTranspositionWeight(), new QwertyDistance());

    SymSpellCheck checker = new SymSpellCheck(dataHolder,
            weightedDamerauLevenshteinDistance, spellCheckSettings);

    List<SuggestionItem> suggestions = checker.lookupCompound(resp, 1.0d, true);

To Reproduce
Steps to reproduce the behavior:

  1. Run the code above and examine suggestions.get(0).getTerm(): the compound word is not split.

Expected behavior
The term should show "plan total" and not "plantotal". Better still, it would preserve case and show "plan TOTAL" (is there an option for this?).
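Whether the library offers a case-preserving option is not established in this thread. As a possible workaround, the original casing could be copied back onto the suggestion after the lookup. The sketch below is plain Java and does not use the library; `CaseRestorer` and `restoreCase` are hypothetical names, and it assumes the suggestion differs from the input only by letter case and inserted spaces.

```java
final class CaseRestorer {

    /**
     * Copies the casing of {@code original} onto {@code suggestion}.
     * Assumption: the two strings differ only by letter case and by spaces
     * inserted into the suggestion. If that does not hold (for example the
     * spell checker also corrected a letter), the suggestion is returned unchanged.
     */
    static String restoreCase(String original, String suggestion) {
        StringBuilder out = new StringBuilder(suggestion.length());
        int i = 0; // read position in the original string
        for (int j = 0; j < suggestion.length(); j++) {
            char s = suggestion.charAt(j);
            if (i < original.length()
                    && Character.toLowerCase(original.charAt(i)) == Character.toLowerCase(s)) {
                out.append(original.charAt(i)); // same letter: keep the original casing
                i++;
            } else if (s == ' ') {
                out.append(' ');                // space inserted by the split
            } else {
                return suggestion;              // a real correction happened: give up
            }
        }
        // Only accept the result if every original character was consumed.
        return i == original.length() ? out.toString() : suggestion;
    }

    public static void main(String[] args) {
        System.out.println(restoreCase("planTOTAL", "plan total"));
    }
}
```

Falling back to the unmodified suggestion keeps the helper safe when the checker changes letters as well as spacing.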

Desktop (please complete the following information):

  • OS: MacOS 10.15.7
  • openjdk version "1.8.0_252"

Additional context
Providing more details on the impact of settings would be helpful for people not familiar with the art.

This may be related to #53

wnm3 commented Oct 28, 2020

I realized after creating this issue that I wasn't loading any dictionaries. It would help if the README.md showed examples for loading the two dictionaries you have so people can see the spell checker working. Also, I was calling lookup rather than lookupCompound. After looking at the test cases, I figured out what I was missing.
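To illustrate why an empty dictionary produces no split, here is a minimal sketch in plain Java. It is not the library's API or its algorithm: `SplitSketch`, `loadFrequencies`, and `split` are hypothetical names, and the input format (one `term count` pair per line, separated by whitespace) is an assumption about the frequency dictionaries. A token can only be split where both halves are known words, so with nothing loaded there is nothing to split on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class SplitSketch {

    /** Reads "term count" lines into a map; blank or malformed lines are skipped. */
    static Map<String, Long> loadFrequencies(Reader source) {
        Map<String, Long> freq = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length < 2) {
                    continue;
                }
                try {
                    freq.put(parts[0].toLowerCase(), Long.parseLong(parts[1]));
                } catch (NumberFormatException e) {
                    // count column is not a number: ignore this line
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return freq;
    }

    /**
     * Tries every split point and keeps the one whose rarer half is most
     * frequent. Returns empty when no split yields two dictionary words.
     */
    static Optional<String> split(String token, Map<String, Long> freq) {
        String lower = token.toLowerCase();
        String best = null;
        long bestScore = -1;
        for (int i = 1; i < lower.length(); i++) {
            Long left = freq.get(lower.substring(0, i));
            Long right = freq.get(lower.substring(i));
            if (left == null || right == null) {
                continue; // one half is not a known word
            }
            long score = Math.min(left, right);
            if (score > bestScore) {
                bestScore = score;
                best = lower.substring(0, i) + " " + lower.substring(i);
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Tiny in-memory dictionary with made-up counts, for illustration only.
        Map<String, Long> freq = loadFrequencies(new StringReader("plan 100\ntotal 200\n"));
        System.out.println(split("planTOTAL", freq));            // split found
        System.out.println(split("planTOTAL", new HashMap<>())); // empty dictionary: no split
    }
}
```

In real use the `Reader` would wrap the dictionary file rather than an in-memory string.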

One thing that was also confusing to me is how Verbosity.ALL is used in the SymSpellCheck class. While the lookup method takes Verbosity as a parameter, the other methods ignore what was passed and provide a hardcoded Verbosity.TOP value. I'd tried referencing what was passed but that broke a bunch of test cases so I backed out that approach.

I'll close this hoping you might update the README to include references to loading the dictionaries to help others wanting to try your libraries.

wnm3 closed this as completed Oct 28, 2020
MighTguY reopened this Jan 18, 2021
MighTguY (Owner) commented

Hi @wnm3, thanks for bringing this up. I am reopening the issue and will close it once I update the README. :)
Thanks again 🙌
