Missing expected results in fuzzy search (no stemming) #375
Comments
Sorry for taking a while to get to this. It looks like a bug to me; I put together a simplified reproduction on jsfiddle. For some reason, trailing characters only match if they are the same as the last character in the fuzzy string, weird! This also explains why the test is passing. I'll dig into this a bit and come up with a fix. Thanks for reporting.
Looking at
Would I be incorrect in thinking that
Fixes GH olivernn#375. Before, insertions were not made at the end of a fuzzy string for token sets.
I've created a PR at #382 that I believe fixes this issue.
I've just pushed 2.3.5, which includes the fix from @hoelzro.
Performing fuzzy search seems to miss some words within the given edit distance.
Here is one example (disabling stemming and all other pipeline functions to ensure that we are only observing the behavior of fuzzy search):
In the example above, I would expect the words scienza and coscienzaxx to also match, as they are at an edit distance of 2 from the query term coscienza (two deletions or insertions at the word boundary).
This is also visible if one observes the fuzzy TokenSet expansion for the term coscienza:
I am not sure if this is a bug or the intended behavior of fuzzy search. In the latter case, maybe it would deserve a mention in the documentation.
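For anyone who wants to double-check the distances quoted above without involving lunr at all, here is a minimal sketch of a plain Levenshtein distance. The helper name editDistance is hypothetical and not part of lunr, and lunr's own fuzzy matching may count edits differently (for example, transpositions), so this only confirms that both words are within two insertions or deletions of the query term.

```javascript
// Plain Levenshtein distance (insertions, deletions, substitutions).
// NOTE: editDistance is a hypothetical helper for illustration, not a lunr API.
function editDistance(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into the empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i - 1]
        curr[j - 1] + 1,    // insert b[j - 1]
        prev[j - 1] + cost  // keep or substitute
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

const query = "coscienza";
for (const word of ["scienza", "coscienzaxx"]) {
  console.log(word, editDistance(query, word));
}
// prints:
// scienza 2
// coscienzaxx 2
```

Both words come out at distance 2, which is why I would expect a fuzzy query with an edit distance of 2 to return them.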
Thanks again for the great work!