v1.4.6

Latest
@pgaskin pgaskin released this 28 Jan 12:34
· 3 commits to master since this release
1d05618

Patrick Gaskin (@pgaskin)
1d05618 Improve word normalization for LookupWord[s]

In order (the lookup is checked after each step):
* Trim leading/trailing spaces.
* Collapse each run of whitespace into a single space.
* Trim leading/trailing opening/closing Unicode punctuation.
* Replace all Unicode dash-like characters with a dash.
* Collapse multiple dashes into a single one.
* Try each of the following (without chaining):
  * Attempt to stem the word.
  * Try removing just -s*.
  * Try removing just -ly, then -ing.
  * Try again, but fold all Unicode characters into their
    normalized form.
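
The steps above could be sketched roughly as follows. This is an illustration only, not the code from the commit: the function names `normalizeSteps` and `suffixVariants` are hypothetical, "opening/closing punctuation" is assumed to mean the Unicode categories Ps, Pe, Pi and Pf, "dash-like" is assumed to mean category Pd, and `-s*` is read as "any run of trailing s". Stemming and Unicode folding are left out because they need packages outside the standard library.

```go
package main

import (
	"strings"
	"unicode"
)

// normalizeSteps applies the whitespace, punctuation, and dash steps in order
// and returns each distinct intermediate form, so a caller can check the
// dictionary after every step. Hypothetical helper, not the project's API.
func normalizeSteps(word string) []string {
	var forms []string
	add := func(s string) {
		// Skip a form that is identical to the previous one.
		if len(forms) == 0 || forms[len(forms)-1] != s {
			forms = append(forms, s)
		}
	}

	// Trim leading/trailing spaces.
	w := strings.TrimSpace(word)
	add(w)

	// Collapse each run of whitespace into a single space.
	w = strings.Join(strings.Fields(w), " ")
	add(w)

	// Trim leading/trailing opening/closing punctuation
	// (assumption: categories Ps, Pe, Pi, Pf).
	w = strings.TrimFunc(w, func(r rune) bool {
		return unicode.In(r, unicode.Ps, unicode.Pe, unicode.Pi, unicode.Pf)
	})
	add(w)

	// Replace dash-like characters with '-' (assumption: category Pd).
	w = strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Pd, r) {
			return '-'
		}
		return r
	}, w)
	add(w)

	// Collapse multiple dashes into a single one.
	for strings.Contains(w, "--") {
		w = strings.ReplaceAll(w, "--", "-")
	}
	add(w)

	return forms
}

// suffixVariants returns the suffix-stripped candidates for an already
// normalized word. Each variant is derived from w itself (no chaining).
func suffixVariants(w string) []string {
	var out []string
	for _, v := range []string{
		strings.TrimRight(w, "s"),    // assumption: "-s*" means trailing run of s
		strings.TrimSuffix(w, "ly"),  // -ly
		strings.TrimSuffix(w, "ing"), // then -ing
	} {
		if v != "" && v != w {
			out = append(out, v)
		}
	}
	return out
}
```

A caller would look up each string from `normalizeSteps(word)` in turn, and if none matches, try each result of `suffixVariants` on the last form.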