Binaries are available at Maven Central.
Please follow this link for project documentation.
Dictionary locations:
/usr/share/hunspell
/usr/local/share/hunspell
On Mac OS X, which also relies on hunspell for spell checking, the following additional dictionary locations can be examined:
/System/Library/Spelling
/Library/Spelling
~/Spelling
/opt/local/share/hunspell (if MacPorts is installed)
/sw/share/hunspell (if Fink is installed)
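The lookup over these directories can be sketched in Java as follows. This is a minimal illustration, not part of the project: the class name `DictionaryLocator`, the method names, and the assumption that a dictionary consists of a `<name>.aff` and `<name>.dic` pair in the same directory are introduced here for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: searches the directories listed above for a dictionary.
final class DictionaryLocator {
    /** The locations from the lists above; "~" stands for the user's home directory. */
    static final List<String> DEFAULT_DIRS = Arrays.asList(
            "/usr/share/hunspell",
            "/usr/local/share/hunspell",
            "/System/Library/Spelling",
            "/Library/Spelling",
            "~/Spelling",
            "/opt/local/share/hunspell",
            "/sw/share/hunspell");

    /** Expands a leading "~" to the value of the user.home system property. */
    static Path expand(String dir) {
        if (dir.equals("~") || dir.startsWith("~/")) {
            return Paths.get(System.getProperty("user.home") + dir.substring(1));
        }
        return Paths.get(dir);
    }

    /** Returns the first directory that contains both name.aff and name.dic. */
    static Optional<Path> find(String name, List<String> dirs) {
        for (String dir : dirs) {
            Path base = expand(dir);
            if (Files.isRegularFile(base.resolve(name + ".aff"))
                    && Files.isRegularFile(base.resolve(name + ".dic"))) {
                return Optional.of(base);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "ru_RU" is only an example dictionary name.
        String name = args.length > 0 ? args[0] : "ru_RU";
        Optional<Path> dir = find(name, DEFAULT_DIRS);
        System.out.println(dir.map(Path::toString)
                .orElse("no " + name + " dictionary found"));
    }
}
```

Directories that do not exist on the current platform are simply skipped, so the same list can be used on Linux and Mac OS X.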
If such information is provided by dictionaries, hunspell can also perform morphological analysis – see hunspell(4) man page, section "Optional data fields".
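The optional data fields are whitespace-separated `tag:value` pairs with two-letter tags such as `st` (stem), `po` (part of speech) and `is` (inflectional suffix). A minimal Java sketch of splitting such an analysis string into fields is shown below; the class `MorphFields` and the sample string are illustrative assumptions, not output captured from a real dictionary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a hunspell analysis string into tag/value fields.
final class MorphFields {
    /** One "xx:value" pair, e.g. tag "st" with value "dog". */
    static final class Field {
        final String tag;
        final String value;

        Field(String tag, String value) {
            this.tag = tag;
            this.value = value;
        }

        @Override
        public String toString() {
            return tag + ":" + value;
        }
    }

    /** Parses whitespace-separated tag:value tokens; tokens without a tag are skipped. */
    static List<Field> parse(String analysis) {
        List<Field> fields = new ArrayList<>();
        for (String token : analysis.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon > 0) {
                fields.add(new Field(token.substring(0, colon), token.substring(colon + 1)));
            }
        }
        return fields;
    }

    /** Returns the value of the first field with the given tag, or null if absent. */
    static String first(List<Field> fields, String tag) {
        for (Field f : fields) {
            if (f.tag.equals(tag)) {
                return f.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative input only; real strings depend on the dictionary in use.
        List<Field> fields = parse(" st:dog po:noun is:plural");
        System.out.println(fields);
        System.out.println(first(fields, "st"));
    }
}
```

The fields are kept in a list rather than a map so that a tag occurring more than once in one analysis is not lost.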
Two projects are available: one is based on JNA and the other on BridJ.
Although Mac OS X relies on hunspell for spell checking and supports third-party hunspell dictionaries, its Objective-C API supports neither stemming nor morphological analysis (see the NSSpellChecker class reference).
"Mac OS X for Java Geeks" (Chapter 11, "The Mac OS X Spelling Framework") refers to the com.apple.spell.ui Java package, but the book was published in 2003 and covers Mac OS X 10.2 and JDK 1.4. That package is missing from the Mac OS X 10.9 distribution. The Java packages Apple ships instead are:
apple.applescript
apple.awt
apple.keychain (JDK 1.4 only)
apple.laf
apple.launcher
apple.security
apple.util
com.apple.concurrent
com.apple.crypto
com.apple.dnssd
com.apple.eawt
com.apple.eio
com.apple.java
com.apple.jobjc (in particular, it contains the com.apple.jobjc.appkit.NSSpellChecker and com.apple.jobjc.foundation.NSSpellServer classes)
com.apple.laf
com.apple.mrj
com.apple.resources
seman by aot.ru
$ for w in 'друг' 'друзья' 'люди' 'какая'; do echo $w; done | iconv -t CP1251 | ./TestLem russian | iconv -f CP1251
Loading..
Input a word..
+ ДРУГ С од мр,им,ед 147889 ДРУ'Г
+ ДРУГ С од мр,им,мн 147889 ДРУЗЬЯ'
+ ЧЕЛОВЕК С од мр,им,мн 135031 ЛЮ'ДИ
+ КАКАТЬ ДЕЕПРИЧАСТИЕ нп,нс дст,нст 151931 КА'КАЯ + КАКОЙ МС-П но,од,жр,им,ед 148987 КАКА'Я
$ echo 'Варкалось, хливкие шорьки пырялись по наве' | iconv -t CP1251 | ./TestSynan russian | iconv -f CP1251
ok
sentences count: 1
sentences count: 1
<chunk>
<input>Варкалось, хливкие шорьки пырялись по наве</input>
<sent>
<synvar>
<clause type="ГЛ_ЛИЧН">Варкалось , хливкие шорьки пырялись по наве</clause>
<group type="ПРИЛ_СУЩ">хливкие шорьки</group>
<group type="ОДНОР_ИГ">Варкалось , хливкие шорьки</group>
<group type="ПГ">по наве</group>
</synvar>
<rel name="ПРИЛ_СУЩ" gramrel="вн,им,мн," lemmprnt="ШОРЕК" grmprnt="но,мр,вн,им,мн," lemmchld="ХЛИВКИЙ" grmchld="но,од,вн,им,мн," > шорьки -> хливкие </rel>
<rel name="ПГ" gramrel="пр," lemmprnt="ПО" grmprnt="" lemmchld="НАВ" grmchld="но,мр,пр,ед," > по -> наве </rel>
<rel name="ОДНОР_ИГ" gramrel="вн,им,мн," lemmprnt="," grmprnt="" lemmchld="ВАРКАЛОСЬ" grmchld="но,ср,жр,мр,пр,тв,вн,дт,рд,им,ед,мн," > , -> Варкалось </rel>
<rel name="ОДНОР_ИГ" gramrel="вн,им,мн," lemmprnt="," grmprnt="" lemmchld="ШОРЕК" grmchld="но,мр,вн,им,мн," > , -> шорьки </rel>
<rel name="ПОДЛ" gramrel="" lemmprnt="ПЫРЯТЬСЯ" grmprnt="дст,нп,нс,прш,мн," lemmchld="ВАРКАЛОСЬ" grmchld="но,ср,жр,мр,пр,тв,вн,дт,рд,им,ед,мн," > пырялись -> Варкалось </rel>
</sent>
</chunk>
mystem by Yandex
Version 2.1 for Mac OS X is linked incorrectly against /usr/local/Cellar/gcc47/4.7.2/gcc/lib/libstdc++.6.dylib:
$ otool -L mystem
mystem:
/usr/local/Cellar/gcc47/4.7.2/gcc/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.17.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
and dumps a core when run. The problem can be fixed with install_name_tool:
$ install_name_tool -change /usr/local/Cellar/gcc47/4.7.2/gcc/lib/libstdc++.6.dylib /usr/lib/libstdc++.6.dylib mystem
$ otool -L mystem
mystem:
/usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.17.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
$ echo -e 'какая\nдрузья\nлюди\nваркалось\nхливкие\nшорьки\nглокая\nкуздра' | ./mystem -n -e utf-8 -i -l
какать=V,несов,нп=непрош,деепр|какой=APRO=им,ед,жен
друг=S,муж,од=им,мн
человек=S,муж,од=им,мн
варкаться?=V,несов,нп=прош,ед,изъяв,сред
хливкий?=A=им,мн,полн|?=A=вин,мн,полн,неод
шорька?=S,жен,неод=им,мн|?=S,жен,неод=род,ед|?=S,жен,неод=вин,мн
глокать?=V,несов,нп=непрош,деепр|глокий?=A=им,ед,полн,жен
куздра?=S,ед,жен,неод=им|куздра?=S,гео,жен,неод=им,ед
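Each output line above lists alternative analyses separated by `|`, and each alternative starts with a lemma followed by `=` and the grammemes. The Java sketch below splits a line accordingly. It is an assumption-laden illustration: the class `MystemLine` is hypothetical, and treating a trailing `?` on the lemma as "lemma guessed for a word not in the dictionary" is inferred from the sample output, where it appears only on the invented words.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses one line of `mystem -n -e utf-8 -i -l` output.
final class MystemLine {
    static final class Analysis {
        final String lemma;      // may be empty if the alternative omits the lemma
        final boolean guessed;   // assumption: trailing '?' marks a guessed lemma
        final String grammemes;  // everything after the first '=', left unparsed

        Analysis(String lemma, boolean guessed, String grammemes) {
            this.lemma = lemma;
            this.guessed = guessed;
            this.grammemes = grammemes;
        }

        @Override
        public String toString() {
            return lemma + (guessed ? " (guessed)" : "") + " [" + grammemes + "]";
        }
    }

    /** Splits a line on '|' and each alternative on its first '='. */
    static List<Analysis> parse(String line) {
        List<Analysis> result = new ArrayList<>();
        for (String alt : line.split("\\|")) {
            if (alt.isEmpty()) {
                continue;
            }
            int eq = alt.indexOf('=');
            String head = eq < 0 ? alt : alt.substring(0, eq);
            String grammemes = eq < 0 ? "" : alt.substring(eq + 1);
            boolean guessed = head.endsWith("?");
            String lemma = guessed ? head.substring(0, head.length() - 1) : head;
            result.add(new Analysis(lemma, guessed, grammemes));
        }
        return result;
    }

    /** Reads mystem output from standard input and prints the parsed analyses. */
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        String line;
        while ((line = in.readLine()) != null) {
            out.println(parse(line));
        }
    }
}
```

After compiling, the class can be placed at the end of the pipeline shown above, for example `... | ./mystem -n -e utf-8 -i -l | java MystemLine`.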
Apache Lucene contains a port of the C++ hunspell API to Java; see the API documentation.
Product | Russian | Ukrainian | English | German | Morphological Analysis | Syntax Analysis |
---|---|---|---|---|---|---|
hunspell | yes | yes | yes | yes | yes (if supported by dictionaries) | no |
seman | yes | no | yes | yes | yes | yes |
mystem | yes | no | no | no | yes | no |
LanguageTool | yes | yes | yes | yes | yes | no |
Lucene | ? | ? | ? | ? | ? | no |
Product | C++ | Java |
---|---|---|
hunspell | yes | yes |
seman | yes | no |
mystem | yes | no |
LanguageTool | no | yes |
Lucene | no | yes |
Product | Windows | Linux | Mac OS X |
---|---|---|---|
hunspell | yes | yes | yes |
seman | yes | yes | no |
mystem | yes | yes | yes |
LanguageTool | yes | yes | yes |
Lucene | yes | yes | yes |
Product | License | Can be distributed with Caché? |
---|---|---|
hunspell | GPL/LGPL/MPL | yes |
seman | LGPL | yes |
mystem | non-commercial | no |
LanguageTool | LGPL | yes |
Lucene | Apache License | yes |
For a C++ implementation, it is possible to link against hunspell, seman, or mystem and return the results of morphological analysis as a JSON object using Boost Property Tree.
The ZWARRAYP type can be used to pass strings to and from Caché.