Duplicate words in segmentation results #107

Open
li995495592 opened this issue Jan 21, 2020 · 1 comment

Comments

@li995495592

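Some words can be segmented in two ways, and the result returns both segmentations one after the other. For example, "注意事项" is segmented into "注意, 事项, 注意事项": the text contains only one 注意事项, yet it shows up twice in the result. Likewise, "校门口" becomes "校门, 门口, 校门口", so three characters turn into seven characters in the output. Surely that is not acceptable?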

@zhaochuanzhen

zhaochuanzhen commented Feb 24, 2020

// Load the user dictionary from the classpath resource dicts/intent.dict.
Path path = Paths.get(new
        File(getClass().getClassLoader().getResource("dicts/intent.dict").getPath()).getAbsolutePath());
WordDictionary.getInstance().loadUserDict(path);
JiebaSegmenter segmenter = new JiebaSegmenter();
// SegMode.SEARCH is the setting that avoids the overlapping sub-words.
List<SegToken> result = segmenter.process(text, JiebaSegmenter.SegMode.SEARCH);

Just set the mode parameter to JiebaSegmenter.SegMode.SEARCH.
You are probably using:
JiebaSegmenter.SegMode.INDEX
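
If switching the mode is not an option (for example, the tokens come from an index pipeline you do not control), one possible workaround is to drop every token whose character span lies inside a longer token's span. The sketch below is only an illustration and is not part of jieba-analysis: OverlapFilter, Token and dropContained are made-up names, and Token is a stand-in that assumes each token carries a start and end offset, as SegToken appears to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jieba-analysis.
class OverlapFilter {

    // Minimal stand-in for a segmented token with character offsets (assumed shape).
    static final class Token {
        final String word;
        final int start; // inclusive
        final int end;   // exclusive

        Token(String word, int start, int end) {
            this.word = word;
            this.start = start;
            this.end = end;
        }
    }

    // Keeps only tokens whose span is not covered by a strictly longer token.
    static List<Token> dropContained(List<Token> tokens) {
        List<Token> kept = new ArrayList<>();
        for (Token candidate : tokens) {
            boolean contained = false;
            for (Token other : tokens) {
                boolean covers = other.start <= candidate.start && candidate.end <= other.end;
                boolean longer = (other.end - other.start) > (candidate.end - candidate.start);
                if (covers && longer) {
                    contained = true;
                    break;
                }
            }
            if (!contained) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The overlapping output reported above for "校门口".
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("校门", 0, 2));
        tokens.add(new Token("门口", 1, 3));
        tokens.add(new Token("校门口", 0, 3));
        for (Token t : dropContained(tokens)) {
            System.out.println(t.word); // prints 校门口
        }
    }
}
```

This is quadratic in the number of tokens, which should be fine for a single sentence; using SegMode.SEARCH as suggested above remains the simpler fix.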
