gcws

gcws is a Chinese Word Segmentation (CWS) library for Go: a single manager for multiple CWS adapters.

The repo is inspired by database/sql.
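
For readers unfamiliar with the database/sql design, the idea is that each adapter package registers itself under a name from its `init` function, and callers look it up by that name. The sketch below only illustrates that pattern under stated assumptions: `Segmenter`, `Register`, `New`, and the toy `whitespace` adapter are hypothetical names, not the actual gcws API.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Segmenter is a hypothetical adapter interface; the real gcws interface may differ.
type Segmenter interface {
	Tokenize(text string) []string
}

var (
	mu       sync.RWMutex
	adapters = make(map[string]func() Segmenter)
)

// Register makes an adapter constructor available under name.
// Adapter packages would call it from init(), as database/sql drivers do.
func Register(name string, ctor func() Segmenter) {
	mu.Lock()
	defer mu.Unlock()
	if ctor == nil {
		panic("register: nil constructor for " + name)
	}
	if _, dup := adapters[name]; dup {
		panic("register: adapter registered twice: " + name)
	}
	adapters[name] = ctor
}

// New returns a fresh instance of the adapter registered under name.
func New(name string) (Segmenter, error) {
	mu.RLock()
	ctor, ok := adapters[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown adapter %q (forgotten import?)", name)
	}
	return ctor(), nil
}

// whitespace is a toy adapter that splits on whitespace (illustration only).
type whitespace struct{}

func (whitespace) Tokenize(text string) []string { return strings.Fields(text) }

// In a real layout this init would live in the adapter's own package,
// and callers would pull it in with a blank import.
func init() { Register("whitespace", func() Segmenter { return whitespace{} }) }

func main() {
	seg, err := New("whitespace")
	if err != nil {
		panic(err)
	}
	fmt.Println(seg.Tokenize("man is master of his fate"))
}
```

This is why the usage section pairs a blank import of the adapter package with a lookup by name: the import exists only to trigger registration.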

Chinese documentation

Install

go get github.com/WindomZ/gcws/...

Supported

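  • sego - Chinese word segmentation for Go, implemented with a double-array trie [GitHub]
  • jieba - Go version of the "Jieba" Chinese word segmenter [GitHub]
  • cwsharp - Chinese word segmentation library for Go; supports multiple segmentation modes, custom dictionaries, and extensions [GitHub]
  • segment - Chinese word segmentation package for Go, inspired by Pangu Segment (盘古分词) [GitHub]
  • gse - efficient text segmentation in Go; supports English, Chinese, Japanese, and other languages [GitHub]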

Usage

Import it

import (
    "github.com/WindomZ/gcws"
)

Init it (example with jieba)

import (
    _ "github.com/WindomZ/gcws/jieba"
)
...
cws, err := gcws.NewCWS("jieba")

Use it

cws.Tokenize("For man is man and master of his fate.") // returns []string{...}

Mode

  • ModeDefault - default mode
  • ModeSearch - optimized for search; supported by sego, jieba, segment, gse
  • ModeFast - optimized for speed; supported by cwsharp
  • ModeEnglish - optimized for English; supported by sego, jieba
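
The mode list can also be read as a lookup table from mode to adapters. The sketch below encodes exactly the pairs listed above so they can be checked in code. The `Mode` type, its string values, and the `Supports` helper are hypothetical stand-ins rather than gcws identifiers, and treating ModeDefault as available on every adapter is an assumption.

```go
package main

import "fmt"

// Mode is an illustrative stand-in for the gcws mode constants.
type Mode string

const (
	ModeDefault Mode = "default"
	ModeSearch  Mode = "search"
	ModeFast    Mode = "fast"
	ModeEnglish Mode = "english"
)

// modeSupport lists, per mode, the adapters named in the README.
var modeSupport = map[Mode][]string{
	ModeSearch:  {"sego", "jieba", "segment", "gse"},
	ModeFast:    {"cwsharp"},
	ModeEnglish: {"sego", "jieba"},
}

// Supports reports whether adapter is listed for mode.
// Assumption: ModeDefault works with every adapter.
func Supports(mode Mode, adapter string) bool {
	if mode == ModeDefault {
		return true
	}
	for _, a := range modeSupport[mode] {
		if a == adapter {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Supports(ModeSearch, "jieba")) // true
	fmt.Println(Supports(ModeFast, "jieba"))   // false
}
```

A check like this is only a convenience for choosing an adapter up front; how gcws itself behaves when a mode is not supported is not stated here.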