ayaka14732/cantoseg

cantoseg

Cantonese segmentation tool 粵語分詞工具

Install

$ pip install cantoseg

Usage

>>> import cantoseg
>>> cantoseg.cut('香港喺舊石器時代就有人住')
['香港', '喺', '舊石器時代', '就', '有人', '住']

A generator version is also available: cantoseg.lcut.
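
As a minimal sketch of how the token list might be used downstream, the helper below segments several lines and joins each line's tokens with spaces, a common step before passing text to whitespace-based tools. The name `segment_lines` is hypothetical and not part of cantoseg. The segmenter is passed in as a callable, so the helper assumes only the `cantoseg.cut` behaviour shown above (a string in, a sequence of string tokens out).

```python
from typing import Callable, Iterable, List


def segment_lines(
    lines: Iterable[str],
    cut: Callable[[str], Iterable[str]],
) -> List[str]:
    """Segment each non-empty line with `cut` and join the tokens with spaces.

    `segment_lines` is a hypothetical helper, not part of cantoseg.
    """
    out = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines instead of emitting empty strings
        out.append(" ".join(cut(text)))
    return out


if __name__ == "__main__":
    try:
        import cantoseg  # assumes `pip install cantoseg` has been run
    except ImportError:
        print("cantoseg is not installed; run `pip install cantoseg` first")
    else:
        print(segment_lines(["香港喺舊石器時代就有人住"], cantoseg.cut))
```

Passing the segmenter as an argument keeps the helper easy to test with a stand-in function and lets it accept any callable that returns an iterable of tokens.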

Design

See the article "Cantonese Segmentation and Part-Of-Speech Tagging" (in Chinese).