"Hong Kong Input Method Editor" Experimentation WIP

Jyutt/HKIME

HKIME

HKIME is an ongoing project that aims to create an Input Method Editor (IME) for Cantonese. Progress is documented in the Jupyter notebooks in the notebooks directory.

Our latest milestone is a simple bigram statistical language model, without smoothing, trained on a corpus scraped from Cantonese Wikipedia. We are currently training a neural-network (NN) based language model.
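As a rough illustration of what an unsmoothed bigram model computes, here is a minimal sketch. It is not the code from the notebooks: the function names, the character-level tokenization, the `<s>`/`</s>` boundary markers, and the two-sentence toy corpus are all assumptions made for this example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context tokens and adjacent token pairs (hypothetical helper)."""
    context_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Assumption: each character is one token, with boundary markers added.
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        context_counts.update(tokens[:-1])           # every token that has a successor
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return context_counts, pair_counts

def bigram_prob(context_counts, pair_counts, prev, cur):
    """Maximum-likelihood estimate P(cur | prev) with no smoothing."""
    if context_counts[prev] == 0:
        return 0.0  # unseen context: no estimate available
    return pair_counts[(prev, cur)] / context_counts[prev]

# Toy corpus standing in for the scraped Wikipedia text.
toy_corpus = ["我係香港人", "我係學生"]
context_counts, pair_counts = train_bigram(toy_corpus)

print(bigram_prob(context_counts, pair_counts, "我", "係"))  # 1.0
print(bigram_prob(context_counts, pair_counts, "係", "香"))  # 0.5
print(bigram_prob(context_counts, pair_counts, "係", "人"))  # 0.0 (unseen pair)
```

Because there is no smoothing, any pair that never occurs in the corpus gets probability zero, which is one reason a larger corpus or a neural model is attractive.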

The papers and other sources that have helped us most are listed in resources.md and sources.md.

Core Components of the Project

All of these components are currently in progress:

  • Fuzzy Jyutping / Processing different romanizations
  • Jyutping Segmentation
  • Jyut2Char: Jyutping to Character Conversion
  • Scraping / Corpus Generation
  • Neural Net Language Model
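
To make the Jyutping segmentation component more concrete, the sketch below splits an unspaced, toneless Jyutping string into every sequence of valid syllables. It is only one possible approach under stated assumptions: the `SYLLABLES` set is a tiny hand-picked sample rather than the full Jyutping inventory, and the `segment` function is hypothetical, not taken from the notebooks.

```python
from functools import lru_cache

# Assumption: a tiny sample of toneless Jyutping syllables for illustration only.
SYLLABLES = frozenset({
    "nei", "hou", "gwong", "dung", "waa", "ngo",
    "hai", "heung", "gong", "jan", "m", "goi",
})

def segment(text, syllables=SYLLABLES):
    """Return every way to split `text` into syllables from `syllables`."""
    max_len = max(len(s) for s in syllables)

    @lru_cache(maxsize=None)
    def split_from(start):
        # One valid (empty) segmentation once the whole string is consumed.
        if start == len(text):
            return [[]]
        results = []
        last_end = min(len(text), start + max_len)
        for end in range(start + 1, last_end + 1):
            piece = text[start:end]
            if piece in syllables:
                # Build new lists so cached results are never mutated.
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return split_from(0)

print(segment("neihou"))        # [['nei', 'hou']]
print(segment("gwongdungwaa"))  # [['gwong', 'dung', 'waa']]
print(segment("xyz"))           # []
```

Returning all candidate segmentations, rather than only the first, leaves room for a language model to pick the most plausible one later in the pipeline.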
