Word segmentation using Conditional Random Fields (CRF) for Khmer documents
See the detailed article here:
https://medium.com/@phylypo/segmentation-of-khmer-text-using-conditional-random-fields-3a2d4d73956a
This project includes Python notebooks with the complete code to run the CRF. The notebook includes code to download and extract the data and to train the model.
- CRF-Khmer-Segmentation.ipynb: Implementation using CRF
- HMM_Khmer_Segmentaion.ipynb: Using Hidden Markov Model (HMM)
- sklearn_Khmer_segmentation.ipynb: Naive Bayes and other sklearn algorithms (Random Forest and Linear Regression reach 93%; Naive Bayes reaches around 89%)
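One common way to frame word segmentation for a CRF is as sequence labeling: each character gets a label saying whether it starts a word, and the model learns from features of the neighboring characters. The sketch below illustrates that framing under stated assumptions; it is not the notebooks' code. The function names (`words_to_labels`, `char_features`, `text_to_features`, `labels_to_words`), the "1"/"0" label scheme and the ±2-character window are hypothetical choices. The notebooks may label larger units (for example Khmer character clusters) and use a richer feature set.

```python
from typing import Dict, List, Union

Feature = Union[str, float]


def words_to_labels(words: List[str]) -> List[str]:
    """Hypothetical scheme: label each character "1" if it starts a word, else "0"."""
    labels: List[str] = []
    for word in words:
        for i in range(len(word)):
            labels.append("1" if i == 0 else "0")
    return labels


def char_features(text: str, i: int, window: int = 2) -> Dict[str, Feature]:
    """Features for position i: the character itself plus neighbors within `window`.

    Positions past either end of the text are filled with <BOS>/<EOS> markers.
    """
    feats: Dict[str, Feature] = {"bias": 1.0, "char": text[i]}
    for offset in range(1, window + 1):
        feats[f"char[-{offset}]"] = text[i - offset] if i - offset >= 0 else "<BOS>"
        feats[f"char[+{offset}]"] = text[i + offset] if i + offset < len(text) else "<EOS>"
    return feats


def text_to_features(text: str, window: int = 2) -> List[Dict[str, Feature]]:
    """One feature dict per character, i.e. one CRF input sequence."""
    return [char_features(text, i, window) for i in range(len(text))]


def labels_to_words(text: str, labels: List[str]) -> List[str]:
    """Rebuild words from predicted labels: a "1" opens a new word."""
    words: List[str] = []
    for ch, label in zip(text, labels):
        if label == "1" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


if __name__ == "__main__":
    # "I love the Khmer language", already split into words for training data.
    words = ["ខ្ញុំ", "ស្រលាញ់", "ភាសា", "ខ្មែរ"]
    text = "".join(words)          # unsegmented input, as the model would see it
    X = text_to_features(text)     # features, one dict per character
    y = words_to_labels(words)     # gold labels, one per character
    print(labels_to_words(text, y))
```

The feature dicts and string labels follow the input format of `sklearn_crfsuite.CRF.fit(X, y)`, which takes a list of sequences, so training would look roughly like `crf.fit([X], [y])`, and `crf.predict([text_to_features(new_text)])` would return labels to pass to `labels_to_words`. Training on a real corpus is not shown here.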
If you open a notebook in Google Colab, you can run it right away without any further setup.
See the instructions here:
https://medium.com/@phylypo/open-python-notebook-from-github-9177ab819b53