khmer-crf-segmentation

Word segmentation for Khmer documents using Conditional Random Fields (CRF).

See the detailed article here:

https://medium.com/@phylypo/segmentation-of-khmer-text-using-conditional-random-fields-3a2d4d73956a
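CRF word segmentation is commonly framed as sequence labeling. Each character gets a label saying whether it starts a new word, and the model learns to predict those labels for unsegmented text. The sketch below shows only that data transformation in plain Python. It assumes a 1/0 "word start" labeling scheme and works on single Unicode code points for simplicity. The notebooks may label Khmer character clusters instead. The helper names `words_to_labels` and `labels_to_words` are hypothetical and do not come from this project. ASCII strings stand in for Khmer words in the demo.

```python
from typing import List, Tuple


def words_to_labels(words: List[str]) -> Tuple[List[str], List[int]]:
    """Flatten a segmented sentence into characters plus 1/0 labels.

    Label 1 marks the first character of a word; 0 marks a continuation.
    Hypothetical helper: operates on code points, not character clusters.
    """
    chars: List[str] = []
    labels: List[int] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == 0 else 0)
    return chars, labels


def labels_to_words(chars: List[str], labels: List[int]) -> List[str]:
    """Rebuild words from characters and (predicted) 1/0 labels."""
    words: List[str] = []
    for ch, label in zip(chars, labels):
        # Start a new word on label 1, or if no word has been started yet.
        if label == 1 or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


if __name__ == "__main__":
    chars, labels = words_to_labels(["ab", "cde"])
    print(chars, labels)  # ['a', 'b', 'c', 'd', 'e'] [1, 0, 1, 0, 0]
    print(labels_to_words(chars, labels))  # ['ab', 'cde']
```

At prediction time, a trained model supplies the labels for raw text, and `labels_to_words` turns them back into words.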

This project includes a Python notebook with the complete code to run the CRF. The notebook includes code to download and extract the data and to train the model.
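CRF libraries such as sklearn-crfsuite typically take each training sequence as a list of per-position feature dictionaries. The sketch below builds such features from a one-character window around each position. The feature set and the names `char_features` and `sequence_features` are illustrative assumptions. They are not the features used in the notebook: current character, previous and next character, bigrams, and sequence-boundary flags.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def char_features(chars: List[str], i: int) -> Dict[str, Feature]:
    """Features for position i from a +/-1 character window (hypothetical set)."""
    feats: Dict[str, Feature] = {"char": chars[i]}
    if i > 0:
        feats["prev_char"] = chars[i - 1]
        feats["prev_bigram"] = chars[i - 1] + chars[i]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(chars) - 1:
        feats["next_char"] = chars[i + 1]
        feats["next_bigram"] = chars[i] + chars[i + 1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sequence_features(chars: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per character, in sequence order."""
    return [char_features(chars, i) for i in range(len(chars))]


if __name__ == "__main__":
    for feats in sequence_features(list("abc")):
        print(feats)
```

Pairing each feature list with its label sequence gives the (X, y) layout that sequence-labeling CRF libraries commonly expect. For sklearn-crfsuite, the labels should be strings such as "1" and "0".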

CRF-Khmer-Segmentation.ipynb: Implementation using CRF
HMM_Khmer_Segmentaion.ipynb: Implementation using a Hidden Markov Model (HMM)
sklearn_Khmer_segmentation.ipynb: Naive Bayes and other sklearn algorithms (Random Forest and Linear Regression reach about 93%; Naive Bayes reaches about 89%)
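The percentages above do not name their metric. Two common ways to score a segmenter are per-character label accuracy and word-level precision/recall/F1 over word spans. The sketch below implements both in plain Python as an illustration, assuming the 1/0 "word start" labels described earlier. It is not necessarily how the notebooks compute their scores, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def char_accuracy(gold: List[int], pred: List[int]) -> float:
    """Fraction of character positions whose 1/0 label matches."""
    if len(gold) != len(pred):
        raise ValueError("label sequences must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def word_spans(labels: List[int]) -> Set[Tuple[int, int]]:
    """Convert 1/0 word-start labels into (start, end) character spans."""
    # Position 0 always starts a word, whatever its label says.
    starts = [i for i, lab in enumerate(labels) if lab == 1 or i == 0]
    ends = starts[1:] + [len(labels)]
    return set(zip(starts, ends))


def word_prf(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Word-level precision, recall, F1: a word counts only if both boundaries match."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [1, 0, 1, 0, 0]
    pred = [1, 0, 1, 0, 1]
    print(char_accuracy(gold, pred))  # 0.8
    print(word_prf(gold, pred))
```

Word-level F1 is stricter than character accuracy. In the example, 4 of 5 labels are correct, yet only one of the two gold words is recovered exactly.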

If you open the notebooks in Google Colab, you can run them right away without any further setup.

See the instructions here:

https://medium.com/@phylypo/open-python-notebook-from-github-9177ab819b53

Repository files:
data
models
CRF-Khmer-Segmentation.ipynb
CRF-Khmer-Segmnt_Report.ipynb
HMM_Khmer_Segmentaion.ipynb
README.md
sklearn_Khmer_segmentation.ipynb

Repository: phylypo/segmentation-crf-khmer