Name		Name	Last commit message	Last commit date
parent directory ..
tmp		tmp
AUTHORS.md		AUTHORS.md
README.md		README.md
requirements.txt		requirements.txt
text_features.py		text_features.py
ubc_train.py		ubc_train.py
ubs_train.py		ubs_train.py
uts2017_bank_tc_opt.py		uts2017_bank_tc_opt.py
uts2017_bank_tc_predict.py		uts2017_bank_tc_predict.py
uts2017_bank_tc_train.py		uts2017_bank_tc_train.py
vlsp2016_sa_train.py		vlsp2016_sa_train.py
vntc_description.md		vntc_description.md
vntc_opt.py		vntc_opt.py
vntc_predict.py		vntc_predict.py
vntc_train.py		vntc_train.py

README.md

Training a Vietnamese Text Classifier

In this play, we build a Vietnamese Text Classifier using VNTC dataset

Results

Corpus Description

Experiment results with SVM model and Tfidf, BoW features

Models	F1 (%)
TfidfVectorizer(ngram_range=(1, 2), max_df=0.5)	92.8
CountVectorizer(ngram_range=(1, 3), max_df=0.7)	89.3
TfidfVectorizer(max_df=0.8)	89.0
CountVectorizer(ngram_range=(1, 3)	88.9
TfidfVectorizer(ngram_range=(1, 3))	86.8
CountVectorizer(max_df=0.7)	85.5

Reproduce

Create Development Environment

cd plays/vntc
conda create -n classification python=3.6
pip install -r requirements.txt

Download VNTC dataset

underthesea download-data VNTC

Train a text classifier model

python vntc_train.py 
>>> Start training
Dev score: 0.933037037037037
Test score: 0.9267266194191333
>>> Finish training in 116.78 seconds
Your model is saved in tmp/classification_svm_vntc

Predict using trained model

python vntc_predict.py

Text: Huawei có thể không cần Google, nhưng sẽ ra sao nếu thiếu ARM ?
Labels: ['vi_tinh']

Text: Trưởng phòng GD&ĐT xin lỗi vụ học sinh nhận khen thưởng là tờ giấy A4
Labels: ['chinh_tri_xa_hoi']

Optimize hyper-parameters

$ python optimize_hyperparameters.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

classification

classification

README.md

Training a Vietnamese Text Classifier

Results

Reproduce

Files

classification

Directory actions

More options

Directory actions

More options

Latest commit

History

classification

Folders and files

parent directory

README.md

Training a Vietnamese Text Classifier

Results

Reproduce