PyTorch deep learning model that detects toxic Vietnamese sentences using BERT

hoangcaobao/Vietnamese_Text_Toxic_Classify

VietnameseTextToxicClassify

I trained this model with PyTorch to detect toxic comments for Projectube.

I use VnCoreNLP to preprocess (word-segment) the raw Vietnamese sentences and PhoBERT to train the text classification model. Both tools are documented at https://github.com/VinAIResearch/PhoBERT.
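
The sketch below shows one way the two tools can fit together. It is not the code in training.py. It assumes the jar path used in the install steps, the public `vinai/phobert-base` checkpoint, and two labels (non-toxic, toxic). The function names `join_segmented`, `segment`, and `toxic_logits` are hypothetical helpers introduced here for illustration.

```python
import os
from typing import List


def join_segmented(sentences: List[List[str]]) -> str:
    """Flatten VnCoreNLP output (a list of sentences, each a list of
    word-segmented tokens) into one space-separated string for PhoBERT."""
    return " ".join(token for sentence in sentences for token in sentence)


def segment(text: str, jar_path: str = "vncorenlp/VnCoreNLP-1.1.1.jar") -> str:
    """Word-segment raw Vietnamese text with VnCoreNLP (requires Java)."""
    from vncorenlp import VnCoreNLP  # imported lazily: needs the jar and a JVM

    # The PhoBERT README passes an absolute path, so resolve it here.
    segmenter = VnCoreNLP(os.path.abspath(jar_path), annotators="wseg",
                          max_heap_size="-Xmx500m")
    try:
        return join_segmented(segmenter.tokenize(text))
    finally:
        segmenter.close()


def toxic_logits(segmented_text: str, model_name: str = "vinai/phobert-base"):
    """Score one segmented sentence with a PhoBERT sequence classifier.

    Assumption: two labels (non-toxic, toxic). The classification head is
    randomly initialised until fine-tuned, so the scores carry no meaning
    before training.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2
    )
    model.eval()
    inputs = tokenizer(segmented_text, return_tensors="pt",
                       truncation=True, max_length=256)
    with torch.no_grad():
        return model(**inputs).logits
```

Word segmentation matters because PhoBERT expects multi-syllable Vietnamese words to be joined with underscores, which is the form VnCoreNLP's word segmenter produces.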

Use my code

1. Git clone my repository:

git clone https://github.com/hoangcaobao/Vietnamese_Text_Toxic_Classify.git

2. Change into the repository folder and install VnCoreNLP:

cd Vietnamese_Text_Toxic_Classify
pip3 install vncorenlp
mkdir -p vncorenlp/models/wordsegmenter
wget https://raw.githubusercontent.com/vncorenlp/VnCoreNLP/master/VnCoreNLP-1.1.1.jar
wget https://raw.githubusercontent.com/vncorenlp/VnCoreNLP/master/models/wordsegmenter/vi-vocab
wget https://raw.githubusercontent.com/vncorenlp/VnCoreNLP/master/models/wordsegmenter/wordsegmenter.rdr
mv VnCoreNLP-1.1.1.jar vncorenlp/ 
mv vi-vocab vncorenlp/models/wordsegmenter/
mv wordsegmenter.rdr vncorenlp/models/wordsegmenter/

3. Add more data to the two JSON files

4. Run the training script:

python3 training.py
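
For step 3, new examples can be appended by hand or with a small script. The sketch below assumes each JSON file holds a flat JSON array of sentences, which may not match the actual files in this repository, so check their structure first. The helper `append_sentences` and the file name in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def append_sentences(json_path: str, new_sentences: List[str]) -> int:
    """Append sentences to a JSON file holding a flat list of strings.

    Assumption: the file stores a JSON array of sentences. The real schema
    may differ. Returns the number of sentences stored after the update.
    """
    path = Path(json_path)
    if path.exists():
        existing = json.loads(path.read_text(encoding="utf-8"))
    else:
        existing = []
    if not isinstance(existing, list):
        raise ValueError(f"{json_path} does not contain a JSON array")

    # Skip duplicates while preserving the original order.
    seen = set(existing)
    for sentence in new_sentences:
        if sentence not in seen:
            existing.append(sentence)
            seen.add(sentence)

    # ensure_ascii=False keeps Vietnamese diacritics readable in the file.
    path.write_text(
        json.dumps(existing, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(existing)
```

A call such as `append_sentences("toxic.json", ["..."])` would then extend one of the two files before rerunning training; replace the file name with the real one.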

HOANG CAO BAO
