Text Classification

Text classification algorithms are at the heart of a variety of software systems that process text data at scale. Email software uses text classification to determine whether incoming mail is sent to the inbox or filtered into the spam folder. Discussion forums use text classification to determine whether comments should be flagged as inappropriate.

These are two examples of topic classification, categorizing a text document into one of a predefined set of topics. In many topic classification problems, this categorization is based primarily on keywords in the text.
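As a rough illustration of keyword-driven topic classification, the sketch below scores a document by counting how many of its tokens fall into each topic's keyword set. The topic names, the keyword lists, and the `classify` helper are all hypothetical and chosen only for this example; they are not taken from VNTC or from any of the models listed in this document, and a real system would typically learn such signals from training data.

```python
import re
from collections import Counter

# Hypothetical keyword lists, invented purely for illustration.
TOPIC_KEYWORDS = {
    "sports": {"match", "goal", "team", "coach"},
    "business": {"market", "stock", "bank", "profit"},
    "health": {"doctor", "hospital", "vaccine", "patient"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def classify(text, topic_keywords=TOPIC_KEYWORDS, default="unknown"):
    """Return the topic whose keywords occur most often in the text.

    Falls back to `default` when no keyword matches. Ties are resolved
    in favor of the topic that appears first in `topic_keywords`.
    """
    counts = Counter(tokenize(text))
    scores = {
        topic: sum(counts[word] for word in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best_topic = max(scores, key=scores.get)
    return best_topic if scores[best_topic] > 0 else default


if __name__ == "__main__":
    print(classify("The team scored a late goal in the match"))  # sports
```

Raw keyword counts keep the example short, but they ignore word order and context, which is one reason learned models are usually preferred on corpora of any real size.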

VNTC

A Large-scale Vietnamese News Text Classification Corpus

Level 1: 10 topics, 33,759 documents for training and 50,373 documents for testing

| Model | Score | Paper / Source Code |
|-------|-------|---------------------|
| NGRAM | 97.1 | Vu et al. RIVF'07 |
| SVM Multi | 93.4 | Vu et al. RIVF'07 |

Level 2: 27 topics, 14,375 documents for training and 12,076 documents for testing

| Model | Score | Paper / Source Code |
|-------|-------|---------------------|
| SVM Multi | 96.21 | Vu et al. RIVF'07 |

Social Media Text

Miscellaneous

📜 Papers

💫 Services

📁 Open sources