tmu-nlp/ThaiToxicityTweetCorpus

Toxicity in Thai Tweet Corpus

License: CC BY-NC 4.0

Annotated Corpus

Each row contains a label, the ratio of toxic to non-toxic annotations (from 3 annotators), and a tweet ID, separated by tabs. For example:

1[tab][3/0][tab]tweet_id

The labels are as follows:

  • 1: Toxic
  • 0: Non-Toxic
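
A minimal sketch of how one such row might be parsed in Python. It assumes the ratio field is written literally with square brackets, as in the example above, and strips them if present. The names `AnnotatedTweet` and `parse_row` and the tweet ID shown are hypothetical and not part of this repository.

```python
from typing import NamedTuple


class AnnotatedTweet(NamedTuple):
    label: int           # 1 = toxic, 0 = non-toxic
    toxic_votes: int     # annotators who marked the tweet toxic
    nontoxic_votes: int  # annotators who marked the tweet non-toxic
    tweet_id: str        # kept as a string so the ID is preserved exactly


def parse_row(line: str) -> AnnotatedTweet:
    """Parse one tab-separated corpus row: label, ratio, tweet ID."""
    label, ratio, tweet_id = line.rstrip("\n").split("\t")
    # Assumption: the ratio may be wrapped in square brackets, e.g. "[3/0]".
    toxic, nontoxic = ratio.strip("[]").split("/")
    return AnnotatedTweet(int(label), int(toxic), int(nontoxic), tweet_id)


# Hypothetical row; the tweet ID is a placeholder, not a real corpus entry.
row = parse_row("1\t[3/0]\t1234567890")
print(row.label, row.toxic_votes, row.nontoxic_votes, row.tweet_id)
```

Since the corpus provides only tweet IDs, the tweet text has to be retrieved separately using those IDs.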

Toxic keywords

These are the 44 keywords that we used to collect the tweets via the Twitter Search API. Each row contains a toxic keyword and its meaning, separated by a tab. For example:

Thai toxic word[tab]original meaning/toxic meaning
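
A keyword row could be read along these lines. This is a sketch that assumes the two meanings are separated by the first "/" in the second field; rows whose original meaning itself contains a slash would need different handling. The function name `parse_keyword_row` and the placeholder strings are hypothetical.

```python
from typing import Tuple


def parse_keyword_row(line: str) -> Tuple[str, str, str]:
    """Split a keyword row into (keyword, original meaning, toxic meaning)."""
    keyword, meanings = line.rstrip("\n").split("\t", 1)
    # Assumption: the first "/" separates the original and toxic meanings.
    original, _, toxic = meanings.partition("/")
    return keyword, original.strip(), toxic.strip()


# Placeholder row mirroring the documented layout, not a real corpus entry.
keyword, original, toxic = parse_keyword_row(
    "thai_word\toriginal meaning/toxic meaning"
)
print(keyword, "|", original, "|", toxic)
```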

Publication

In Proceedings of the Second Workshop on Text Analytics for Cybersecurity and Online Safety 2018 (to appear).

Demo application

http://cl.sd.tmu.ac.jp/thaitoxicity/

License

This project is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.
