Toxicity in Thai Tweet Corpus

Annotated Corpus

Each row contains the label, the toxic/non-toxic annotation ratio (from 3 annotators), and the tweet ID, for example:

1[tab][3/0][tab]tweet_id

The labels are as follows:

1: Toxic
0: Non-Toxic
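
A minimal sketch of reading one row of the annotated corpus in Python. It assumes the three fields are tab-separated, that the ratio is written as toxic votes over non-toxic votes, and that the square brackets around the ratio may or may not be present in the file. The names `AnnotatedRow` and `parse_row` are illustrative and not part of this repository.

```python
import re
from typing import NamedTuple


class AnnotatedRow(NamedTuple):
    label: int           # 1 = toxic, 0 = non-toxic
    toxic_votes: int     # annotators who chose "toxic" (assumed first in the ratio)
    nontoxic_votes: int  # annotators who chose "non-toxic"
    tweet_id: str        # kept as a string so long IDs are never altered


# Accepts "3/0" as well as "[3/0]"; the brackets are treated as optional.
_RATIO = re.compile(r"\[?\s*(\d+)\s*/\s*(\d+)\s*\]?")


def parse_row(line: str) -> AnnotatedRow:
    """Parse one 'label<TAB>ratio<TAB>tweet_id' row."""
    label, ratio, tweet_id = line.rstrip("\n").split("\t")
    match = _RATIO.fullmatch(ratio.strip())
    if match is None:
        raise ValueError(f"unrecognized ratio field: {ratio!r}")
    return AnnotatedRow(
        int(label), int(match.group(1)), int(match.group(2)), tweet_id.strip()
    )
```

Applied to every line of annotated_corpus.txt, this would give a list of rows that can be filtered by label or by annotator agreement, for example keeping only rows where all three annotators agreed.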

Toxic keywords

These are the 44 keywords that we used to collect the tweets via the Twitter Search API. Each row contains a toxic keyword and its meaning, for example:

Thai toxic word[tab]original meaning/toxic meaning
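
A minimal sketch of reading one row of the keyword list in Python. It assumes a single tab between the keyword and its meaning, and that the first "/" in the meaning field separates the original sense from the toxic sense. Rows without a "/" are kept with the toxic meaning left empty. The function name `parse_keyword_row` and the dictionary keys are illustrative, not part of this repository.

```python
from typing import Dict, Optional


def parse_keyword_row(line: str) -> Dict[str, Optional[str]]:
    """Parse one 'keyword<TAB>original meaning/toxic meaning' row."""
    # Split the keyword from the meaning field at the first tab.
    word, _, meaning = line.rstrip("\r\n").partition("\t")
    # Assumption: the first "/" divides the original and toxic senses.
    original, sep, toxic = meaning.partition("/")
    return {
        "word": word.strip(),
        "original_meaning": original.strip(),
        "toxic_meaning": toxic.strip() if sep else None,
    }
```

If a meaning itself contains a "/", this split would cut it at the wrong place, so it is worth checking the parsed rows against toxic_keywords.txt by eye.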

Publication

In Proceedings of the Second Workshop on Text Analytics for Cybersecurity and Online Safety 2018 (to appear).

Demo application

http://cl.sd.tmu.ac.jp/thaitoxicity/

License

This project is licensed under the terms of the Creative Commons license.
