S3D: A Weakly Supervised Sarcasm Dataset

This is the repository for our paper 'Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset', submitted to the EMNLP NLP+CSS 2022 workshop. It includes our SAD dataset along with versions 1 and 2 of our S3D dataset. Both of these Twitter datasets can be used to train sarcasm detection models.

Datasets

SAD - We provide the Tweet IDs and sarcasm labels of 2,340 manually annotated tweets, which were collected by tracking the #sarcasm hashtag. Available on HuggingFace.

S3D-v1 - We provide the Tweet IDs of 100,000 tweets along with their labels, which were predicted by a BERTweet model fine-tuned on our 'Combined dataset', a corpus of over a million tweets and Reddit comments labelled for sarcasm in previous work. Available on HuggingFace.

S3D-v2 - We provide the Tweet IDs of 100,000 tweets along with their labels, which were predicted by an ensemble of our three 'best' fine-tuned sarcasm detection models. Available on HuggingFace.
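To make the idea of ensemble labelling concrete, here is a minimal sketch of combining the predictions of three models into one label per tweet. It assumes a simple majority vote over binary labels (1 = sarcastic, 0 = not sarcastic); the README does not state the exact combination rule used for S3D-v2, so treat the voting scheme, the function name `majority_vote`, and the toy predictions as illustrative assumptions only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per tweet.

    `predictions` is a list with one entry per model; each entry is a list
    of labels in the same tweet order. Majority voting is an assumption
    here, not necessarily the rule used to build S3D-v2.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_tweets = len(predictions[0])
    if any(len(p) != n_tweets for p in predictions):
        raise ValueError("all models must label the same number of tweets")

    combined = []
    # zip(*predictions) yields, for each tweet, the tuple of votes it received
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Toy predictions from three hypothetical models over four tweets
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]

ensemble_labels = majority_vote([model_a, model_b, model_c])
print(ensemble_labels)  # [1, 1, 1, 0]
```

With an odd number of models and binary labels, a majority vote can never tie, which is one reason a three-model ensemble is convenient.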

Experiments

We provide Python notebooks that show the labelling process for our datasets. You can use them to reproduce the experiments that created S3D-v1 and S3D-v2; the notebooks use HuggingFace to load the relevant models and label the dataset.
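As a rough illustration of what such a labelling pass looks like, the sketch below runs a classifier over tweets in batches and pairs each Tweet ID with its predicted label. It is not the notebook code: `label_tweets`, `toy_classifier`, the batch size, and the example tweets and IDs are all hypothetical. In practice the `classify` callable would wrap one of the fine-tuned models loaded from HuggingFace; a keyword rule stands in for it here so the example runs offline.

```python
def label_tweets(tweets, classify, batch_size=32):
    """Label (tweet_id, text) pairs with a classifier, batch by batch.

    `classify` is any callable that maps a list of texts to a list of
    labels of the same length (a stand-in for a fine-tuned model).
    Returns a list of (tweet_id, label) pairs in the original order.
    """
    labelled = []
    for start in range(0, len(tweets), batch_size):
        batch = tweets[start:start + batch_size]
        labels = classify([text for _, text in batch])
        if len(labels) != len(batch):
            raise ValueError("classifier must return one label per text")
        labelled.extend(
            (tweet_id, label) for (tweet_id, _), label in zip(batch, labels)
        )
    return labelled


def toy_classifier(texts):
    """Hypothetical stand-in for a model: 1 if the text contains '#sarcasm'."""
    return [1 if "#sarcasm" in text.lower() else 0 for text in texts]


# Made-up tweets and IDs, for illustration only
tweets = [
    ("1001", "Great, another Monday. #sarcasm"),
    ("1002", "Lovely weather for a walk today."),
    ("1003", "I just love waiting in line #Sarcasm"),
]

for tweet_id, label in label_tweets(tweets, toy_classifier, batch_size=2):
    print(tweet_id, label)
```

Keeping the classifier behind a plain callable means the same loop can be reused for a single model or for an ensemble without changing the batching logic.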

Models

Base model	Fine-tuned model	Description
BERTweet	BERTweet-base-finetuned-SARC-combined-DS	BERTweet model fine-tuned on our combined dataset
BERTweet	BERTweet-base-finetuned-SARC-DS	BERTweet model fine-tuned on the SARC dataset
RoBERTa_large	roberta-large-finetuned-SARC-combined-DS	RoBERTa_large model fine-tuned on our combined dataset

Maintainer(s)

Jordan Painter
Diptesh Kanojia
