This repo contains a Python script, emotion_classification.py, which performs emotion classification on a text dataset using a finetuned transformer model in the HuggingFace pipeline.
The Fake News Dataset consists of 10556 news articles, each containing a title, article text and label. All articles are either real or fake news. For the emotion classification task, only the headlines were used.
The model used for the emotion classification task is the j-hartmann/emotion-english-distilroberta-base transformer model from the HuggingFace platform (Jochen Hartmann, "Emotion English DistilRoBERTa-base". HuggingFace link, 2022). The model is a finetuned version of the distilroberta-base model. It predicts Ekman's 6 basic emotions plus a neutral class: anger, disgust, fear, joy, neutral, sadness and surprise.
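A minimal sketch of how the model might be called through the HuggingFace pipeline, assuming the transformers package is installed and the model can be downloaded. The helper names top_emotion and classify_headlines are hypothetical and only illustrate the idea; the actual script in src may be organized differently.

```python
def top_emotion(scores):
    """Return the label with the highest score from one pipeline result.

    `scores` is a list of {"label": str, "score": float} dicts, which is the
    shape returned per input text when the pipeline is called with top_k=None.
    """
    return max(scores, key=lambda item: item["score"])["label"]


def classify_headlines(headlines):
    """Classify a list of headlines and return one emotion label per headline."""
    # Imported lazily so top_emotion can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
        top_k=None,  # return scores for all seven classes, not just the best one
    )
    return [top_emotion(result) for result in classifier(headlines)]
```

Calling classify_headlines with a list of headline strings would download the model on first use and return a list of labels of the same length.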
The emotion_classification.py script follows these steps:
- Import dependencies
- Initialize model pipeline
- Load data
- Perform emotion classification on headlines
- Plot distribution of emotion in all headlines
- Plot distribution of emotions in real versus fake headlines
- Save plots to the plots folder
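The counting behind the two plots can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function names, the label strings "REAL" and "FAKE", and the sample predictions are all made up, and the plotting step itself is left out.

```python
from collections import Counter

# The seven classes predicted by the model.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def emotion_distribution(emotions):
    """Count how often each of the seven classes occurs, including zeros."""
    counts = Counter(emotions)
    return {emotion: counts.get(emotion, 0) for emotion in EMOTIONS}


def distribution_by_label(emotions, labels):
    """Split the emotion counts by article label (e.g. real versus fake)."""
    grouped = {}
    for emotion, label in zip(emotions, labels):
        grouped.setdefault(label, []).append(emotion)
    return {label: emotion_distribution(group) for label, group in grouped.items()}


# Made-up predictions and labels, for illustration only.
predicted = ["neutral", "fear", "neutral", "anger"]
labels = ["REAL", "FAKE", "REAL", "FAKE"]

overall = emotion_distribution(predicted)            # data for the all-headlines plot
per_label = distribution_by_label(predicted, labels)  # data for the real-versus-fake plot
```

Keeping zero counts for unseen classes means both plots share the same seven categories, which makes the real and fake distributions directly comparable.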
The code is tested on Python 3.11.2. Furthermore, if your OS is not UNIX-based, a bash-compatible terminal (such as Git for Windows) is required for running the shell scripts.
The repo was set up to work with Windows (the WIN_ files) and with macOS and Linux (the MACL_ files).
git clone https://github.com/alekswael/using-finetuned-transformers
cd using-finetuned-transformers
NOTE: Depending on your OS, run either WIN_setup.sh or MACL_setup.sh.
The setup script does the following:
- Creates a virtual environment for the project
- Activates the virtual environment
- Installs the correct versions of the packages required
- Deactivates the virtual environment
bash WIN_setup.sh
NOTE: Depending on your OS, run either WIN_run.sh or MACL_run.sh.
Run the *run.sh script. The script does the following:
- Activates the virtual environment
- Runs emotion_classification.py located in the src folder
- Deactivates the virtual environment
bash WIN_run.sh
This repository has the following structure:
│   .gitignore
│   MACL_run.sh
│   MACL_setup.sh
│   README.md
│   requirements.txt
│   WIN_run.sh
│   WIN_setup.sh
│
├───data
│       fake_or_real_news.csv
│
├───plots
│       all_data_plot.png
│       separated_plot.png
│
└───src
        emotion_classification.py
Looking at the emotion distribution plot for all headlines, a large proportion were classified as neutral, which I assume is a positive thing when considering news objectivity. Fear and anger are the most prevalent emotions after neutral, perhaps due to negativity bias in headlines. Surprisingly, surprise is a somewhat rare classification, which is counterintuitive given that news is by nature "new" and therefore surprising in some sense.
Figure: Distribution of emotions in all news headlines.
When looking at real versus fake headlines, the classifications are quite similar across all emotions. Fear, neutral and sadness are more prevalent in real news than in fake.
Figure: Distribution of emotions in real versus fake headlines.