This repo contains code to produce a CSV where each row is an observation from the Stanford Sentiment Treebank dataset, along with the predictions from an LSTM and the predictions from a RoBERTa model fine-tuned on this dataset using the `transformers` library.
I presented the results from this in my ODSC West 2020 talk on swap-ins / swap-outs, since this code allows you not only to produce the accuracy numbers for each model on this dataset, but also to see which individual observations were predicted correctly by one model but incorrectly by the other.
This assumes you have conda installed.

```
conda update python
conda create -n bert-error-analysis python=3.8.1
pip install -r requirements.txt
```
- Change `BASE_DIR` in `const.py` to be the folder of this repo on your computer.
The Stanford Sentiment Treebank dataset contains, among other things, a labeled dataset of ~11K sentences, each with an associated sentiment score between 0 and 1 and a "split" determining whether the sentence is in the training set or the evaluation set.
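For context, the raw SST distribution spreads this information across a few files; a minimal sketch of joining the sentences with their train / test split using pandas (assuming the standard `stanfordSentimentTreebank` file names and delimiters, which this repo's own loading code may handle differently) could look like:

```python
import pandas as pd

# Sentences: tab-separated, with columns "sentence_index" and "sentence"
sentences = pd.read_csv("stanfordSentimentTreebank/datasetSentences.txt", sep="\t")

# Split assignment: comma-separated, "splitset_label" marks train / test / dev
splits = pd.read_csv("stanfordSentimentTreebank/datasetSplit.txt", sep=",")

# One row per sentence, with its split attached
merged = sentences.merge(splits, on="sentence_index")
print(merged.head())
```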
`cd`-ing into `main` and running `python fine_tune_pretrained_three_class.py` will:
- Use the `transformers` library to load the `roberta-base` tokenizer and the data itself into a PyTorch `Dataset`. Using the settings in this Python file, the raw labels in the dataset, which are continuous sentiment scores from 0 to 1, will be mapped to three labels (a sketch of this mapping appears after this list):
  - `0` (negative): sentiment score less than 0.4
  - `1` (neutral): sentiment score greater than or equal to 0.4 and less than 0.6
  - `2` (positive): sentiment score greater than or equal to 0.6
- Initialize a `RobertaFineTuningModel`, which contains the `roberta-base` model from `transformers` and a fully connected layer on top of the final representation for the `[CLS]` token (which itself comes from the `pooler` of the `RobertaModel`; for more on this, see here). A rough sketch of this architecture appears after this list.
- Train this on the sentiment analysis data, saving the resulting model after each epoch. By default, the script uses the smaller of the two batch sizes mentioned in the RoBERTa paper, `16`, the middle of the three learning rates mentioned, `2e-5`, and fine-tunes for three epochs. Warning: each of these saved models will take up between 400 and 450 MB!
- Then, once trained, you can run `python evaluation.py`, pointing to the correct model folder in that script, to generate a CSV containing the predictions for each observation.
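As referenced in the first bullet above, the score-to-label mapping can be expressed as a small function. This is only a minimal sketch of the thresholds described; the actual implementation in `fine_tune_pretrained_three_class.py` may differ in details such as function name and boundary handling:

```python
def sentiment_score_to_label(score: float) -> int:
    """Map a continuous SST sentiment score in [0, 1] to a three-class label."""
    if score < 0.4:
        return 0  # negative
    elif score < 0.6:
        return 1  # neutral
    else:
        return 2  # positive
```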
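The model described in the second bullet can be pictured roughly as follows. The class name `RobertaFineTuningModel` comes from this repo, but the layer names and constructor arguments here are assumptions for illustration, not the repo's actual implementation:

```python
import torch
from transformers import RobertaModel


class RobertaFineTuningModel(torch.nn.Module):
    """roberta-base plus a fully connected layer over the pooled [CLS] representation."""

    def __init__(self, num_labels: int = 3):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        # The pooler output is the final [CLS] representation passed through a dense layer + tanh
        self.classifier = torch.nn.Linear(self.roberta.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output          # shape: (batch_size, hidden_size)
        return self.classifier(pooled)          # logits over the three sentiment classes
```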
To pre-train a toy version of BERT with just under 5M parameters for 50 epochs on the Wiki-2 dataset, `cd` into `main` and run:

```
python pretrain_custom.py
```
This will save lists of the losses for the masked language modeling and next sentence prediction tasks. You can plot these losses over time by running:

```
python plot.py
```
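The exact way the loss lists are serialized isn't described here; assuming they are saved as pickled Python lists under hypothetical filenames, a plotting sketch along the lines of `plot.py` might look like:

```python
import pickle

import matplotlib.pyplot as plt

# Hypothetical filenames; adjust to wherever pretrain_custom.py actually writes its losses
with open("mlm_losses.pkl", "rb") as f:
    mlm_losses = pickle.load(f)
with open("nsp_losses.pkl", "rb") as f:
    nsp_losses = pickle.load(f)

plt.plot(mlm_losses, label="Masked language modeling loss")
plt.plot(nsp_losses, label="Next sentence prediction loss")
plt.xlabel("Training step")
plt.ylabel("Loss")
plt.legend()
plt.show()
```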
Helpful guide for pre-training BERT (uses MXNet)