This repository contains the code used to produce the models and data in the blog post *Language Models vs. The SAT Reading Test*.
**Dataset:** emozilla/sat-reading

**Models:** XXL (11B), XL (3B), Large (780M), Base (250M)
| File | Description |
|---|---|
| `combine-raw-data.py` | Combine the data in the `raw-data` folder into a single JSON file |
| `create-dataset.py` | Create `datasets`-compatible datasets from the combined JSON |
| `process-dataset-for-training.py` | Create a tokenized version of an existing dataset for training |
| `prompt-loop.py` | Playground for loading and prompting models |
| `take-tests.py` | Evaluate models against a dataset |
| `train.py` | Finetune a FLAN-T5 model |
To check the generalization of the finetuned models, install lm-evaluation-harness and run it on the SuperGLUE metrics `cb`, `copa`, `superglue_rte`, `wic`, and `wsc` (and any other metrics you'd like, of course).
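As a usage sketch, an invocation along these lines runs those tasks with EleutherAI's lm-evaluation-harness. The flags shown follow the older `main.py` CLI and may differ in newer harness releases; the pretrained path is a placeholder for your finetuned checkpoint:

```shell
# Assumes lm-evaluation-harness is installed and on your path; replace the
# pretrained path with your finetuned FLAN-T5 checkpoint or Hub model id.
python main.py \
    --model hf-seq2seq \
    --model_args pretrained=/path/to/finetuned-flan-t5 \
    --tasks cb,copa,superglue_rte,wic,wsc
```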