Edisum

This repository contains the PyTorch implementation for the models and experiments in the paper Edisum: Summarizing and Explaining Wikipedia Edits at Scale

@article{šakota2024edisum,
      title={Edisum: Summarizing and Explaining Wikipedia Edits at Scale}, 
      author={Marija Šakota and Isaac Johnson and Guosheng Feng and Robert West},
      journal={arXiv preprint arXiv:2404.03428}
      year={2024}
}

Please consider citing our work, if you found the provided resources useful.

1. Setup

Start by cloning the repository:

git clone https://github.com/epfl-dlab/edisum.git

We recommend creating a new conda virtual environment as follows:

conda env create -f environment.yml

This command also installs all the necessary packages.

2. Downloading data and models

The data is available on huggingface and can be loaded with:

from datasets import load_dataset
dataset = load_dataset("msakota/edisum_dataset")

Alternatively, to download the collected data for the experiments, run:

bash ./download_data.sh

For downloading the trained models (available on huggingface), run:

bash ./download_models.sh

3. Usage

Training

To train a model from scratch on the desired data, run:

DATA_DIR="./data/100_perc_synth_data/" # specify a directory where training data is located
RUN_NAME="train_longt5_100_synth"
python run_train.py run_name=$RUN_NAME dir=$DATA_DIR +experiment=finetune_longt5

Inference

To run inference on a trained model:

DATA_DIR="./data/100_perc_synth_data/" # specify a directory where training data is located
CHECKPOINT_PATH="./models/edisum_100.ckpt" # specify path to the trained model
RUN_NAME="inference_longt5_100_synth"
python run_inference.py run_name=$RUN_NAME dir=$DATA_DIR checkpoint_path=$CHECKPOINT_PATH +experiment=inference_longt5

License

This project is licensed under the terms of the MIT license.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
configs		configs
scripts		scripts
src		src
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
download_data.sh		download_data.sh
download_models.sh		download_models.sh
environment.yml		environment.yml
gen_edit_summary.py		gen_edit_summary.py
run_finetune.sh		run_finetune.sh
run_inference.py		run_inference.py
run_inference.sh		run_inference.sh
run_train.py		run_train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Edisum

1. Setup

2. Downloading data and models

3. Usage

Training

Inference

License

About

Releases

Packages

Languages

License

epfl-dlab/edisum

Folders and files

Latest commit

History

Repository files navigation

Edisum

1. Setup

2. Downloading data and models

3. Usage

Training

Inference

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages