Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
Official PyTorch Implementation

Paper | Project Page | Run Analysis Baseline (Open in Colab)

A symbol of multi-criteria evaluation

This repo contains the official implementation of our paper "Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework". You can find more details on our project page and in our paper.

Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
Esteban Garces Arias, Hannah Blocher, Julian Rodemann, Meimingwei Li, Christian Heumann, Matthias Aßenmacher
Department of Statistics, LMU Munich, Munich Center for Machine Learning (MCML)

In this paper, we present novel ranking strategies within a multicriteria evaluation framework. Specifically, we employ benchmarking approaches based on partial orderings and present a new summary metric designed to balance existing automatic indicators, providing a more holistic evaluation of text generation quality. Furthermore, we discuss the alignment of these approaches with human judgments. Our experiments demonstrate that the proposed methods offer a robust way to compare decoding strategies, exhibit similarities with human preferences, and serve as valuable tools in guiding model selection for open-ended text generation tasks. Finally, we suggest future directions for improving evaluation methodologies in text generation.

This repository contains the environment setup, a data download script, scripts for computing coherence and diversity on inference results, and the R script for the qstar analysis.

Table of Contents 📖 [Back to Top]

- Setup Environment
- Data Download
- Run Analysis Baseline
- Contributions

Setup Environment 💻 [Back to Top]

To install all the dependencies for this repo, run the following command:

pip install -r requirements.txt
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True pip install simctg

We recommend building a fresh conda environment for this repository:

conda create -n helmet python=3.11
conda activate helmet
pip install -r requirements.txt
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True pip install simctg

Data Download 📚 [Back to Top]

To download the data, please run the following command:

bash download_data.sh

Run Analysis Baseline 🔥 [Back to Top]

We open-source our pre-computed LLM inference results so that the analysis in our paper can be reproduced. Please refer to the Data Download section to download them. If you want to use your own inference results instead, follow the instructions below.

Use Customized Inference Results

Run coherence computation

To compute coherence for your own inference-results JSON file, run the following command:

python coherence_computation.py \
--opt_model_name $OPT_MODEL_NAME \
--test_path $TEST_PATH
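For intuition: coherence for open-ended generation is commonly measured (as in SimCTG-style evaluation) as the average log-likelihood of the generated continuation under a scoring language model such as OPT (hence `--opt_model_name`), conditioned on the prompt. The sketch below only illustrates that aggregation; `token_probs` is a stand-in for the per-token probabilities the actual script obtains from the OPT model, and the function name is illustrative, not taken from this repo.

```python
import math

def coherence(token_probs):
    """Mean log-probability of the continuation tokens given the prefix.

    `token_probs` stands in for p(y_t | prompt, y_<t) produced by the
    scoring model (e.g. OPT); this sketch shows only the aggregation step.
    """
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Higher (less negative) values mean the scoring model finds the
# continuation more likely given its prompt.
print(coherence([0.5, 0.5]))  # ≈ -0.693
```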

Run diversity computation

To compute diversity for your own inference-results JSON file, run the following command:

python diversity_computation.py \
--test_path $TEST_PATH
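For reference, the diversity score widely used for open-ended generation (e.g. in SimCTG-based evaluation) is the product over n = 2..4 of the unique n-gram ratios of the generated text. A minimal sketch under that assumption; the function names are illustrative and not taken from this repo:

```python
def ngram_repetition_ratio(tokens, n):
    """Fraction of n-grams that are duplicates (rep-n as a ratio)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

def diversity(text):
    """Product over n = 2..4 of the unique n-gram ratio (higher = more diverse)."""
    tokens = text.split()
    score = 1.0
    for n in (2, 3, 4):
        score *= 1.0 - ngram_repetition_ratio(tokens, n)
    return score

print(round(diversity("the cat sat on the mat and the dog sat on the rug"), 4))  # 0.7576
```

Highly repetitive text drives the score toward 0, while text with no repeated n-grams scores 1.0.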

Use pre-computed results

Run qstar analysis

To run the qstar analysis, please run the following command:

Rscript qstar_metric.R

You may need to modify the following lines of qstar_metric.R to match your data paths:

# Line 14: path to your results_with_pareto_efficiency.csv
data_file <- "path/to/results_with_pareto_efficiency.csv"
# Line 15: where to save ranking_qtext.csv
candidate_stats_file <- "path/to/ranking_qtext.csv"
# Line 16: where to save dominance_final_analysis.csv
dominance_summary_file <- "path/to/dominance_final_analysis.csv"

Contributions 🚀 [Back to Top]

This repository builds on several open-source repositories; we thank their authors for open-sourcing the code.
