
DAMO-NLP-SG/Auto-Arena-LLMs


This is the repo for the paper "Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions".

Project Website | Paper | Leaderboard

How to use the repository

Prepare the environment:

  1. Set up the environment using: conda env create -f env.yml
  2. Activate the environment with: conda activate LLM_Eval
  3. Make sure you have set the environment variables listed in APIs; a quick sanity check is sketched below.
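
Before running anything, you can verify that the keys are present. This is a minimal sketch, not part of the repo; the variable names are examples, since the exact set depends on which providers "generate_response" calls:

```python
import os

# Hypothetical check: replace these names with the variables the providers
# in utils/api_utils.py actually read (these two are common examples).
REQUIRED_VARS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    raise EnvironmentError("Missing environment variables: " + ", ".join(missing))
```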

Before including any participants, make sure:

  1. The participant's API-calling logic is implemented in the "generate_response" function inside "utils/api_utils.py" (a sketch follows this list).
  2. The participant's MMLU score is included in "data/MMLU.csv". If the participant doesn't have an MMLU score, fill in "-1" and it will initially be paired with the median candidate.
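
For reference, here is a minimal sketch of the kind of branch you might add. The real "generate_response" in "utils/api_utils.py" has its own signature and provider branches; "my-new-model", its key variable, and the endpoint URL below are all hypothetical:

```python
import os

from openai import OpenAI


def generate_response(model_name: str, prompt: str) -> str:
    # Hypothetical sketch of a provider branch; not the repo's actual code.
    if model_name == "my-new-model":  # placeholder participant name
        client = OpenAI(
            base_url="https://api.example.com/v1",  # hypothetical endpoint
            api_key=os.environ["MY_NEW_MODEL_API_KEY"],  # hypothetical variable
        )
        completion = client.chat.completions.create(
            model="my-new-model",
            messages=[{"role": "user", "content": prompt}],
        )
        return completion.choices[0].message.content
    raise ValueError(f"No calling function registered for {model_name}")
```

The matching row in "data/MMLU.csv" then carries the model's name and score, or -1 if no score exists.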

To run a tournament, first change the "player_names" variable in the Python files to adjust the tournament members (see the sketch after the commands). Example commands:

  1. English: python run_tournament.py --tournament_dir data/main_tour_40
  2. Chinese: python run_tournament.py --tournament_dir data/main_tour_40_zh --language zh
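
For example, the participant list might look like the following; where exactly "player_names" lives depends on the script, and the identifiers here are the ones used elsewhere in this README:

```python
# In run_tournament.py (and the other entry scripts), adjust the participants.
player_names = [
    "Qwen/Qwen1.5-72B-Chat",
    "claude-3-haiku-20240307",
    "SenseChat_5",
]
```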

To add a new participant to a finished tournament, here is an example command:

  1. python run_tournament_add_participant.py --tournament_dir data/main_tour_40 --add_participant SenseChat_5

To run debates between a pair of selected models, here is an example command:

  1. python run_LD_pair.py --model_a Qwen/Qwen1.5-72B-Chat --model_b claude-3-haiku-20240307

We release 2 questions per category as demos for each debate on the website; the full results are not pushed there. The analysis producing all figures in the paper was run in the notebook result_analysis.ipynb.
