Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mt-bench #75

Merged
merged 51 commits into from
Mar 29, 2024
Merged

Add mt-bench #75

merged 51 commits into from
Mar 29, 2024

Conversation

NathanHB
Copy link
Member

@NathanHB NathanHB commented Feb 28, 2024

What this PR does:

  • Uses custom metrics and tasks to add llm a as judge
  • adds multi turn generation
  • Adds mt-bench metric

This implementation uses mt-bench prompts from InflectionAI. The code is inspired from the original implementation of mt-bench with notable differences.

  • mt-bench uses a custom-made chat templating system, we use the tokenizer
  • mt-bench uses an old version of the openai API, we use the newest one, with very simplified logic for chat prompt formating. We can easily add more models to act as judge.
  • We do not use varying temperature based on the sample we are evaluating. All samples are generated using do_sample=False and temperature set to 0.0.

@NathanHB NathanHB changed the title Add mt bench Add mt-bench Feb 28, 2024
@NathanHB NathanHB linked an issue Mar 4, 2024 that may be closed by this pull request
@clefourrier
Copy link
Member

@NathanHB feel free to ping me once it's merged with main so we can integrate it :)

@clefourrier
Copy link
Member

Careful with the deletion of task_examples, quite sure some of these files are needed by the nanotron team.

@@ -345,6 +353,91 @@ def greedy_until_with_logits(
override_bs=override_bs,
)

def greedy_until_multi_turn(self, requests: list[GreedyUntilMultiTurnRequest], override_bs: Optional[int] = None) -> GenerateMultiTurnReturn:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also need to update the other model launchers, or at least add a function in the abstract model which indicates that it needs to be implemented

Copy link
Member

@clefourrier clefourrier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite cool overall!
Some code reorgs are need imo (management of multi turn context with all the other contexts, finding a better way to pass judgements for logging, implementing greedy_until_multi_turn in the other models, or at least the LightevalModel class), plus some doc.

Nice that the code is so neat, it looks like it will be easy to switch to "using a judge as metric" in the future!

extended_tasks/mt_bench/judge_prompts.jsonl Outdated Show resolved Hide resolved
extended_tasks/mt_bench/judges.py Outdated Show resolved Hide resolved
extended_tasks/mt_bench/judges.py Outdated Show resolved Hide resolved
extended_tasks/mt_bench/judges.py Outdated Show resolved Hide resolved
extended_tasks/mt_bench/main.py Outdated Show resolved Hide resolved
src/lighteval/tasks/lighteval_task.py Outdated Show resolved Hide resolved
extended_tasks/mt_bench/judges.py Outdated Show resolved Hide resolved
extended_tasks/mt_bench/judges.py Outdated Show resolved Hide resolved
extended_tasks/mt_bench/judges.py Outdated Show resolved Hide resolved
extended_tasks/mt_bench/main.py Outdated Show resolved Hide resolved
@clefourrier clefourrier mentioned this pull request Mar 26, 2024
Copy link
Member

@clefourrier clefourrier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor questions.
Overall, very good job in adding this mechanism! Looking forward having it as an actual metric!

src/lighteval/evaluator.py Show resolved Hide resolved
src/lighteval/models/base_model.py Outdated Show resolved Hide resolved
src/lighteval/models/base_model.py Show resolved Hide resolved
src/lighteval/tasks/extended/mt_bench/main.py Outdated Show resolved Hide resolved
src/lighteval/tasks/lighteval_task.py Outdated Show resolved Hide resolved
@NathanHB NathanHB merged commit af24080 into main Mar 29, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add MT-Bench
2 participants