Add gemma-2-9b-it-SimPO and gemma-2-9b-it-DPO to AlpacaEval #368

Merged · 6 commits into tatsu-lab:main · Jul 18, 2024

Conversation

xiamengzhou
Contributor

Here are the results for two gemma-based models. Could you help merge the results into the leaderboard? Many thanks!

| Model | LC win rate (%) | Win rate (%) | Std. error | N | Avg. length |
|---|---|---|---|---|---|
| gemma-2-9b-it-SimPO | 72.35 | 65.86 | 1.42 | 805 | 1833 |
| gemma-2-9b-it-DPO | 67.66 | 65.36 | 1.40 | 805 | 2016 |
| gpt-4o-2024-05-13 | 57.46 | 51.33 | 1.47 | 805 | 1873 |
| gpt-4-turbo-2024-04-09 | 55.02 | 46.12 | 1.47 | 805 | 1802 |
| gpt4_1106_preview | 50.00 | 50.00 | 0.00 | 805 | 2049 |
| claude-3-opus-20240229 | 40.51 | 29.11 | 1.39 | 805 | 1388 |
| claude-3-sonnet-20240229 | 34.87 | 25.56 | 1.34 | 805 | 1420 |
| Meta-Llama-3-70B-Instruct | 34.42 | 33.18 | 1.39 | 805 | 1919 |
| gemini-pro | 24.38 | 18.18 | 1.16 | 805 | 1456 |
| Mixtral-8x7B-Instruct-v0.1 | 23.69 | 18.26 | 1.19 | 805 | 1465 |
| Meta-Llama-3-8B-Instruct | 22.92 | 22.57 | 1.26 | 805 | 1899 |
| Mistral-7B-Instruct-v0.2 | 17.11 | 14.72 | 1.08 | 805 | 1676 |
| alpaca-7b | 5.88 | 2.59 | 0.49 | 805 | 396 |
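For reference, a minimal sketch of how a table like the one above could be reprinted from the leaderboard CSV with pandas. The path is the one discussed later in this thread and the column names follow the header row of the original output, so treat both as assumptions about the repo layout rather than a guaranteed schema:

```python
import pandas as pd

# Sketch: print the leaderboard sorted by length-controlled win rate.
# PATH and the column names are assumptions taken from this PR thread,
# not a verified description of the repo's actual CSV schema.
PATH = "src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv"

cols = ["length_controlled_winrate", "win_rate", "standard_error", "n_total", "avg_length"]
df = pd.read_csv(PATH, index_col=0)

print(df[cols].sort_values("length_controlled_winrate", ascending=False).round(2))
```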

@YannDubs
Collaborator

Very impressive results @xiamengzhou, but the leaderboard file is surprisingly at the wrong path (not sure why).

You should be pushing:
`src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv`

and you should not have:
`src/alpaca_eval/leaderboards/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv`

Do you know why this is the case? Please update the PR.
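A quick way to catch this before pushing is a small path check. A minimal sketch, assuming it is run from the repository root; both paths are copied verbatim from the comment above:

```python
from pathlib import Path

# Sanity-check the leaderboard location before pushing; run from the repo root.
# Both paths are taken from the review comment above.
expected = Path("src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv")
stray = Path("src/alpaca_eval/leaderboards/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv")

assert expected.is_file(), f"leaderboard missing at {expected}"
assert not stray.exists(), f"stray leaderboard at {stray}; move it into data_AlpacaEval_2/"
print("leaderboard path looks correct")
```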

@xiamengzhou
Contributor Author

@YannDubs Thanks! It should be fixed now. I accidentally ran the final verification experiments in a clone of the original repository rather than in my fork, and then manually copied all the files over to the fork, which is most likely how the stray file was introduced. Let me know if there are still any problems!
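For anyone hitting the same problem: rather than copying files between checkouts by hand, the usual fix is to point the fork's clone at the original repo and rebase. A minimal sketch; the remote name and URL are assumptions for illustration:

```python
import subprocess

# Sketch: sync a fork's clone with the original repo instead of copying
# files by hand. The remote name "upstream" and the URL are illustrative
# assumptions; adjust to your setup. Run from inside the fork's clone.
def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)

git("remote", "add", "upstream", "https://github.com/tatsu-lab/alpaca_eval.git")
git("fetch", "upstream")
git("rebase", "upstream/main")
```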

@YannDubs
Collaborator

Thanks @xiamengzhou, but it seems that your files remove some models from the leaderboard (maybe due to a merge conflict). Please add those rows back.
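One way to add the rows back is to re-insert every model present in the upstream CSV but missing from the PR's copy. In this sketch, `upstream_leaderboard.csv` is a hypothetical local copy of the file from tatsu-lab:main (e.g. exported with `git show`):

```python
import pandas as pd

# Sketch: restore leaderboard rows lost in a bad merge by re-adding every
# model that upstream has but our copy lacks. "upstream_leaderboard.csv" is
# a hypothetical local export of the file from tatsu-lab:main.
PATH = "src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv"

ours = pd.read_csv(PATH, index_col=0)
upstream = pd.read_csv("upstream_leaderboard.csv", index_col=0)

missing = upstream.index.difference(ours.index)
pd.concat([ours, upstream.loc[missing]]).to_csv(PATH)
print(f"re-added {len(missing)} model(s): {sorted(missing)}")
```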

@xiamengzhou
Contributor Author

@YannDubs Fixed!

@YannDubs YannDubs merged commit 783c4b5 into tatsu-lab:main Jul 18, 2024
2 checks passed