Add gemma-2-9b-it-SimPO and gemma-2-9b-it-DPO to AlpacaEval #368

Merged · 6 commits into tatsu-lab:main · Jul 18, 2024

Conversation

xiamengzhou
Contributor

Here are the results for two gemma-based models. Could you help merge the results into the leaderboard? Many thanks!

| Model | LC win rate (%) | Win rate (%) | Std. error | N | Avg. length |
|---|---|---|---|---|---|
| gemma-2-9b-it-SimPO | 72.35 | 65.86 | 1.42 | 805 | 1833 |
| gemma-2-9b-it-DPO | 67.66 | 65.36 | 1.40 | 805 | 2016 |
| gpt-4o-2024-05-13 | 57.46 | 51.33 | 1.47 | 805 | 1873 |
| gpt-4-turbo-2024-04-09 | 55.02 | 46.12 | 1.47 | 805 | 1802 |
| gpt4_1106_preview | 50.00 | 50.00 | 0.00 | 805 | 2049 |
| claude-3-opus-20240229 | 40.51 | 29.11 | 1.39 | 805 | 1388 |
| claude-3-sonnet-20240229 | 34.87 | 25.56 | 1.34 | 805 | 1420 |
| Meta-Llama-3-70B-Instruct | 34.42 | 33.18 | 1.39 | 805 | 1919 |
| gemini-pro | 24.38 | 18.18 | 1.16 | 805 | 1456 |
| Mixtral-8x7B-Instruct-v0.1 | 23.69 | 18.26 | 1.19 | 805 | 1465 |
| Meta-Llama-3-8B-Instruct | 22.92 | 22.57 | 1.26 | 805 | 1899 |
| Mistral-7B-Instruct-v0.2 | 17.11 | 14.72 | 1.08 | 805 | 1676 |
| alpaca-7b | 5.88 | 2.59 | 0.49 | 805 | 396 |
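For reference, a minimal sketch of how a table like the one above could be reprinted from the leaderboard CSV with pandas. The path is the one discussed later in this thread and the column names follow the header row of the original output, so treat both as assumptions about the repo layout rather than a guaranteed schema:

```python
import pandas as pd

# Sketch: print the leaderboard sorted by length-controlled win rate.
# PATH and the column names are assumptions taken from this PR thread,
# not a verified description of the repo's actual CSV schema.
PATH = "src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv"

cols = ["length_controlled_winrate", "win_rate", "standard_error", "n_total", "avg_length"]
df = pd.read_csv(PATH, index_col=0)

print(df[cols].sort_values("length_controlled_winrate", ascending=False).round(2))
```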

@YannDubs
Collaborator

Very impressive results @xiamengzhou, but the leaderboard file is surprisingly at the wrong path (not sure why).

You should be pushing:
`src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv`

and you should not have:
`src/alpaca_eval/leaderboards/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv`

Do you know why this is the case? Please update the PR.
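A quick way to catch this before pushing is a small path check. A minimal sketch, assuming it is run from the repository root; both paths are copied verbatim from the comment above:

```python
from pathlib import Path

# Sanity-check the leaderboard location before pushing; run from the repo root.
# Both paths are taken from the review comment above.
expected = Path("src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv")
stray = Path("src/alpaca_eval/leaderboards/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv")

assert expected.is_file(), f"leaderboard missing at {expected}"
assert not stray.exists(), f"stray leaderboard at {stray}; move it into data_AlpacaEval_2/"
print("leaderboard path looks correct")
```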

@xiamengzhou
Contributor Author

@YannDubs Thanks! It should be fixed now. I accidentally ran the final verification experiments in a clone of the original repository rather than in my fork, and then manually copied all the files over to the fork, which is most likely how the stray file was introduced. Let me know if there are still any problems!
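For anyone hitting the same problem: rather than copying files between checkouts by hand, the usual fix is to point the fork's clone at the original repo and rebase. A minimal sketch; the remote name and URL are assumptions for illustration:

```python
import subprocess

# Sketch: sync a fork's clone with the original repo instead of copying
# files by hand. The remote name "upstream" and the URL are illustrative
# assumptions; adjust to your setup. Run from inside the fork's clone.
def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)

git("remote", "add", "upstream", "https://github.com/tatsu-lab/alpaca_eval.git")
git("fetch", "upstream")
git("rebase", "upstream/main")
```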

@YannDubs
Collaborator

Thanks @xiamengzhou, but it seems that your files remove some models from the leaderboard (maybe due to a merge conflict). Please add those rows back.
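One way to add the rows back is to re-insert every model present in the upstream CSV but missing from the PR's copy. In this sketch, `upstream_leaderboard.csv` is a hypothetical local copy of the file from tatsu-lab:main (e.g. exported with `git show`):

```python
import pandas as pd

# Sketch: restore leaderboard rows lost in a bad merge by re-adding every
# model that upstream has but our copy lacks. "upstream_leaderboard.csv" is
# a hypothetical local export of the file from tatsu-lab:main.
PATH = "src/alpaca_eval/leaderboards/data_AlpacaEval_2/weighted_alpaca_eval_gpt4_turbo_leaderboard.csv"

ours = pd.read_csv(PATH, index_col=0)
upstream = pd.read_csv("upstream_leaderboard.csv", index_col=0)

missing = upstream.index.difference(ours.index)
pd.concat([ours, upstream.loc[missing]]).to_csv(PATH)
print(f"re-added {len(missing)} model(s): {sorted(missing)}")
```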

@xiamengzhou
Contributor Author

@YannDubs Fixed!

@YannDubs YannDubs merged commit 783c4b5 into tatsu-lab:main Jul 18, 2024
2 checks passed