Add gemma-2-9b-it-SimPO and gemma-2-9b-it-DPO to AlpacaEval (#368)
* Add gemma-2-9b-it-SimPO and gemma-2-9b-it-DPO to AlpacaEval

* Update configs.yaml

* Update configs.yaml

* update leaderboard location

* fix leaderboard

---------

Co-authored-by: Yann Dubois <yanndubois96@gmail.com>
xiamengzhou and YannDubs authored Jul 18, 2024
1 parent a80fc97 commit 783c4b5
Showing 13 changed files with 162,345 additions and 1 deletion.
4,832 changes: 4,832 additions & 0 deletions results/gemma-2-9b-it-DPO/model_outputs.json

Large diffs are not rendered by default.

4,832 changes: 4,832 additions & 0 deletions results/gemma-2-9b-it-DPO/reference_outputs.json


64,513 changes: 64,513 additions & 0 deletions results/gemma-2-9b-it-DPO/weighted_alpaca_eval_gpt4_turbo/annotations.json


4,832 changes: 4,832 additions & 0 deletions results/gemma-2-9b-it-SimPO/model_outputs.json


4,832 changes: 4,832 additions & 0 deletions results/gemma-2-9b-it-SimPO/reference_outputs.json


78,089 changes: 78,089 additions & 0 deletions results/gemma-2-9b-it-SimPO/weighted_alpaca_eval_gpt4_turbo/annotations.json


@@ -1,5 +1,7 @@
 ,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,discrete_win_rate,mode,avg_length,length_controlled_winrate,lc_standard_error
+gemma-2-9b-it-SimPO,65.86422561532919,1.423459922555078,540,264,1,805,67.14285714285714,community,1833,72.3508446939842,0.5167873784867067
 openpipe-moa-gpt-4-turbo-v1,63.15493451236265,1.422980098799326,515,283,7,805,64.40993788819875,community,1856,68.37866250336802,0.7309418614587613
+gemma-2-9b-it-DPO,65.35922380122982,1.402802336467638,536,268,1,805,66.64596273291924,community,2016,67.6620382198043,0.6605613085864308
 Together-MoA,59.8688062333292,1.434305604543079,490,314,1,805,60.93167701863354,community,1825,65.37996976852163,0.7392392836781445
 Storm-7B-best-of-64,63.04099075186919,1.4253258915161846,519,286,0,805,64.472049689441,community,2340,61.63789557199839,
 Together-MoA-Lite,56.593045622273294,1.4464848562244548,456,347,2,805,56.77018633540373,community,1968,59.1415240989275,0.7580510219326322
@@ -75,7 +77,7 @@ gpt-3.5-turbo-16k-0613,14.13239070746584,1.027579400264853,96,704,5,805,12.23602
 internlm2-chat-7b-ExPO,28.067817437082898,1.3159792318125112,209,595,1,805,26.02484472049689,community,2390,22.66748024879648,
 gpt-3.5-turbo-0613,14.09579857390062,1.0371186215049395,99,700,6,805,12.670807453416147,community,1331,22.35251298054288,
 gpt-3.5-turbo-1106_verbose,12.76316981026087,1.044246819212278,94,709,2,805,11.801242236024844,dev,1058,22.00093702171442,
-Infinity-Instruct-3M-0625-Qwen2-7B,15.322182555525842,1.0986373100856872,118,685,2,805,14.782608695652174,community,1315,21.873996734999317,0.6990992627857084
+Infinity-Instruct-3M-0625-Qwen2-7B,15.322182555525842,1.0986373100856872,118,685,2,805,14.782608695652174,community,1315,21.87399673499932,0.6990992627857084
 gpt4_0613_concise,9.400320574596272,0.901021275896262,71,729,5,805,9.130434782608695,dev,627,21.57799091454269,
 pairrm-tulu-2-70b,18.638962967441,1.1924966700012911,140,665,0,805,17.391304347826086,community,1607,21.428403975507223,
 tulu-2-dpo-70b,15.982854374136648,1.1457861368237434,119,683,3,805,14.96894409937888,verified,1418,21.238610038371124,
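The `discrete_win_rate` column in the leaderboard hunks above can be reproduced from the per-model counts: it is the fraction of the 805 instructions won, with each draw counted as half a win. A minimal sketch (not AlpacaEval's actual code):

```python
def discrete_win_rate(n_wins: int, n_draws: int, n_total: int) -> float:
    """Percent of instructions won, counting each draw as half a win."""
    return 100.0 * (n_wins + 0.5 * n_draws) / n_total

# Counts for gemma-2-9b-it-SimPO from the row above: 540 wins, 1 draw, 805 total.
print(discrete_win_rate(540, 1, 805))  # ≈ 67.142857, matching the CSV
```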
@@ -170,4 +170,6 @@ Nanbeige2-16B-Chat,-1.4383673979411902,0.8415127360873783,-0.3850159994606512
 openpipe-moa-gpt-4-turbo-v1,-1.0482540803063984,0.8922946327161730,1.1183646496339554
 SPPO-Llama-3-Instruct-8B-PairRM,-1.0191251760902622,0.8783306469909790,-0.4989987701412274
 SPPO-Gemma-2-9B-It-PairRM,-1.1139152907711427,0.5972758612054220,0.2481716650556719
+gemma-2-9b-it-DPO,-1.0421098127771280,0.7544689135365252,0.9950063245939248
+gemma-2-9b-it-SimPO,-1.1421073244366444,0.6125150070394807,1.1709131554933978
 higgs-llama-3-70b-v2,-1.3408055191105048,0.9224458425462844,0.4939211483441316
16 changes: 16 additions & 0 deletions src/alpaca_eval/models_configs/gemma-2-9b-it-DPO/configs.yaml
@@ -0,0 +1,16 @@
+gemma-2-9b-it-DPO:
+  completions_kwargs:
+    batch_size: 900
+    do_sample: true
+    max_new_tokens: 4096
+    model_kwargs:
+      torch_dtype: bfloat16
+    model_name: princeton-nlp/gemma-2-9b-it-DPO
+    stop_token_ids:
+      - 1
+      - 107
+    temperature: 0.5
+    top_p: 1.0
+  fn_completions: vllm_local_completions
+  pretty_name: gemma-2-9b-it-DPO
+  prompt_template: gemma-2-9b-it-DPO/prompt.txt
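For reference, a small Python sketch of how this config's `completions_kwargs` would split into decoding settings versus model/batch plumbing (the dict mirrors the YAML keys above; the split itself is illustrative, not AlpacaEval's internal loader logic):

```python
# Mirror of the YAML config above as a plain dict.
config = {
    "completions_kwargs": {
        "batch_size": 900,
        "do_sample": True,
        "max_new_tokens": 4096,
        "model_kwargs": {"torch_dtype": "bfloat16"},
        "model_name": "princeton-nlp/gemma-2-9b-it-DPO",
        "stop_token_ids": [1, 107],  # gemma-2 <eos> and <end_of_turn> token ids
        "temperature": 0.5,
        "top_p": 1.0,
    },
    "fn_completions": "vllm_local_completions",
    "pretty_name": "gemma-2-9b-it-DPO",
    "prompt_template": "gemma-2-9b-it-DPO/prompt.txt",
}

kwargs = config["completions_kwargs"]
# Decoding-related keys that would end up as vLLM sampling parameters.
sampling = {k: kwargs[k] for k in ("temperature", "top_p", "max_new_tokens", "stop_token_ids")}
print(sampling)
```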
3 changes: 3 additions & 0 deletions src/alpaca_eval/models_configs/gemma-2-9b-it-DPO/prompt.txt
@@ -0,0 +1,3 @@
+<bos><start_of_turn>user
+{instruction}<end_of_turn>
+<start_of_turn>model
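The prompt file is a plain gemma-2 chat template with a single `{instruction}` slot; filling it is just a format call (a sketch, not AlpacaEval's templating code):

```python
# Gemma-2 chat template from prompt.txt; {instruction} is the only slot.
PROMPT = (
    "<bos><start_of_turn>user\n"
    "{instruction}<end_of_turn>\n"
    "<start_of_turn>model\n"
)

prompt = PROMPT.format(instruction="What is AlpacaEval?")
print(prompt)
```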
16 changes: 16 additions & 0 deletions src/alpaca_eval/models_configs/gemma-2-9b-it-SimPO/configs.yaml
@@ -0,0 +1,16 @@
+gemma-2-9b-it-SimPO:
+  completions_kwargs:
+    batch_size: 900
+    do_sample: true
+    max_new_tokens: 4096
+    model_kwargs:
+      torch_dtype: bfloat16
+    model_name: princeton-nlp/gemma-2-9b-it-SimPO
+    stop_token_ids:
+      - 1
+      - 107
+    temperature: 0.5
+    top_p: 1.0
+  fn_completions: vllm_local_completions
+  pretty_name: gemma-2-9b-it-SimPO
+  prompt_template: gemma-2-9b-it-DPO/prompt.txt
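With the configs in place, the new models can presumably be benchmarked with AlpacaEval's standard entry point (command shape as documented in the project README; requires a GPU for local vLLM generation and an OpenAI key for the GPT-4-turbo annotator):

```shell
# Generate outputs with the new config and annotate them against the
# default reference with the gpt-4-turbo judge.
alpaca_eval evaluate_from_model \
  --model_configs 'gemma-2-9b-it-SimPO'
```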
