Add Storm-7B, Storm-7B (best-of-64) to AlpacaEval (#344)
* Add Storm-7B, Storm-7B (best-of-64) to AlpacaEval

---------

Co-authored-by: Yann Dubois <yanndubois96@gmail.com>
yifan123 and YannDubs authored Jun 21, 2024
1 parent 7aa2e65 commit 58acae6
Showing 9 changed files with 49,724 additions and 48,909 deletions.
4,832 changes: 4,832 additions & 0 deletions results/Storm-7B-best-of-64/model_outputs.json

4,832 changes: 0 additions & 4,832 deletions results/Storm-7B-num-beams-10/model_outputs.json

This file was deleted.

9,660 changes: 4,830 additions & 4,830 deletions results/Storm-7B/model_outputs.json

40,385 changes: 20,565 additions & 19,820 deletions results/Storm-7B/weighted_alpaca_eval_gpt4_turbo/annotations.json

@@ -1,16 +1,16 @@
 ,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,discrete_win_rate,mode,avg_length,length_controlled_winrate,lc_standard_error
 openpipe-moa-gpt-4-turbo-v1,63.15493451236265,1.422980098799326,515,283,7,805,64.40993788819875,community,1856,68.37866250336802,0.7309418614587613
 Together-MoA,59.8688062333292,1.434305604543079,490,314,1,805,60.93167701863354,community,1825,65.37996976852163,0.7392392836781445
+Storm-7B-best-of-64,63.04099075186919,1.4253258915161846,519,286,0,805,64.472049689441,community,2340,61.63789557199839,
 Together-MoA-Lite,56.593045622273294,1.4464848562244548,456,347,2,805,56.77018633540373,community,1968,59.1415240989275,0.7580510219326322
 aligner-2b_gpt-4-turbo-2024-04-09,46.77089325668323,1.3378060774476594,371,417,17,805,40.18633540372671,community,1370,58.33130206276722,
 gpt-4o-2024-05-13,51.32757578249279,1.4700094589795554,429,369,7,805,53.72670807453416,minimal,1873,57.45682883335095,
 gpt-4-turbo-2024-04-09,46.11526538763708,1.474073957743638,370,426,9,805,46.52173913043478,minimal,1802,55.01530093647852,
 claude-3-5-sonnet-20240620,40.560214096828275,1.4679655403720542,312,493,0,805,38.75776397515528,community,1488,52.36675427146999,
 yi-large-preview,57.46724251946292,1.4305696667082746,463,338,4,805,57.7639751552795,verified,2335,51.894415134099546,
-Storm-7B-num-beams-10,55.39223031175099,1.4432354650537405,451,354,0,805,56.024844720496894,community,2582,51.76986749912786,
 gpt4_1106_preview_verbose,64.30360147101865,1.3348590089025316,525,268,12,805,65.96273291925466,dev,2402,51.57500797967598,
+Storm-7B,50.26886905528583,1.4728176780737183,397,408,0,805,49.31677018633541,community,2045,50.45110959343775,
 gpt4_1106_preview,50.0,0.0,0,0,805,805,50.0,minimal,2049,50.0,
-Storm-7B,52.47113499955521,1.4665272219232597,431,374,0,805,53.54037267080746,community,2788,48.90648220146071,
 Llama-3-Instruct-8B-SimPO-ExPO,40.63285400856655,1.4439449942168028,325,479,1,805,40.43478260869565,community,1765,45.78021783946177,
 Llama-3-Instruct-8B-SimPO,40.52977498461182,1.422574464675002,319,485,1,805,39.68944099378882,community,1825,44.65131348921881,0.8800655791760451
 Nanbeige-Plus-Chat-v0.1,56.70300973017392,1.482841874951873,456,347,2,805,56.77018633540373,community,2587,44.45966240337981,
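For readers of the leaderboard rows above: discrete_win_rate is derivable from the count columns, assuming the usual convention that a draw counts as half a win. A minimal sketch (the function name is ours; the convention is an assumption, but it reproduces the rows shown):

```python
# Recompute discrete_win_rate from a leaderboard row's count columns.
# Assumed convention: each draw counts as half a win.
def discrete_win_rate(n_wins: int, n_draws: int, n_total: int) -> float:
    return 100.0 * (n_wins + 0.5 * n_draws) / n_total

# openpipe-moa-gpt-4-turbo-v1: 515 wins, 7 draws out of 805 comparisons.
print(discrete_win_rate(515, 7, 805))  # matches the 64.4099... shown above
```

The same formula reproduces the other rows, e.g. Together-MoA (490 wins, 1 draw) gives 60.9317.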
@@ -144,7 +144,6 @@ Qwen1.5-14B-Chat,-1.2673261254168109,0.8917765927062211,-1.4423447102225453
 Qwen1.5-7B-Chat,-1.4334024171563693,0.9571477402285772,-2.3083932198426003
 Qwen1.5-1.8B-Chat,-1.6003884852505712,0.9646855557741588,-4.6744303356917447
 Qwen1.5-110B-Chat,-1.4481674391207744,0.9102999775192784,-0.2004892206655888
-Storm-7B,-0.3778112670657819,0.5727965213879709,0.0000000000000000
 SPPO-Mistral7B-PairRM,-1.0066475422582106,0.9046614612887018,-0.9905877944582094
 SPPO-Mistral7B-PairRM-ExPO,-0.9297137384620632,0.7671267711136246,-0.8709792439039323
 internlm2-chat-7b-ExPO,-1.1989304003616963,0.6968622384940820,-1.4260629123293445
@@ -165,5 +164,7 @@ Llama-3-Instruct-8B-SimPO-ExPO,-1.1153280231371028,0.8741611299275304,-0.1029222
 merlinite-7B-AOT,-0.9472382718509442,0.8407838130728476,-0.8954727783980261
 Together-MoA,-1.0555583531357304,0.8453234405641900,0.9217351025640278
 Together-MoA-Lite,-1.0572386816426196,0.7849833974539681,0.5628671529713698
+Storm-7B,-0.2454158607006287,0.6674628551824360,0.0978593995297498
+Storm-7B-best-of-64,-0.7151816152506517,0.6962107247259065,0.6517965200881723
 Nanbeige2-16B-Chat,-1.4383673979411902,0.8415127360873783,-0.3850159994606512
 openpipe-moa-gpt-4-turbo-v1,-1.0482540803063984,0.8922946327161730,1.1183646496339554
@@ -1,14 +1,13 @@
-Storm-7B-num-beams-10:
+Storm-7B-best-of-64:
   prompt_template: "Storm-7B/prompt.txt"
   fn_completions: null
   completions_kwargs:
-    model_name: "jieliu/Storm-7B" # local path
+    model_name: "jieliu/Storm-7B"
     model_kwargs:
       torch_dtype: 'bfloat16'
     max_new_tokens: 2048
     temperature: 1.0
-    num_beams: 10
-    do_sample: True
+    n: 64
     batch_size: 8
-  pretty_name: "Storm-7B (num_beams=10)"
+  pretty_name: "Storm-7B (best-of-64)"
  link: "https://huggingface.co/jieliu/Storm-7B"
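The renamed config samples n: 64 completions per prompt; "best-of-64" then keeps the single candidate that a scorer (typically a reward model, which is not part of this commit) rates highest. A toy sketch of that selection step, with a stand-in scorer:

```python
from typing import Callable, List

def best_of_n(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate (ties go to the first)."""
    return max(candidates, key=score)

# Stand-in scorer for illustration only: prefer longer completions.
picked = best_of_n(["short", "a much longer completion", "medium one"], score=len)
print(picked)  # a much longer completion
```

In the real pipeline the 64 candidates would come from sampling the model, and `score` would be a trained reward model rather than `len`.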
9 changes: 4 additions & 5 deletions src/alpaca_eval/models_configs/Storm-7B/configs.yaml
@@ -1,13 +1,12 @@
 Storm-7B:
   prompt_template: "Storm-7B/prompt.txt"
-  fn_completions: huggingface_local_completions
+  fn_completions: vllm_local_completions
   completions_kwargs:
-    model_name: "jieliu/Storm-7B" # local path
+    model_name: "jieliu/Storm-7B"
     model_kwargs:
       torch_dtype: 'bfloat16'
     max_new_tokens: 2048
-    temperature: 1.0
-    do_sample: True
-    batch_size: 8
+    temperature: 0
+    batch_size: 32
   pretty_name: "Storm-7B"
   link: "https://huggingface.co/jieliu/Storm-7B"
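The updated config decodes greedily: temperature 0 is the limit of temperature sampling in which softmax(logits / T) concentrates all probability on the argmax token, so samplers implement it as greedy decoding. A small stdlib-only illustration of that limit:

```python
import math

def softmax_with_temperature(logits, t):
    # Scale logits by 1/t, then normalize; smaller t sharpens the distribution.
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.1, 0.01):
    print(t, softmax_with_temperature(logits, t))
# As t shrinks, essentially all probability mass moves to the argmax (index 0).
```

This is why setting temperature: 0 (rather than 1.0 with do_sample) makes the evaluation deterministic.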
