Add Storm-7B, Storm-7B (best-of-64) to AlpacaEval #344

Merged
merged 4 commits into from Jun 21, 2024
4,832 changes: 4,832 additions & 0 deletions results/Storm-7B-best-of-64/model_outputs.json

Large diffs are not rendered by default.


4,832 changes: 0 additions & 4,832 deletions results/Storm-7B-num-beams-10/model_outputs.json

This file was deleted.

9,660 changes: 4,830 additions & 4,830 deletions results/Storm-7B/model_outputs.json

Large diffs are not rendered by default.

40,385 changes: 20,565 additions & 19,820 deletions results/Storm-7B/weighted_alpaca_eval_gpt4_turbo/annotations.json

Large diffs are not rendered by default.

@@ -1,16 +1,16 @@
 ,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,discrete_win_rate,mode,avg_length,length_controlled_winrate,lc_standard_error
 openpipe-moa-gpt-4-turbo-v1,63.15493451236265,1.422980098799326,515,283,7,805,64.40993788819875,community,1856,68.37866250336802,0.7309418614587613
 Together-MoA,59.8688062333292,1.434305604543079,490,314,1,805,60.93167701863354,community,1825,65.37996976852163,0.7392392836781445
+Storm-7B-best-of-64,63.04099075186919,1.4253258915161846,519,286,0,805,64.472049689441,community,2340,61.63789557199839,
 Together-MoA-Lite,56.593045622273294,1.4464848562244548,456,347,2,805,56.77018633540373,community,1968,59.1415240989275,0.7580510219326322
 aligner-2b_gpt-4-turbo-2024-04-09,46.77089325668323,1.3378060774476594,371,417,17,805,40.18633540372671,community,1370,58.33130206276722,
 gpt-4o-2024-05-13,51.32757578249279,1.4700094589795554,429,369,7,805,53.72670807453416,minimal,1873,57.45682883335095,
 gpt-4-turbo-2024-04-09,46.11526538763708,1.474073957743638,370,426,9,805,46.52173913043478,minimal,1802,55.01530093647852,
 claude-3-5-sonnet-20240620,40.560214096828275,1.4679655403720542,312,493,0,805,38.75776397515528,community,1488,52.36675427146999,
 yi-large-preview,57.46724251946292,1.4305696667082746,463,338,4,805,57.7639751552795,verified,2335,51.894415134099546,
-Storm-7B-num-beams-10,55.39223031175099,1.4432354650537405,451,354,0,805,56.024844720496894,community,2582,51.76986749912786,
 gpt4_1106_preview_verbose,64.30360147101865,1.3348590089025316,525,268,12,805,65.96273291925466,dev,2402,51.57500797967598,
+Storm-7B,50.26886905528583,1.4728176780737183,397,408,0,805,49.31677018633541,community,2045,50.45110959343775,
 gpt4_1106_preview,50.0,0.0,0,0,805,805,50.0,minimal,2049,50.0,
-Storm-7B,52.47113499955521,1.4665272219232597,431,374,0,805,53.54037267080746,community,2788,48.90648220146071,
 Llama-3-Instruct-8B-SimPO-ExPO,40.63285400856655,1.4439449942168028,325,479,1,805,40.43478260869565,community,1765,45.78021783946177,
 Llama-3-Instruct-8B-SimPO,40.52977498461182,1.422574464675002,319,485,1,805,39.68944099378882,community,1825,44.65131348921881,0.8800655791760451
 Nanbeige-Plus-Chat-v0.1,56.70300973017392,1.482841874951873,456,347,2,805,56.77018633540373,community,2587,44.45966240337981,
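The count columns above determine discrete_win_rate directly: wins plus half of the draws, divided by the total, times 100. A quick sanity check against the openpipe-moa-gpt-4-turbo-v1 row:

```python
# Reproduce discrete_win_rate from the count columns of the
# openpipe-moa-gpt-4-turbo-v1 row; a draw counts as half a win.
n_wins, n_draws, n_total = 515, 7, 805
discrete_win_rate = 100 * (n_wins + 0.5 * n_draws) / n_total
print(discrete_win_rate)  # matches the 64.4099... value in the table
```

The separate win_rate column differs because, as the annotator name weighted_alpaca_eval_gpt4_turbo suggests, it averages the judge's continuous per-example preference rather than hard win/draw/loss outcomes.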
@@ -144,7 +144,6 @@ Qwen1.5-14B-Chat,-1.2673261254168109,0.8917765927062211,-1.4423447102225453
 Qwen1.5-7B-Chat,-1.4334024171563693,0.9571477402285772,-2.3083932198426003
 Qwen1.5-1.8B-Chat,-1.6003884852505712,0.9646855557741588,-4.6744303356917447
 Qwen1.5-110B-Chat,-1.4481674391207744,0.9102999775192784,-0.2004892206655888
-Storm-7B,-0.3778112670657819,0.5727965213879709,0.0000000000000000
 SPPO-Mistral7B-PairRM,-1.0066475422582106,0.9046614612887018,-0.9905877944582094
 SPPO-Mistral7B-PairRM-ExPO,-0.9297137384620632,0.7671267711136246,-0.8709792439039323
 internlm2-chat-7b-ExPO,-1.1989304003616963,0.6968622384940820,-1.4260629123293445
@@ -165,5 +164,7 @@ Llama-3-Instruct-8B-SimPO-ExPO,-1.1153280231371028,0.8741611299275304,-0.1029222
 merlinite-7B-AOT,-0.9472382718509442,0.8407838130728476,-0.8954727783980261
 Together-MoA,-1.0555583531357304,0.8453234405641900,0.9217351025640278
 Together-MoA-Lite,-1.0572386816426196,0.7849833974539681,0.5628671529713698
+Storm-7B,-0.2454158607006287,0.6674628551824360,0.0978593995297498
+Storm-7B-best-of-64,-0.7151816152506517,0.6962107247259065,0.6517965200881723
 Nanbeige2-16B-Chat,-1.4383673979411902,0.8415127360873783,-0.3850159994606512
 openpipe-moa-gpt-4-turbo-v1,-1.0482540803063984,0.8922946327161730,1.1183646496339554
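These per-model coefficient files plausibly feed AlpacaEval's length-controlled win rate, which fits a regression on the judge's preferences so that the verbosity term can be zeroed out at prediction time. A toy illustration of that idea, with made-up coefficients (not the values above, and not AlpacaEval's exact GLM):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical fitted terms for one model vs. the GPT-4 baseline.
skill = 0.3         # model-identity coefficient (assumed)
length_coef = 0.8   # judge's preference for longer answers (assumed)
avg_len_diff = 0.5  # this model answers longer than the baseline (assumed)

raw_win_rate = sigmoid(skill + length_coef * avg_len_diff)  # length-confounded
lc_win_rate = sigmoid(skill)                                # length term zeroed
print(round(100 * raw_win_rate, 1), round(100 * lc_win_rate, 1))
```

Zeroing the length term is what lets a verbose model's win rate drop once its length advantage is controlled for, as several rows in the leaderboard diff show.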
@@ -1,14 +1,13 @@
-Storm-7B-num-beams-10:
+Storm-7B-best-of-64:
   prompt_template: "Storm-7B/prompt.txt"
   fn_completions: null
   completions_kwargs:
-    model_name: "jieliu/Storm-7B" # local path
+    model_name: "jieliu/Storm-7B"
     model_kwargs:
       torch_dtype: 'bfloat16'
     max_new_tokens: 2048
     temperature: 1.0
-    num_beams: 10
-    do_sample: True
+    n: 64
     batch_size: 8
-  pretty_name: "Storm-7B (num_beams=10)"
+  pretty_name: "Storm-7B (best-of-64)"
   link: "https://huggingface.co/jieliu/Storm-7B"
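The n: 64 setting requests 64 sampled completions per prompt, of which the best is kept. A minimal sketch of best-of-n selection, with a stand-in sampler and scorer (both hypothetical; a real pipeline would sample from the model and rank candidates with a reward model):

```python
import random

def sample_completion(prompt, rng):
    # Stand-in for one stochastic model sample (temperature > 0).
    return f"candidate scored {rng.random():.6f}"

def score(prompt, completion):
    # Stand-in for a reward model; here the score is embedded in the text.
    return float(completion.rsplit(" ", 1)[1])

def best_of_n(prompt, n=64, seed=0):
    # Draw n candidates, keep the one the scorer ranks highest.
    rng = random.Random(seed)
    candidates = [sample_completion(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

best = best_of_n("Explain AlpacaEval.", n=64)
```

Best-of-n trades n times the inference cost for a better single answer, which is why the best-of-64 row is reported separately from plain Storm-7B on the leaderboard.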
9 changes: 4 additions & 5 deletions src/alpaca_eval/models_configs/Storm-7B/configs.yaml
@@ -1,13 +1,12 @@
 Storm-7B:
   prompt_template: "Storm-7B/prompt.txt"
-  fn_completions: huggingface_local_completions
+  fn_completions: vllm_local_completions
   completions_kwargs:
-    model_name: "jieliu/Storm-7B" # local path
+    model_name: "jieliu/Storm-7B"
     model_kwargs:
       torch_dtype: 'bfloat16'
     max_new_tokens: 2048
-    temperature: 1.0
-    do_sample: True
-    batch_size: 8
+    temperature: 0
+    batch_size: 32
   pretty_name: "Storm-7B"
   link: "https://huggingface.co/jieliu/Storm-7B"
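Switching to vllm_local_completions with temperature: 0 makes decoding greedy and deterministic (so HF-style do_sample is dropped). A sketch of how these completions_kwargs might translate into vLLM-style sampling arguments; the mapping function is illustrative, not alpaca_eval's actual code:

```python
# The kwargs from the config above, as a Python dict.
completions_kwargs = {
    "model_name": "jieliu/Storm-7B",
    "model_kwargs": {"torch_dtype": "bfloat16"},
    "max_new_tokens": 2048,
    "temperature": 0,  # temperature 0 selects greedy decoding
    "batch_size": 32,
}

def to_sampling_args(kwargs):
    # Illustrative translation to vLLM SamplingParams-style names;
    # vLLM calls the generation cap max_tokens, not max_new_tokens.
    return {
        "temperature": kwargs["temperature"],
        "max_tokens": kwargs["max_new_tokens"],
    }

sampling_args = to_sampling_args(completions_kwargs)
```

Greedy decoding also explains the leaderboard change: the resampled Storm-7B outputs are shorter on average than the earlier temperature-1.0 samples.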