rebuttal_tables

	VG_Relation	VG_Attribution	COCO_Order	Flickr30k_Order	Avg.
Chance Level	50	50	20	20	-
VLM Baseline
CLIP	51	55	50	62	54.5
BLIP	57	83	37	44	55.3
X-VLM	65	89	35	56	61.3
LVLM
BLIP2	45	96	21	45	51.8
InstructBLIP	73	83	31	44	57.8
LA-V2	60	70	47	55	58.0
LLaVA	42	47	60	31	45.0
MiniGPT-4	63	63	42	27	48.8
mPLUG-Owl	37	64	11	9	30.3
Otter	5	45	12	16	19.5
VPGTrans	47	69	33	20	42.3

	BLIP2	InstructBLIP	LA-V2	LLaVA	MiniGPT-4	mPLUG-Owl	Otter
baseline (prompt in the paper)	24.05	24.4	19.17	18.17	16.37	20.37	14.47
mean (80 prompts)	26.52	24.30	18.34	17.47	13.8	18.16	13.78
std (80 prompts)	3.28	2.47	1.35	1.38	2.31	2.74	0.73

	BLIP2	InstructBLIP	LA-V2	LLaVA	MiniGPT-4	mPLUG-Owl	Otter
Mean (10 rounds)	23.87	24.40	21.87	18.64	16.86	14.47	15.92
Std (10 rounds)	0.01	0.01	0.20	0.30	0.46	0.02	0.42

	BLIP2	InstructBLIP	LA-V2	LLaVA	MiniGPT-4	mPLUG-Owl	Otter
Single-turn	72	80	72	58	56	48	60
Multi-turn	74	80	76	74	66	80	68

Model Name	BLIP2	InstructBLIP	LA-V2	LLaVA	MiniGPT-4	mPLUG-Owl	Otter	VPGTrans
Average Score	0.771	0.879	0.846	0.794	0.500	0.345	0.514	0.451
Previous Average Score	0.758	0.900	0.835	0.768	0.481	0.324	0.523	0.462

Paper (634)		Web (1770)		Latest (2750)
Model Name	Elo rating	Model Name	Elo rating	Model Name	Elo rating
mPLUG-Owl	1027.0	LA-V2	1023	LA-V2	1038.74
MiniGPT-4	1021.3	LLaVA	1019.9	MiniGPT-4	1009.87
Otter	1013.2	VPGTrans	1012.1	LLaVA	1002.76
LA-V2	1010.2	MiniGPT-4	1011.9	VPGTrans	999.63
LLaVA	1009.7	InstructBLIP	999.5	InstructBLIP	995.45
InstructBLIP	1003.7	mPLUG-Owl	996.3	mPLUG-Owl	985.60
VPGTrans	974.3	Otter	981.5	Otter	972.11
BLIP2	949.4	BLIP2	955.8	BLIP2	948.8

Task Type	BLIP2	InstructBLIP	LA-V2	LLaVA	MiniGPT-4	mPLUG-Owl	Otter	VPGTrans
ImgCls	0.893	0.976	0.881	0.618	0.779	0.814	0.538	0.475
OC	0.805	0.834	0.716	0.510	0.521	0.391	0.972	0.480
MCI	0.925	1.000	0.803	0.720	0.846	0.326	0.628	0.598
OCR	0.958	1.000	0.355	0.319	0.268	0.261	0.193	0.605
KIE	0.681	0.738	0.611	0.502	0.333	0.150	0.498	0.627
Captioning	0.960	0.980	0.582	0.038	0.086	0.003	0.189	0.603

Paper (634)		Web (1770)		Latest (2750)
Model Name	Elo rating	Model Name	Elo rating	Model Name	Elo rating
mPLUG-Owl	1027.0	LA-V2	1023	LA-V2	1038.74
MiniGPT-4	1021.3	LLaVA	1019.9	MiniGPT-4	1009.87
Otter	1013.2	VPGTrans	1012.1	LLaVA	1002.76
LA-V2	1010.2	MiniGPT-4	1011.9	VPGTrans	999.63
LLaVA	1009.7	InstructBLIP	999.5	InstructBLIP	995.45
InstructBLIP	1003.7	mPLUG-Owl	996.3	mPLUG-Owl	985.60
VPGTrans	974.3	Otter	981.5	Otter	972.11
BLIP2	949.4	BLIP2	955.8	BLIP2	948.8

# PyTorch-like pseudo code
prompt = "{prompt}"
prompt_token_ids = tokenizer(prompt)
option = "{option}"
option_token_ids = tokenizer(option)
token_ids = prompt_token_ids + option_token_ids

inputs = token_ids[:-1]
targets = token_ids[1:]
logits = llm(inputs) # [len(inputs), vocab_size]

# only keep continuation for the prompt
logits = logits[-len(option_token_ids):]
logprobs = F.log_softmax(logits)
logprobs_option = torch.gather(logprobs, 1, option_token_ids)

logprobs_option_sum = logprobs_option.sum()
# optionally perform normalization
logprobs_option_sum_normalized = logprobs_option_sum / len(option_token_ids)

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rebuttal_tables

About

Releases

Packages

BellXP/rebuttal_tables

Folders and files

Latest commit

History

Repository files navigation

rebuttal_tables

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages