- Conduct learning and research on MLLMs based on the MME rankings.
The leaderboards cover 36 advanced MLLMs, including BLIP-2, InstructBLIP, LLaVA, MiniGPT-4, mPLUG-Owl, LLaMA-Adapter V2, ImageBind_LLM, Otter, VisualGLM-6B, Multimodal-GPT, PandaGPT, VPGTrans, LaVIN, Lynx, Octopus, LRV-Instruction, Cheetor, MMICL, GIT2, BLIVA, Skywork-MM, Qwen-VL-Chat, InternLM-XComposer-VL, Lion, Muffin, WeMM, SPHINX, InfMLLM, mPLUG-Owl2, GPT-4V, CVLM, LVIS-INSTRUCT4V, Kanva, DataOptim, ShareGPT4V, and BELLE-VL.
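In MME, each subtask is worth up to 200 points, so a model's perception score is the sum of its ten perception subtask scores (2000 maximum) and its cognition score is the sum of its four cognition subtask scores (800 maximum). The totals in the detailed tables below can therefore be reproduced by summing the subtask columns; here is a minimal sanity-check sketch (the helper name `mme_total` is ours, not part of MME):

```python
def mme_total(subtask_scores):
    """Sum per-subtask MME scores (each subtask is worth up to 200 points)."""
    return round(sum(subtask_scores), 2)

# BLIP-2 perception subtask scores, copied from the detailed table below:
# existence, count, position, color, OCR, posters, cast, scene, landmark, artwork
blip2_perception = [160.00, 135.00, 73.33, 148.33, 110.00,
                    141.84, 105.59, 145.25, 138.00, 136.50]

assert mme_total(blip2_perception) == 1293.84  # matches the reported score
```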
*Figure: overall leaderboard charts for Perception and Cognition (images omitted).*
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | FlanT5xxl | BLIP-2 | FlanT5xxl | 1293.84 | 290.00 |
2 | FlanT5xxl | InstructBLIP | FlanT5xxl | 1212.82 | 291.79 |
3 | FlanT5xxl | MMICL | FlanT5xxl | 1381.73 | 428.93 |
4 | FlanT5xxl | BLIVA | FlanT5xxl | 1337.73 | 331.43 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
BLIP-2 | FlanT5xxl | 160.00 | 135.00 | 73.33 | 148.33 | 110.00 | 141.84 | 105.59 | 145.25 | 138.00 | 136.50 | 1293.84 |
InstructBLIP | FlanT5xxl | 185.00 | 143.33 | 66.67 | 153.33 | 72.50 | 123.81 | 101.18 | 153.00 | 79.75 | 134.25 | 1212.82 |
MMICL | FlanT5xxl | 170.00 | 160.00 | 81.67 | 156.67 | 100.00 | 146.26 | 141.76 | 153.75 | 136.13 | 135.50 | 1381.73 |
BLIVA | FlanT5xxl | 180.00 | 138.33 | 81.67 | 180.00 | 87.50 | 155.10 | 140.88 | 151.50 | 89.50 | 133.25 | 1337.73 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
BLIP-2 | FlanT5xxl | 110.00 | 40.00 | 65.00 | 75.00 | 290.00 |
InstructBLIP | FlanT5xxl | 129.29 | 40.00 | 65.00 | 57.50 | 291.79 |
MMICL | FlanT5xxl | 136.43 | 82.50 | 132.50 | 77.50 | 428.93 |
BLIVA | FlanT5xxl | 136.43 | 57.50 | 77.50 | 60.00 | 331.43 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | LLaMA | mPLUG-Owl | Llama-7B | 967.34 | 276.07 |
2 | LLaMA | SPHINX | LLaMA2-13B | 1560.15 | 310.00 |
3 | LLaMA | LaVIN | LAVIN-13B | 963.60 | 249.64 |
4 | LLaMA | mPLUG-Owl2 | LLaMA2-7B | 1450.20 | 313.21 |
5 | LLaMA | LLaMA-Adapter V2 | LLaMA-Adapter-v2.1-7B | 1328.39 | 356.43 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
mPLUG-Owl | Llama-7B | 120.00 | 50.00 | 50.00 | 55.00 | 65.00 | 136.05 | 100.29 | 135.50 | 159.25 | 96.25 | 967.34 |
SPHINX | LLaMA2-13B | 195.00 | 160.00 | 153.33 | 160.00 | 87.50 | 164.29 | 177.94 | 160.00 | 168.09 | 134.00 | 1560.15 |
LaVIN | LAVIN-13B | 185.00 | 88.33 | 63.33 | 75.00 | 107.50 | 79.59 | 47.35 | 136.75 | 93.50 | 87.25 | 963.60 |
mPLUG-Owl2 | LLaMA2-7B | 185.00 | 155.00 | 88.33 | 150.00 | 102.50 | 160.20 | 164.41 | 153.25 | 157.25 | 134.25 | 1450.20 |
LLaMA-Adapter V2 | LLaMA-Adapter-v2.1-7B | 185.00 | 133.33 | 56.67 | 118.33 | 102.50 | 147.96 | 136.76 | 156.25 | 167.84 | 123.75 | 1328.39 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
mPLUG-Owl | Llama-7B | 78.57 | 60.00 | 80.00 | 57.50 | 276.07 |
SPHINX | LLaMA2-13B | 130.00 | 55.00 | 75.00 | 50.00 | 310.00 |
LaVIN | LAVIN-13B | 87.14 | 65.00 | 47.50 | 50.00 | 249.64 |
mPLUG-Owl2 | LLaMA2-7B | 115.71 | 35.00 | 102.50 | 60.00 | 313.21 |
LLaMA-Adapter V2 | LLaMA-Adapter-v2.1-7B | 106.43 | 47.50 | 112.50 | 90.00 | 356.43 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | Vicuna | MiniGPT-4 | Vicuna-13B | 581.66 | 144.29 |
2 | Vicuna | PandaGPT | Vicuna-7B | 642.59 | 228.57 |
3 | Vicuna | LLaVA | Vicuna-13B | 1531.31 | 295.36 |
4 | Vicuna | LaVIN | LAVIN-13B | 963.60 | 249.64 |
5 | Vicuna | VPGTrans | Vicuna-7B | 790.45 | 249.29 |
6 | Vicuna | Lynx | Vicuna-7B | 1373.24 | 215.71 |
7 | Vicuna | Cheetor | Vicuna-7B | 1299.97 | 321.07 |
8 | Vicuna | Muffin | Vicuna-13B | 1281.02 | 290.00 |
9 | Vicuna | InfMLLM | Vicuna-13B | 1567.99 | 347.14 |
10 | Vicuna | CVLM | Vicuna-13B | 1636.45 | 488.93 |
11 | Vicuna | LVIS-INSTRUCT4V | Vicuna-13B | 1574.89 | 286.79 |
12 | Vicuna | ShareGPT4V | Vicuna-13B | 1618.70 | 303.21 |
13 | Vicuna | DataOptim-LLaVA | Vicuna-13B | 1563.56 | 361.07 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
MiniGPT-4 | Vicuna-13B | 68.33 | 55.00 | 43.33 | 75.00 | 57.50 | 41.84 | 54.41 | 71.75 | 54.00 | 60.50 | 581.66 |
PandaGPT | Vicuna-7B | 70.00 | 50.00 | 50.00 | 50.00 | 50.00 | 76.53 | 57.06 | 118.00 | 69.75 | 51.25 | 642.59 |
LLaVA | Vicuna-13B | 185.00 | 155.00 | 133.33 | 170.00 | 125.00 | 160.54 | 152.94 | 161.25 | 170.50 | 117.75 | 1531.31 |
LaVIN | LAVIN-13B | 185.00 | 88.33 | 63.33 | 75.00 | 107.50 | 79.59 | 47.35 | 136.75 | 93.50 | 87.25 | 963.60 |
VPGTrans | Vicuna-7B | 70.00 | 85.00 | 63.33 | 73.33 | 77.50 | 84.01 | 53.53 | 141.75 | 64.75 | 77.25 | 790.45 |
Lynx | Vicuna-7B | 195.00 | 151.67 | 90.00 | 170.00 | 77.50 | 124.83 | 118.24 | 164.50 | 162.00 | 119.50 | 1373.24 |
Cheetor | Vicuna-7B | 180.00 | 96.67 | 80.00 | 116.67 | 100.00 | 147.28 | 164.12 | 156.00 | 145.73 | 113.50 | 1299.97 |
Muffin | Vicuna-13B | 195.00 | 163.33 | 66.67 | 165.00 | 57.50 | 137.76 | 81.76 | 151.25 | 146.25 | 116.50 | 1281.02 |
InfMLLM | Vicuna-13B | 190.00 | 151.67 | 143.33 | 185.00 | 132.50 | 163.27 | 161.47 | 165.25 | 167.00 | 108.50 | 1567.99 |
CVLM | Vicuna-13B | 185.00 | 155.00 | 178.33 | 185.00 | 155.00 | 162.24 | 155.88 | 162.75 | 169.50 | 127.75 | 1636.45 |
LVIS-INSTRUCT4V | Vicuna-13B | 195.00 | 160.00 | 128.33 | 180.00 | 132.50 | 162.59 | 161.47 | 163.25 | 161.50 | 130.25 | 1574.89 |
ShareGPT4V | Vicuna-13B | 190.00 | 165.00 | 153.33 | 185.00 | 132.50 | 169.05 | 153.82 | 168.00 | 174.00 | 128.00 | 1618.70 |
DataOptim-LLaVA | Vicuna-13B | 190.00 | 165.00 | 121.67 | 155.00 | 162.50 | 169.73 | 159.41 | 166.50 | 160.00 | 113.75 | 1563.56 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
MiniGPT-4 | Vicuna-13B | 59.29 | 45.00 | 0.00 | 40.00 | 144.29 |
PandaGPT | Vicuna-7B | 73.57 | 50.00 | 57.50 | 47.50 | 228.57 |
LLaVA | Vicuna-13B | 127.86 | 42.50 | 77.50 | 47.50 | 295.36 |
LaVIN | LAVIN-13B | 87.14 | 65.00 | 47.50 | 50.00 | 249.64 |
VPGTrans | Vicuna-7B | 64.29 | 50.00 | 77.50 | 57.50 | 249.29 |
Lynx | Vicuna-7B | 110.71 | 17.50 | 42.50 | 45.00 | 215.71 |
Cheetor | Vicuna-7B | 98.57 | 77.50 | 57.50 | 87.50 | 321.07 |
Muffin | Vicuna-13B | – | – | – | – | 290.00 |
InfMLLM | Vicuna-13B | 132.14 | 60.00 | 102.50 | 52.50 | 347.14 |
CVLM | Vicuna-13B | 131.43 | 137.50 | 147.50 | 72.50 | 488.93 |
LVIS-INSTRUCT4V | Vicuna-13B | 134.29 | 40.00 | 70.00 | 42.50 | 286.79 |
ShareGPT4V | Vicuna-13B | 125.71 | 45.00 | 80.00 | 52.50 | 303.21 |
DataOptim-LLaVA | Vicuna-13B | 123.57 | 47.50 | 110.00 | 80.00 | 361.07 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | OpenFlamingo | Multimodal-GPT | Multimodal-GPT-9B | 654.72 | 226.79 |
2 | OpenFlamingo | Otter | OTTER-Image-MPT7B | 1292.26 | 306.43 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Multimodal-GPT | Multimodal-GPT-9B | 61.67 | 55.00 | 58.33 | 68.33 | 82.50 | 57.82 | 73.82 | 68.00 | 69.75 | 59.50 | 654.72 |
Otter | OTTER-Image-MPT7B | 195.00 | 88.33 | 86.67 | 113.33 | 72.50 | 138.78 | 172.65 | 158.75 | 137.25 | 129.00 | 1292.26 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
Multimodal-GPT | Multimodal-GPT-9B | 49.29 | 62.50 | 60.00 | 55.00 | 226.79 |
Otter | OTTER-Image-MPT7B | 106.43 | 72.50 | 57.50 | 70.00 | 306.43 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | InternLM | InternLM-XComposer-VL | InternLM-7B | 1528.45 | 391.07 |
2 | InternLM | Lion | InternLM-7B | 1545.80 | 445.71 |
3 | InternLM | WeMM | InternLM-7B | 1621.66 | 445.00 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
InternLM-XComposer-VL | InternLM-7B | 190.00 | 158.33 | 126.67 | 165.00 | 125.00 | 161.90 | 150.29 | 159.75 | 165.25 | 126.25 | 1528.45 |
Lion | InternLM-7B | 190.00 | 155.00 | 153.33 | 180.00 | 72.50 | 181.63 | 150.59 | 159.00 | 173.00 | 130.75 | 1545.80 |
WeMM | InternLM-7B | 195.00 | 140.00 | 126.67 | 168.33 | 147.50 | 160.54 | 179.12 | 176.25 | 172.25 | 156.00 | 1621.66 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
InternLM-XComposer-VL | InternLM-7B | 138.57 | 55.00 | 112.50 | 85.00 | 391.07 |
Lion | InternLM-7B | 125.71 | 105.00 | 147.50 | 67.50 | 445.71 |
WeMM | InternLM-7B | 140.00 | 57.50 | 130.00 | 117.50 | 445.00 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | Qwen | Qwen-VL-Chat | Qwen-7B | 1487.58 | 360.71 |
2 | Qwen | Kanva | Qwen-14B | 1666.08 | 217.14 |
3 | Qwen | BELLE-VL | Qwen-14B | 1595.34 | 332.14 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Qwen-VL-Chat | Qwen-7B | 158.33 | 150.00 | 128.33 | 170.00 | 140.00 | 178.57 | 120.59 | 152.25 | 164.00 | 125.50 | 1487.58 |
Kanva | Qwen-14B | 195.00 | 156.67 | 185.00 | 160.00 | 152.50 | 140.82 | 145.00 | 179.75 | 184.34 | 167.00 | 1666.08 |
BELLE-VL | Qwen-14B | 190.00 | 150.00 | 130.00 | 175.00 | 177.50 | 166.33 | 136.76 | 156.25 | 174.00 | 139.50 | 1595.34 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
Qwen-VL-Chat | Qwen-7B | 130.71 | 40.00 | 147.50 | 42.50 | 360.71 |
Kanva | Qwen-14B | 72.14 | 50.00 | 50.00 | 45.00 | 217.14 |
BELLE-VL | Qwen-14B | 127.14 | 47.50 | 102.50 | 55.00 | 332.14 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | MPT | Octopus | MPT7B | 1095.75 | 312.50 |
2 | MPT | Otter | OTTER-Image-MPT7B | 1292.26 | 306.43 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Octopus | MPT7B | 180.00 | 53.33 | 48.33 | 103.33 | 65.00 | 138.10 | 129.41 | 157.25 | 126.00 | 95.00 | 1095.75 |
Otter | OTTER-Image-MPT7B | 195.00 | 88.33 | 86.67 | 113.33 | 72.50 | 138.78 | 172.65 | 158.75 | 137.25 | 129.00 | 1292.26 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
Octopus | MPT7B | 100.00 | 47.50 | 102.50 | 62.50 | 312.50 |
Otter | OTTER-Image-MPT7B | 106.43 | 72.50 | 57.50 | 70.00 | 306.43 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | GLM | VisualGLM-6B | VisualGLM-6B | 705.31 | 181.79 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
VisualGLM-6B | VisualGLM-6B | 85.00 | 50.00 | 48.33 | 55.00 | 42.50 | 65.99 | 53.24 | 146.25 | 83.75 | 75.25 | 705.31 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
VisualGLM-6B | VisualGLM-6B | 39.29 | 45.00 | 50.00 | 47.50 | 181.79 |
Num. | Arch. | Model | Version | Perception | Cognition |
---|---|---|---|---|---|
1 | imagebind_huge+Open-Chinese-LLaMA-7B | ImageBind_LLM | imagebind_LLM-7B | 775.77 | 213.57 |
2 | MiniGPT/LLaMA | LRV-Instruction | LRV-7B | 1299.79 | 328.21 |
#### Perception
Models | version | existence | count | position | color | OCR | posters | cast | scene | landmark | artwork | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ImageBind_LLM | imagebind_LLM-7B | 128.33 | 60.00 | 46.67 | 73.33 | 80.00 | 64.97 | 76.47 | 113.25 | 62.00 | 70.75 | 775.77 |
LRV-Instruction | LRV-7B | 165.00 | 111.67 | 86.67 | 165.00 | 110.00 | 139.04 | 112.65 | 147.98 | 160.53 | 101.25 | 1299.79 |
#### Cognition
Models | version | Common_Sense_Reasoning | Numerical_Calculation | Text_Translation | Code_Reasoning | score |
---|---|---|---|---|---|---|
ImageBind_LLM | imagebind_LLM-7B | 48.57 | 55.00 | 50.00 | 60.00 | 213.57 |
LRV-Instruction | LRV-7B | 100.71 | 70.00 | 85.00 | 72.50 | 328.21 |
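Because the rankings above are split by architecture, a single cross-architecture view can be handy. Below is a minimal sketch that re-ranks a hand-copied subset of the summary rows by the combined perception-plus-cognition total; this combined total is just one convenient criterion, not an official MME metric:

```python
# Re-rank a subset of the summary tables above by perception + cognition.
# Rows are hand-copied from the tables; the combined total is our own
# convenience metric, since MME reports the two axes separately.
leaderboard = [
    # (model, version, perception, cognition)
    ("Kanva",      "Qwen-14B",    1666.08, 217.14),
    ("CVLM",       "Vicuna-13B",  1636.45, 488.93),
    ("WeMM",       "InternLM-7B", 1621.66, 445.00),
    ("ShareGPT4V", "Vicuna-13B",  1618.70, 303.21),
    ("BELLE-VL",   "Qwen-14B",    1595.34, 332.14),
]

for model, version, perception, cognition in sorted(
    leaderboard, key=lambda row: row[2] + row[3], reverse=True
):
    print(f"{model:<12} {version:<12} total={perception + cognition:7.2f}")
```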