GitHub - NKU-MetautoAI/awesome-large-vision-language-models: Advances in recent large vision language models (LVLMs)

🌐✨Summarizing the latest LLMs and VLMs! Helping you quickly and easily choose and use large models! 😄
This is the repository navigation page, the main Awesome List: LLMs🚀 | LVLMs🚀
Supported languages: 中文🚀 | English

Welcome to our repository🥰, a comprehensive navigation page that connects you to the most relevant resources and summary platforms for the latest large models (including LLMs🚀 and LVLMs🚀). Whether you're looking for benchmarks💯, comparisons⚖️, or surveys📖, we've got you covered.

Feel free to raise an issue or contact us if you find any related papers that are not included here. Organizers: Bocheng Hu@NKU (h1355393774@gmail.com), Gepeng Ji@ANU (gepengai.ji@gmail.com)

Quick Start —— large language models (LLMs) 🏁

Model	Date	Organization	Parameters	CheckPoint	Details
Gemma2	2024-06	Google	2.6B/9B/27B	Gemma2 Family🤗	EN/ZH
YI-1.5	2024-05	01-ai	6B/9B/34B	Yi-1.5 Family🤗	EN/ZH
Llama 3	2024-04	Meta	8B/70B	Llama3 Family🤗	EN/ZH
phi3	2024-04	Microsoft	3.8B/7B/14B	Phi-3 family🤗 (only phi-3-mini is available now)	EN/ZH
Gemma	2024-02	Google	2B/7B	Gemma Family🤗	EN/ZH
Qwen1.5	2024-02	Alibaba	0.5B/1.8B/4B/7B/14B/72B	Qwen1.5🤗	EN/ZH
phi	2023-12	Microsoft	1B/1.5B/2B	phi-1B🤗 phi-1.5B🤗 phi-2B🤗	EN/ZH
Mamba	2023-12	Albert Gu and Tri Dao	130M/370M/790M/1.4B/2.8B	state-spaces🤗	EN/ZH
StripedHyena	2023-12	Together AI	7B	StripedHyena Family🤗	EN/ZH
YI	2023-11	01-ai	6B/9B/34B	Yi Family	EN/ZH
Orca2	2023-11	Microsoft	7B/13B	Orca Family🤗	EN/ZH
Mistral	2023-09	Mistral AI	7B	Mistral🤗	EN/ZH
Persimmon	2023-09	Adept AI Labs	8B	persimmon-8b-chat🤗	EN/ZH
Qwen	2023-08	Alibaba	0.5B/1.8B/4B/7B/14B/72B	Qwen🤗	EN/ZH
Llama 2	2023-07	Meta	7B/13B/70B	Llama2 Family🤗	EN/ZH
Falcon	2023-07	UAE	1.3B/7.5B/40B/180B	Falcon Family🤗	EN/ZH
XGen	2023-07	Salesforce	7B	xgen-7b-4k-base🤗	EN/ZH
Zephyr	2023-05	Hugging Face	7B	HuggingFaceH4🤗	EN/ZH
Pythia	2023-04	EleutherAI	14M～12B	Pythia Family🤗	EN/ZH
Vicuna	2023-03	LMSYS	7B/13B/33B	Vicuna🤗	EN/ZH

Quick Start —— large vision language models (LVLMs) 🏁

Model	Date	Publication	Parameters	Demo	CheckPoint	Details
🔥new🔥 Cambrian-1	2024-06	arXiv	3B/8B/13B/34B	---	Cambrian-1🤗	EN/ZH
🔥new🔥 EVE	2024-06	arXiv	7B	---	coming soon	EN/ZH
🔥new🔥 Chameleon	2024-05	arXiv	7B/34B	---	facebook	EN/ZH
🔥new🔥 DenseConnector	2024-05	arXiv	2.7B→70B	---	DenseConnector🤗	EN/ZH
Llava	2023-04	NeurIPS 2023	7B/13B	Llava v1.6	Llava v1.5🤗 Llava v1.6🤗	EN/ZH
DeepSeek-VL	2024-03	arXiv	1.3B/7B	Chat with DeepSeek VL 7B	DeepSeek-VL Family🤗	EN/ZH
PaliGemma	2024-03	---	3B	PaliGemma	PaliGemma Family🤗	EN/ZH
MiniGemini (MGM)	2024-03	arXiv	2B/7B/13B/34B	MGM	MGM Family🤗	EN/ZH
HPT	2024-03	---	3-8B/6B	None	HPT🤗	EN/ZH
Bunny	2024-02	arXiv	2B/3B/4B/8B	Bunny	BAAI🤗	EN/ZH
TinyLLaVA	2024-02	arXiv	1.4B/2.4B/3.1B	None	TinyLLaVA🤗	EN/ZH
MiniCPM-V Series	2024-02	---	2B/8B	MiniCPM-Llama3-V-2 5 MiniCPM V 2	MiniCPM-2B Family🤗	EN/ZH
ALLaVA-Longer	2024-02	arXiv	3B	ALLaVA-Longer	ALLaVA-3B-Longer 🤗	EN/ZH
MM1	2024-02		---	None	None	EN/ZH
Vary-toy	2024-01	arXiv	---	Vary Family	Vary-toy🤗	EN/ZH
MoE-LLaVA	2024-01	arXiv	3B	MoE LLaVA	MoE-LLaVA Family🤗	EN/ZH
LLaVA-Phi	2024-01	arXiv	3B	None	None	EN/ZH
TinyGPT-V	2023-12	arXiv	---	TinyGPT-V	TinyGPT-V🤗	EN/ZH
MobileVLM Series	2023-12	arXiv	1.4B/1.7B/2.7B/7B	Invalid Now	mtgv🤗	EN/ZH
SCA	2023-12	arXiv	---	DEMO.md	SCA🤗	EN/ZH
Florence-2	2023-11	arXiv	120M/345M/1.2B/3B	Florence 2	Florence🤗	EN/ZH
Cog Series	2023-11	CVPR 2024	17B/18B	CogVLM & CogAgent	THUDM 🤗	EN/ZH
PaLI-3	2023-10	arXiv	5B	None	None(PaliGemma is based on PaLI-3)	EN/ZH
IMP	2024-05	arXiv	3B	xmbot.net	imp-v1-3b 🤗	EN/ZH
MiniGPT4 Series	2023-04	arXiv	7B/13B	Invalid Now	Vision-CAIR 🤗	EN/ZH
LLaVA-Phi-3-mini	2024-04	---	---	None	LLaVA-Phi-3-mini🤗	EN/ZH
Cobra	2024-03	arXiv	3.5B	Cobra	Cobra Family🤗	EN/ZH

Quick Start —— large model for Segmentation🏁

Model	Date	Publication	Parameters	Demo	Details
LISA	2023-08	CVPR 2024	13B	---	EN/ZH

Other Relevant Summary Platforms🏗️

This navigation page also links to other relevant summary platforms. Explore the sections below to find the information you need:

Benchmarking Inference Speed of Large Language Models🚀

GPU-Benchmarks-on-LLM-Inference uses various NVIDIA GPUs and Apple Silicon devices to test models like LLaMA 3 with the llama.cpp tool, measuring performance by tokens generated per second. It covers NVIDIA 3000, 4000, and A100 series, as well as Apple's M1, M2, and M3 chips.

Comprehensive Analysis and Comparison of Large Language Models🔍

The website LifeArchitect.ai/models provides a comprehensive analysis and comparison of large language models (LLMs) such as GPT-3, GPT-4, and PaLM, detailing their sizes, capabilities, and training data.

Reliable Measurement of Large Language Model Response Times⏱️

TheFastest.ai offers reliable performance measurements for popular large language models (LLMs) based on response times. It compares models across multiple data centers (e.g., US West, East, and Europe), focusing on metrics like Time to First Token (TTFT) and Tokens Per Second (TPS), with daily updated statistics.
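
As a rough illustration of the two metrics named above, the following Python sketch computes TTFT and TPS from a request start time and a list of per-token arrival timestamps. The function name `latency_metrics` and its inputs are hypothetical and are not taken from TheFastest.ai. Definitions of TPS vary between tools; this sketch assumes TPS is measured over the generation phase after the first token arrives.

```python
from typing import Dict, Sequence


def latency_metrics(request_start: float, token_times: Sequence[float]) -> Dict[str, float]:
    """Compute TTFT and TPS (hypothetical helper, timestamps in seconds).

    request_start: time at which the request was sent.
    token_times:   arrival time of each generated token, in order.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")

    # Time to First Token: delay between sending the request and the first token.
    ttft = token_times[0] - request_start

    # Tokens Per Second: assumed here to cover only the generation phase,
    # i.e. tokens produced after the first one, divided by the elapsed time.
    generation_time = token_times[-1] - token_times[0]
    if generation_time > 0:
        tps = (len(token_times) - 1) / generation_time
    else:
        tps = float("nan")  # undefined for a single token or zero elapsed time

    return {"ttft_s": ttft, "tps": tps}


# Example: request sent at t=10.0 s, five tokens arriving every 0.5 s.
metrics = latency_metrics(10.0, [10.5, 11.0, 11.5, 12.0, 12.5])
print(metrics)
```

Separating the two numbers matters when comparing models: a model can start responding quickly (low TTFT) yet stream slowly (low TPS), or the reverse.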

Comprehensive Survey of Vision-Language Models📊

VLM_survey is a repository summarizing and surveying the latest vision-language models (VLMs), including links to relevant papers. It covers:

Overview of Vision-Language Models: Reviews VLM research in image classification, object detection, and semantic segmentation.
Pre-training Methods: Summarizes network architectures, pre-training objectives, and downstream tasks for VLMs.
Transfer Learning Methods: Discusses transfer learning strategies for VLMs in different tasks.
Knowledge Distillation Methods: Examines knowledge distillation techniques in tasks like object detection and semantic segmentation.

Latest Research, Datasets, and Evaluation Benchmarks in Multimodal Large Language Models📚

Check out the repository for the latest papers on multimodal large language models, covering topics such as multimodal chain-of-thought, LLM-aided visual reasoning, foundation models, and multimodal reinforcement learning from human feedback (RLHF).

It also includes a variety of datasets for pre-training, alignment, multimodal instruction tuning, in-context learning, and evaluation, along with benchmark tests to assess the performance and capabilities of different multimodal models.

A lightweight library for evaluating language models from OpenAI

OpenAI has released a lightweight library for evaluating LLMs, intended to make the accuracy figures it publishes for its models, such as GPT-4 Turbo, GPT-4, and GPT-4o, transparent. The library includes benchmarks such as MMLU, MATH, GPQA, DROP, MGSM, and HumanEval.

Reference

@misc{hu2024awesome,
  author       = {Bocheng Hu and Ge-Peng Ji and Deng-Ping Fan},
  title        = {An awesome list of large vision language models},
  howpublished = {\url{https://github.com/NKU-MetautoAI/awesome-large-vision-language-models}},
  year         = {2024}
}
