NKU-MetautoAI/awesome-large-vision-language-models

🌐✨Summarizing the latest LLMs and VLMs! Helping you quickly and easily choose and use large models! 😄
This is the repository navigation page, the main Awesome List: LLMs🚀 | LVLMs🚀
Supported languages: 中文🚀 | English

Welcome to our repository🥰, a comprehensive navigation page that connects you to the most relevant resources and summary platforms for the latest large models (including LLMs🚀 and LVLMs🚀). Whether you're looking for benchmarks💯, comparisons⚖️, or surveys📖, we've got you covered.

Feel free to raise an issue or contact us if you find any related papers that are not included here. Organizers: Bocheng Hu@NKU (h1355393774@gmail.com), Gepeng Ji@ANU (gepengai.ji@gmail.com)

Quick Start —— large language models (LLMs) 🏁

Model Date Organization Paper Parameters CheckPoint Details
Gemma2 2024-06 Google AI Blog 2.6B/9B/27B Gemma2 Family🤗 EN/ZH
YI-1.5 2024-05 01-ai arXiv 6B/9B/34B Yi-1.5 Family🤗 EN/ZH
Llama 3 2024-04 Meta AI Blog 8B/70B Llama3 Family🤗 EN/ZH
phi3 2024-04 Microsoft arXiv 3.8B/7B/14B Phi-3 family🤗 (only phi-3-mini is available now) EN/ZH
Gemma 2024-02 Google AI Blog 2B/7B Gemma Family🤗 EN/ZH
Qwen1.5 2024-02 Alibaba AI Blog 0.5B/1.8B/4B/7B/14B/72B Qwen1.5🤗 EN/ZH
phi 2023-12 Microsoft AI Blog 1B/1.5B/2B phi-1B🤗 / phi-1.5B🤗 / phi-2B🤗 EN/ZH
Mamba 2023-12 Albert Gu and Tri Dao arXiv 130M/370M/790M/1.4B/2.8B state-spaces🤗 EN/ZH
StripedHyena 2023-12 Together AI AI Blog 7B StripedHyena Family🤗 EN/ZH
YI 2023-11 01-ai arXiv 6B/9B/34B Yi Family EN/ZH
Orca2 2023-11 Microsoft arXiv 7B/13B Orca Family🤗 EN/ZH
Mistral 2023-09 Mistral AI arXiv 7B Mistral🤗 EN/ZH
Persimmon 2023-09 Adept AI Labs AI Blog 8B persimmon-8b-chat🤗 EN/ZH
Qwen 2023-08 Alibaba arXiv 0.5B/1.8B/4B/7B/14B/72B Qwen🤗 EN/ZH
Llama 2 2023-07 Meta arXiv 7B/13B/70B Llama2 Family🤗 EN/ZH
Falcon 2023-07 UAE AI Blog 1.3B/7.5B/40B/180B Falcon Family🤗 EN/ZH
XGen 2023-07 Salesforce arXiv 7B xgen-7b-4k-base🤗 EN/ZH
Zephyr 2023-05 Hugging Face arXiv 7B HuggingFaceH4🤗 EN/ZH
Pythia 2023-04 EleutherAI arXiv 14M~12B Pythia Family🤗 EN/ZH
Vicuna 2023-03 LMSYS AI Blog 7B/13B/33B Vicuna🤗 EN/ZH

Quick Start —— large vision language models (LVLMs) 🏁

Model Date Publication Parameters Demo Paper Github CheckPoint Details
🔥new🔥 Cambrian-1 2024-06 arXiv 3B/8B/13B/34B --- arXiv GitHub Cambrian-1🤗 EN/ZH
🔥new🔥 EVE 2024-06 arXiv 7B --- arXiv GitHub coming soon EN/ZH
🔥new🔥 Chameleon 2024-05 arXiv 7B/34B --- arXiv GitHub facebook EN/ZH
🔥new🔥 DenseConnector 2024-05 arXiv 2.7B→70B --- arXiv GitHub DenseConnector🤗 EN/ZH
Llava 2023-04 NeurIPS 2023 7B/13B Llava v1.6 arXiv GitHub Llava v1.5🤗 / Llava v1.6🤗 EN/ZH
DeepSeek-VL 2024-03 arXiv 1.3B/7B Chat with DeepSeek VL 7B arXiv GitHub DeepSeek-VL Family🤗 EN/ZH
PaliGemma 2024-03 --- 3B PaliGemma AI Blog GitHub PaliGemma Family🤗 EN/ZH
MiniGemini (MGM) 2024-03 arXiv 2B/7B/13B/34B MGM arXiv GitHub MGM Family🤗 EN/ZH
HPT 2024-03 --- 3-8B/6B None AI Blog GitHub HPT🤗 EN/ZH
Bunny 2024-02 arXiv 2B/3B/4B/8B Bunny arXiv GitHub BAAI🤗 EN/ZH
TinyLLaVA 2024-02 arXiv 1.4B/2.4B/3.1B None arXiv GitHub TinyLLaVA🤗 EN/ZH
MiniCPM-V Series 2024-02 --- 2B/8B MiniCPM-Llama3-V-2 5 / MiniCPM V 2 AI Blog GitHub MiniCPM-2B Family🤗 EN/ZH
ALLaVA-Longer 2024-02 arXiv 3B ALLaVA-Longer arXiv GitHub ALLaVA-3B-Longer 🤗 EN/ZH
MM1 2024-02 --- None arXiv None EN/ZH
Vary-toy 2024-01 arXiv --- Vary Family arXiv GitHub Vary-toy🤗 EN/ZH
MoE-LLaVA 2024-01 arXiv 3B MoE LLaVA arXiv GitHub MoE-LLaVA Family🤗 EN/ZH
LLaVA-Phi 2024-01 arXiv 3B None arXiv GitHub None EN/ZH
TinyGPT-V 2023-12 arXiv --- TinyGPT-V arXiv GitHub TinyGPT-V🤗 EN/ZH
MobileVLM Series 2023-12 arXiv 1.4B/1.7B/2.7B/7B Invalid Now arXiv / arXiv GitHub mtgv🤗 EN/ZH
SCA 2023-12 arXiv --- DEMO.md arXiv GitHub SCA🤗 EN/ZH
Florence-2 2023-11 arXiv 120M/345M/1.2B/3B Florence 2 arXiv GitHub Florence🤗 EN/ZH
Cog Series 2023-11 CVPR 2024 17B/18B CogVLM & CogAgent arXiv GitHub THUDM 🤗 EN/ZH
PaLI-3 2023-10 arXiv 5B None arXiv GitHub None(PaliGemma is based on PaLI-3) EN/ZH
IMP 2024-05 arXiv 3B xmbot.net arXiv GitHub imp-v1-3b 🤗 EN/ZH
MiniGPT4 Series 2023-04 arXiv 7B/13B Invalid Now arXiv / arXiv GitHub Vision-CAIR 🤗 EN/ZH
LLaVA-Phi-3-mini 2024-04 --- --- None GitHub LLaVA-Phi-3-mini🤗 EN/ZH
Cobra 2024-03 arXiv 3.5B Cobra arXiv GitHub Cobra Family🤗 EN/ZH

Quick Start —— large model for Segmentation🏁

Model Date Publication Parameters Demo Paper Github CheckPoint Details
LISA 2023-08 CVPR 2024 13B --- arXiv GitHub Hugging Face model EN/ZH

Other Relevant Summary Platforms🏗️

This navigation page also links to other relevant summary platforms. Explore the sections below to find the information you need:

  • Benchmarking Inference Speed of Large Language Models🚀

GPU-Benchmarks-on-LLM-Inference uses various NVIDIA GPUs and Apple Silicon devices to test models like LLaMA 3 with the llama.cpp tool, measuring performance by tokens generated per second. It covers NVIDIA 3000, 4000, and A100 series, as well as Apple's M1, M2, and M3 chips.

  • Comprehensive Analysis and Comparison of Large Language Models🔍

The website LifeArchitect.ai/models provides a comprehensive analysis and comparison of large language models (LLMs) such as GPT-3, GPT-4, and PaLM, detailing their sizes, capabilities, and training data.

  • Reliable Measurement of Large Language Model Response Times⏱️

TheFastest.ai offers reliable performance measurements for popular large language models (LLMs) based on response times. It compares models across multiple data centers (e.g., US West, East, and Europe), focusing on metrics like Time to First Token (TTFT) and Tokens Per Second (TPS), with daily updated statistics.
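As a rough illustration of the two metrics named above, the following Python sketch derives TTFT and TPS from timestamps recorded on the client side. It is not TheFastest.ai's methodology, which is not described here; the `latency_stats` helper, the `LatencyStats` type, and the sample timestamps are all hypothetical. In particular, it assumes TPS is measured over the generation phase only, that is, after the first token has arrived.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LatencyStats:
    ttft_s: float  # time to first token, in seconds
    tps: float     # tokens per second, measured after the first token


def latency_stats(request_start: float, token_times: Sequence[float]) -> LatencyStats:
    """Compute TTFT and TPS from a request start time and token arrival times.

    All times are in seconds on the same clock (for example, time.perf_counter()).
    Assumption: TPS excludes the wait for the first token.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    ttft = token_times[0] - request_start
    decode_time = token_times[-1] - token_times[0]
    if decode_time <= 0:
        raise ValueError("token timestamps must be strictly increasing overall")
    # Tokens after the first one, divided by the time it took to stream them.
    tps = (len(token_times) - 1) / decode_time
    return LatencyStats(ttft_s=ttft, tps=tps)


# Hypothetical timestamps: request sent at t=0.0, five tokens streamed back.
stats = latency_stats(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(stats)
```

Some tools instead divide the total token count by the total request time, which folds TTFT into TPS; which definition a given leaderboard uses should be checked before comparing numbers.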

  • Comprehensive Survey of Vision-Language Models📊

VLM_survey is a repository summarizing and surveying the latest vision-language models (VLMs), including links to relevant papers. It covers:

  1. Overview of Vision-Language Models: Reviews VLM research in image classification, object detection, and semantic segmentation.
  2. Pre-training Methods: Summarizes network architectures, pre-training objectives, and downstream tasks for VLMs.
  3. Transfer Learning Methods: Discusses transfer learning strategies for VLMs in different tasks.
  4. Knowledge Distillation Methods: Examines knowledge distillation techniques in tasks like object detection and semantic segmentation.
  • Latest Research, Datasets, and Evaluation Benchmarks in Multimodal Large Language Models📚

Check out the repository for the latest papers on multimodal large language models, covering topics such as multimodal chain-of-thought, LLM-aided visual reasoning, foundation models, and multimodal reinforcement learning from human feedback (RLHF).

It also includes a variety of datasets for pre-training, alignment, multimodal instruction tuning, in-context learning, and evaluation, along with benchmark tests to assess the performance and capabilities of different multimodal models.

  • A lightweight library for evaluating language models from OpenAI

OpenAI has released a lightweight evaluation library for LLMs, intended to make the accuracy figures it publishes for its models (such as GPT-4 Turbo, GPT-4, and GPT-4o) transparent. The library includes benchmarks such as MMLU, MATH, GPQA, DROP, MGSM, and HumanEval.
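To give a feel for what scoring a multiple-choice benchmark such as MMLU involves, here is a minimal Python sketch. It is not code from OpenAI's library: the `extract_choice` and `accuracy` helpers, the answer-matching regular expression, and the sample responses are hypothetical, and the sketch assumes four options labelled A to D and a response that states its choice after the word "answer".

```python
import re
from typing import Optional, Sequence

# Assumed response format: "... answer is (C)" or "Answer: B".
ANSWER_PATTERN = re.compile(
    r"answer\s*(?:is)?\s*[:\-]?\s*\(?([A-D])\b", re.IGNORECASE
)


def extract_choice(response: str) -> Optional[str]:
    """Return the chosen option letter (A-D), or None if none is found."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1).upper() if match else None


def accuracy(responses: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of responses whose extracted letter equals the gold letter."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


# Hypothetical model outputs; the third has no parseable answer and counts as wrong.
responses = ["Answer: B", "I think the answer is (c).", "Not sure."]
gold = ["B", "C", "A"]
print(accuracy(responses, gold))
```

Treating an unparseable response as incorrect is a design choice; real harnesses differ in how strictly they parse answers, which is one reason published accuracy numbers can vary between tools.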

Reference

@misc{hu2024awesome,
  author       = {Bocheng Hu and Ge-Peng Ji and Deng-Ping Fan},
  title        = {An awesome list of large vision language models},
  howpublished = {\url{https://github.com/NKU-MetautoAI/awesome-large-vision-language-models}},
  year         = {2024}
}