mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
A Video Chat Agent with Temporal Prior
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
A collection of visual instruction tuning datasets.
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Large Chinese Language-and-Vision Assistant for BioMedicine (a Chinese medical multimodal large model)
Undergraduate dissertation from Guilin University of Electronic Technology
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Example code demonstrating fine-tuning of multimodal large language models with LLaMA-Factory
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities