[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
[MICCAI'24] Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring
Demo code for fine-tuning multimodal LLMs with LLaMA-Factory
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Undergraduate Dissertation of Guilin University of Electronic Technology
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
A Video Chat Agent with Temporal Prior
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
A collection of visual instruction tuning datasets.
EVE: Encoder-Free Vision-Language Models
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)