🌐 Homepage | 🔬 Paper | 👩‍💻 Code
In the quest to advance vision-language models (VLMs), recent systems such as GPT-4V, LLaVA, mPLUG, MiniGPT-4, and BLIP have shown impressive capabilities in complex reasoning tasks. However, these models still struggle to understand fine-grained multimodal compositional information, limiting their reliability and performance.

To address this, we introduce MMComposition, a novel benchmark designed to comprehensively evaluate the compositionality of VLMs. MMComposition assesses VLMs along two main dimensions: vision-language (VL) compositional understanding and VL compositional reasoning. Unlike previous benchmarks that focus on single-choice questions or open-ended text generation, MMComposition provides a diverse set of tasks: single-choice questions, indefinite-choice questions, text generation, and text-image matching. This diversity ensures a thorough evaluation of a model's ability to understand and reason with compositional information across modalities.

Our findings reveal that even state-of-the-art models like GPT-4 struggle with nuanced compositional reasoning tasks, highlighting the need for further research to enhance VLMs' compositional abilities.

Our key contributions are:

- Proposing MMComposition, the first comprehensive benchmark for evaluating the compositionality of pretrained VLMs.
- Providing a thorough experimental evaluation of the compositionality of state-of-the-art VLMs.
- Benchmarking a set of well-known VLMs on the proposed MMComposition benchmark.

MMComposition aims to inspire advances in VLM design and training, ultimately improving models' performance in understanding and reasoning with complex multimodal information.
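Because MMComposition mixes question formats, scoring differs by task type: single-choice items have one correct option, while indefinite-choice items are only correct when the predicted option set matches the answer set exactly. The sketch below illustrates this distinction; the item schema (`type`, `answer`, `prediction` fields) is a hypothetical example, not the benchmark's actual data format, and text generation and text-image matching would need their own metrics.

```python
def score_item(item: dict) -> float:
    """Return 1.0 if the prediction is correct, else 0.0.

    Hypothetical scoring sketch for the two choice-based task types;
    field names are illustrative assumptions.
    """
    task = item["type"]
    if task == "single_choice":
        # Exactly one correct option, e.g. "B".
        return float(item["prediction"] == item["answer"])
    if task == "indefinite_choice":
        # Any number of correct options; the full set must match,
        # so a partially correct selection scores 0.
        return float(set(item["prediction"]) == set(item["answer"]))
    # Text generation and text-image matching are scored separately.
    raise ValueError(f"unsupported task type: {task}")


def accuracy(items: list[dict]) -> float:
    """Mean score over a list of choice-based items."""
    return sum(score_item(it) for it in items) / len(items)
```

Requiring an exact set match for indefinite-choice questions is stricter than per-option accuracy: guessing all options or only the safest one no longer earns partial credit.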
If you find our work useful, please cite:

```bibtex
@article{hua2024mmcomposition,
  title={MMComposition: Benchmarking the Compositionality for Pre-trained Vision-Language Models},
  author={Hua, Hang and Tang, Yunlong and Zeng, Ziyun and Cao, Liangliang and Yang, Zhengyuan and He, Hangfeng and Xu, Chenliang and Luo, Jiebo},
  journal={},
  year={2024}
}
```