📢 A collection of remote sensing multimodal large language model papers focusing on the vision-language domain.
School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University
In this repository, we collect and document researchers and their outstanding work on remote sensing multimodal large language models (vision-language).
- The list will be continuously updated 🔥🔥
- 📦 coming soon! 🚀
- Papers
- Remote Sensing Vision-Language Dataset
- Related: Remote Sensing Vision-Language Foundation Models
## Papers

- 🔥 Apr-23-24: RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery
Remote Sensing 2024 (doi: 10.3390/rs16091477). Y. Bazi, L. Bashmal, M. M. Al Rahhal, R. Ricci, and F. Melgani. [Paper][Code]
- 🔥 Mar-29-24: H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model
arXiv 2024 (arXiv:2403.20213). C. Pang, J. Wu, J. Li, Y. Liu, J. Sun, W. Li, X. Weng, S. Wang, L. Feng, G.-S. Xia, and C. He. [Paper][Code]
- 🔥 Mar-6-24: Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery
arXiv 2024 (arXiv:2403.03790). W. Zhang, M. Cai, T. Zhang, G. Lei, Y. Zhuang, and X. Mao. [Paper][Code: N/A]
- 🔥 Feb-9-24: Large Language Models for Captioning and Retrieving Remote Sensing Images
arXiv 2024 (arXiv:2402.06475). J. D. Silva, J. Magalhaes, and D. Tuia. [Paper][Code: N/A]
- 🔥 Feb-4-24: LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
arXiv 2024 (arXiv:2402.02544). D. Muhtar, Z. Li, F. Gu, X. Zhang, and P. Xiao. [Paper][Code]
- 🔥 Jan-30-24: EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
arXiv 2024 (arXiv:2401.16822). W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao. [Paper][Code]
- 🔥 Jan-18-24: SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
arXiv 2024 (arXiv:2401.09712). Y. Zhan, Z. Xiong, and Y. Yuan. [Paper][Code]
- 🔥 Nov-30-23: Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
arXiv 2023 (arXiv:2311.14656). J. Roberts, T. Lüddecke, R. Sheikh, K. Han, and S. Albanie. [Paper][Code]
- 🔥 Nov-28-23: GeoChat: Grounded Large Vision-Language Model for Remote Sensing
arXiv 2023 (arXiv:2311.15826). K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan. [Paper][Code]
- 🔥 Jul-28-23: RSGPT: A Remote Sensing Vision Language Model and Benchmark
arXiv 2023 (arXiv:2307.15266). Y. Hu, J. Yuan, and C. Wen. [Paper][Code]
## Remote Sensing Vision-Language Dataset

- 🔥 Feb-17-24: ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing
arXiv 2024 (arXiv:2402.11325). Z. Yuan, Z. Xiong, L. Mou, and X. X. Zhu. [Paper][Code: N/A]
- 🔥 Jan-2-24: RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
arXiv 2023 (arXiv:2306.11300). Z. Zhang, T. Zhao, Y. Guo, and J. Yin. [Paper][Code]
- 🔥 Dec-20-23: SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
AAAI 2024 (arXiv:2312.12856). Z. Wang, R. Prabha, T. Huang, J. Wu, and R. Rajagopal. [Paper][Code]
## Remote Sensing Vision-Language Foundation Models

- 🔥 Jan-2-24: RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
arXiv 2023 (arXiv:2306.11300). Z. Zhang, T. Zhao, Y. Guo, and J. Yin. [Paper][Code]
- 🔥 Dec-12-23: Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
arXiv 2023 (arXiv:2312.06960). U. Mall, C. P. Phoo, M. K. Liu, C. Vondrick, B. Hariharan, and K. Bala. [Paper][Code: N/A]
- 🔥 Aug-10-23: RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
arXiv 2023 (arXiv:2306.11029). F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, and J. Zhou. [Paper][Code]