ZhanYang-nwpu/Awesome-Remote-Sensing-Multimodal-Large-Language-Model

Awesome-Remote-Sensing-Multimodal-Large-Language-Models

🔥🔥🔥 Multimodal Large Language Models for Remote Sensing: A Survey

School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University

✨ The first survey of Multimodal Large Language Models for Remote Sensing (RS-MLLMs).

✨✨✨ Behold our meticulously curated trove of RS-MLLMs resources!!!

🎉🚀💡 The website will be updated in real time to track the latest progress in RS-MLLMs!!!

📑📚🔍 Feast your eyes on an assortment of model architectures, training pipelines, datasets, comprehensive evaluation benchmarks, intelligent agents for remote sensing, techniques for instruction tuning, and much more.

🌟🔥📢 A collection of remote sensing multimodal large language model papers focusing on the vision-language domain.

🍎 Multimodal Large Language Models for Remote Sensing

🍎 Intelligent Agents for Remote Sensing

Please give this project a STAR ⭐ if it helps you.

📢 Latest Updates

In this repository, we will collect and document researchers and their outstanding work related to remote sensing multimodal large language models (vision-language).

  • The list will be continuously updated 🔥🔥
  • 📦 coming soon! 🚀
  • May-22-2024: The first RS-MLLMs review manuscript has been submitted for review. 🔥🔥

Awesome Papers

Multimodal Large Language Models for Remote Sensing

| Title | Authors | Venue | Date | Code | Note |
|---|---|---|---|---|---|
| SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | J. Luo et al. | arXiv | 2024-06-14 | Github | - |
| RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery | Y. Bazi, L. Bashmal, M. M. Al Rahhal, R. Ricci, and F. Melgani | Remote Sensing | 2024-04-23 | Github | - |
| H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | C. Pang, W. Jiang, L. Jiayu, L. Yi, S. Jiaxing, L. Weijia, W. Xingxing, W. Shuai, F. Litong, X. Guisong, and H. Conghui | arXiv | 2024-03-29 | Github | - |
| Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery | W. Zhang, M. Cai, T. Zhang, G. Lei, Y. Zhuang, and X. Mao | arXiv | 2024-03-06 | - | - |
| Large Language Models for Captioning and Retrieving Remote Sensing Images | J. D. Silva, J. Magalhaes, and D. Tuia | arXiv | 2024-02-09 | - | - |
| LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | D. Muhtar, Z. Li, F. Gu, X. Zhang, and P. Xiao | arXiv | 2024-02-04 | Github | - |
| EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao | arXiv | 2024-01-30 | Github | - |
| SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Y. Zhan, Z. Xiong, and Y. Yuan | arXiv | 2024-01-18 | Github | Dataset |
| GeoChat: Grounded Large Vision-Language Model for Remote Sensing | K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan | arXiv | 2023-11-24 | Github | Accepted by CVPR-24 |
| RSGPT: A Remote Sensing Vision Language Model and Benchmark | Y. Hu, J. Yuan, and C. Wen | arXiv | 2023-07-28 | Github | - |

Intelligent Agents for Remote Sensing

| Title | Authors | Venue | Date | Code | Note |
|---|---|---|---|---|---|
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents | W. Xu, Z. Yu, Y. Wang, J. Wang, and M. Peng | arXiv | 2024-06-11 | - | - |
| GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots | S. Singh, M. Fore, D. Stamoulis, and D. Group | arXiv | 2024-04-23 | - | - |
| Evaluating Tool-Augmented Agents in Remote Sensing Platforms | S. Singh, M. Fore, and D. Stamoulis | arXiv | 2024-04-23 | - | - |
| Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | C. Liu, K. Chen, H. Zhang, Z. Qi, Z. Zou, and Z. Shi | arXiv | 2024-04-01 | Github | - |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | H. Guo, X. Su, C. Wu, B. Du, L. Zhang, and D. Li | arXiv | 2024-01-17 | Github | - |
| Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis | S. Du, S. Tang, W. Wang, X. Li, and R. Guo | arXiv | 2023-10-07 | - | - |

Vision-Language Pre-training Models for Remote Sensing

| Title | Authors | Venue | Date | Code | Note |
|---|---|---|---|---|---|
| RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Z. Zhang, T. Zhao, Y. Guo, and J. Yin | arXiv | 2024-01-02 | Github | - |
| RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, and J. Zhou | T-GRS | 2024-04-18 | Github | arXiv |
| Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | U. Mall, C. P. Phoo, M. K. Liu, C. Vondrick, B. Hariharan, and K. Bala | ICLR | 2024-01-16 | Project | arXiv |
| RS-CLIP: Zero Shot Remote Sensing Scene Classification via Contrastive Vision-Language Supervision | X. Li, C. Wen, Y. Hu, and N. Zhou | JAG | 2023-09-18 | Github | - |
| Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval | Y. Yuan, Y. Zhan, and Z. Xiong | T-GRS | 2023-08-28 | Github | arXiv |

Survey Papers for Remote Sensing Vision-Language Tasks

| Title | Authors | Venue | Date | Code | Note |
|---|---|---|---|---|---|
| Towards Vision-Language Geo-Foundation Model: A Survey | Y. Zhou, L. Feng, Y. Ke, X. Jiang, J. Yan, and W. Zhang | arXiv | 2024-06-13 | Github | arXiv |
| Vision-Language Models in Remote Sensing: Current progress and future trends | X. Li, C. Wen, Y. Hu, Z. Yuan, and X. X. Zhu | MGRS | 2024-04-22 | - | - |
| Language Integration in Remote Sensing: Tasks, datasets, and future directions | L. Bashmal, Y. Bazi, F. Melgani, M. M. Al Rahhal, and M. A. Al Zuair | MGRS | 2023-10-11 | - | - |
| Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey | L. Jiao et al. | JSTARS | 2023-09-18 | - | - |

Others

| Title | Authors | Venue | Date | Code | Note |
|---|---|---|---|---|---|
| On the Foundations of Earth and Climate Foundation Models | X. X. Zhu et al. | arXiv | 2024-05-07 | Github | - |
| On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | C. Tan et al. | arXiv | 2023-12-23 | - | - |
| Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs | J. Roberts, T. Lüddecke, R. Sheikh, K. Han, and S. Albanie | arXiv | 2023-11-24 | Github | - |
| The Potential of Visual ChatGPT for Remote Sensing | L. P. Osco, E. L. de Lemos, W. N. Gonçalves, A. P. M. Ramos, and J. Marcato Junior | Remote Sensing | 2023-06-22 | - | - |

Awesome Datasets

Datasets of Pre-Training for Alignment

| Title | Authors | Venue | Date | Code | Note |
|---|---|---|---|---|---|
| ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing | Z. Yuan, Z. Xiong, L. Mou, and X. X. Zhu | arXiv | 2024-02-17 | Github | Link |
| RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Z. Zhang, T. Zhao, Y. Guo, and J. Yin | arXiv | 2024-01-02 | Github | - |
| SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | Z. Wang, R. Prabha, T. Huang, J. Wu, and R. Rajagopal | AAAI | 2024-03-24 | Github | arXiv |

Datasets of Multimodal Instruction Tuning

| Name | Paper | Link | Note |
|---|---|---|---|
| FIT-RS | SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Link | 1800.8k |
| RS-GPT4V | RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding | Link | 991k |
| RS-instructions | RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery | Link | 7,058 |
| SkyEye-968k | SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Link | 968k |
| Multi-task Instruction | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | Link | 42,322 |
| MMRS-1M | EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Link | >1M |
| RS-ClsQaGrd-Instruct | H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | Link | 78k |
| MMShip | Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery | Link | 81k |
| RS-Specialized-Instruct | H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | Link | 29.8k |
| RS multimodal instruction | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | Link | 318k |
| LHRS-Instruct | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | Link | 39.8k |
| HqDC-Instruct | H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | Link | 30k |

Latest Evaluation Benchmarks for Remote Sensing Vision-Language Tasks

Remote Sensing Image Captioning and Aerial Video Captioning

Remote Sensing Visual Question Answering and Remote Sensing Visual Grounding

Remote Sensing Image-Text Retrieval

Remote Sensing Scene Classification

🤖 Contact

If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.
