[TGRS24] The official PyTorch implementation of the paper "RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering".

Y-D-Wang/RSAdapter

RSAdapter

The official PyTorch implementation of the paper "RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering".

If you find our work useful in your research, please cite:

@article{wang2024rsadapter,
  title={RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering},
  author={Wang, Yuduo and Ghamisi, Pedram},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024},
  publisher={IEEE}
}

Introduction

In this work, we introduce RSAdapter, a novel method that prioritizes runtime and parameter efficiency. RSAdapter comprises two key components: the Parallel Adapter and an additional linear transformation layer inserted after each fully connected (FC) layer within the Adapter. This design improves the adaptation of pretrained multimodal models, and because the parameters of each linear transformation layer can be merged into the preceding FC layer at inference time, it also reduces inference cost.
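The merging step can be illustrated with a small numerical sketch. It is not taken from the repository: it assumes the linear transformation is an elementwise scale and shift, and the helper names `fc`, `scale_shift`, and `merge` are hypothetical. Plain Python lists are used so the sketch runs without dependencies; in PyTorch the same arithmetic would apply to the weight and bias tensors of the FC layer.

```python
def fc(x, weight, bias):
    """Fully connected layer: y[i] = sum_j weight[i][j] * x[j] + bias[i]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def scale_shift(y, scale, shift):
    """Assumed form of the extra linear transformation: z[i] = scale[i] * y[i] + shift[i]."""
    return [s * v + t for v, s, t in zip(y, scale, shift)]


def merge(weight, bias, scale, shift):
    """Fold the scale and shift into the FC parameters.

    scale * (W x + b) + shift == (diag(scale) W) x + (scale * b + shift)
    """
    merged_weight = [[s * w for w in row] for row, s in zip(weight, scale)]
    merged_bias = [s * b + t for b, s, t in zip(bias, scale, shift)]
    return merged_weight, merged_bias


# Toy parameters (illustrative values only).
weight = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
bias = [0.1, -0.2]
scale = [2.0, 0.5]
shift = [0.3, -0.1]
x = [1.0, 2.0, 3.0]

# Training-time path: FC layer followed by the extra transformation.
train_out = scale_shift(fc(x, weight, bias), scale, shift)

# Inference-time path: a single FC layer with merged parameters.
merged_weight, merged_bias = merge(weight, bias, scale, shift)
infer_out = fc(x, merged_weight, merged_bias)

max_diff = max(abs(a - b) for a, b in zip(train_out, infer_out))
print("max difference between the two paths:", max_diff)
```

Both paths give the same output up to floating-point rounding, so the extra layer adds no cost once merged. If the transformation were a full matrix A with offset c rather than an elementwise scale and shift, the merged parameters would be A·W and A·b + c.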

Preparation

Training

  • For the RSVQA-LR dataset
    • Change the default path of the image files
python train_lr.py
  • For the RSVQA-HR dataset
    • Change the default path of the image files
python train_hr.py
  • For the RSIVQA dataset
    • Change the default path of the image files
    • RSIVQA comprises multiple datasets with varying image sizes, so we resize all images to a unified size of 256 × 256 before feeding them into the model. Resize your images accordingly before training on the RSIVQA dataset.
python train_rsi.py
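
The RSIVQA resizing step above could be done with a short script like the following. This is a minimal sketch, not part of the repository: it assumes Pillow is installed, and the function name `resize_images`, the suffix list, and the directory layout are hypothetical choices for illustration.

```python
from pathlib import Path

from PIL import Image

# File types treated as images (an assumption; adjust to the dataset at hand).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def resize_images(src_dir, dst_dir, size=(256, 256)):
    """Resize every image under src_dir to `size`, mirroring the folder layout in dst_dir.

    Returns the number of images written. Aspect ratio is not preserved:
    each image is stretched to exactly `size`.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    count = 0
    # sorted() materializes the file list before any output is written.
    for src_path in sorted(src_dir.rglob("*")):
        if not src_path.is_file() or src_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        dst_path = dst_dir / src_path.relative_to(src_dir)
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src_path) as img:
            # Uses Pillow's default resampling filter.
            img.resize(size).save(dst_path)
        count += 1
    return count
```

For example, `resize_images("RSIVQA/images", "RSIVQA/images_256")` (both paths are placeholders) writes resized copies to a separate folder. Keep the output folder outside the source folder so that repeated runs do not pick up earlier outputs, then point the default image path in the training script at the resized copies.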

Comparison with SOTA

TODO

  • Add inference code

Acknowledgement

The code is based on transformers. The authors would also like to thank the contributors to the RSVQA and RSIVQA datasets.
