
Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

Fengxiang Wang1    Hongzhen Wang2,‡    Di Wang3    Zonghao Guo4
Zhenyu Zhong5    Long Lan1,‡   Jing Zhang6  Zhiyuan Liu2    Maosong Sun2   

1 National University of Defense Technology    2 Tsinghua University    3 Wuhan University
4 University of Chinese Academy of Sciences    5 Nankai University    6 The University of Sydney

Introduction

  • RS-4M: A large-scale remote sensing dataset comprising 4 million optical images with diverse scene details, designed to fully exploit the representation learning capabilities of masked image modeling (MIM) methods in remote sensing applications.
  • SelectiveMAE: A novel and efficient MIM method tailored for remote sensing images. It incorporates a new PSTS module, which significantly accelerates convergence and enhances representation learning compared to vanilla MIM approaches.
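To make the MIM setup concrete, the sketch below shows the uniform random patch masking used by vanilla MAE, which is the baseline SelectiveMAE's PSTS module improves upon (the PSTS selection strategy itself is not reproduced here; the function name and the 196-patch/75% example are illustrative, not taken from this codebase).

```python
import numpy as np

def random_masking(num_patches: int, mask_ratio: float, seed: int = 0):
    """Split patch indices into visible and masked sets, as in vanilla MAE.

    SelectiveMAE's PSTS module replaces this uniform sampling with a
    progressive token selection; this sketch only shows the random-masking
    baseline it builds on.
    """
    rng = np.random.default_rng(seed)
    num_visible = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)
    visible_idx = np.sort(perm[:num_visible])  # patches the encoder sees
    masked_idx = np.sort(perm[num_visible:])   # patches the decoder reconstructs
    return visible_idx, masked_idx

# A 224x224 image with 16x16 patches yields 196 patches; MAE masks 75% of them.
vis, msk = random_masking(196, 0.75)
print(len(vis), len(msk))  # 49 147
```

Because the encoder only processes the visible 25% of tokens, pretraining cost drops sharply; PSTS pushes this further by choosing which tokens to keep rather than sampling uniformly.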

Todo List

  • Initial release of SelectiveMAE checkpoints. 🚀
  • Code and configs for the scene classification downstream task of SelectiveMAE. 🚀
  • Code and configs for the object detection and semantic segmentation downstream tasks of SelectiveMAE.
  • Pretraining code and configs for SelectiveMAE.
  • Release of the RS-4M dataset.

Updates

  • [2024.06] - The training logs of SelectiveMAE have been released.


RS-4M

The RS-4M dataset contains about 4 million high-quality optical remote sensing images, four times more than previous representative remote sensing datasets.

Examples of RS-4M

Experiments on RS-4M

RS-4M offers a significantly larger and more diverse image set than previous datasets. To evaluate its effectiveness, we pre-train a ViT-Base model using the vanilla MAE method. For comparison, we use the MillionAID dataset, keeping the total number of training samples equal: 800 epochs on MillionAID's 1 million images versus 200 epochs on our 4-million-image RS-4M.
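The epoch counts in the table below follow directly from matching total seen samples against the MillionAID reference schedule. A minimal sketch of that arithmetic (the helper function is illustrative, not part of this codebase):

```python
def equivalent_epochs(ref_images: int, ref_epochs: int, dataset_images: int) -> int:
    """Epochs needed so the total number of seen samples matches the reference.

    MillionAID reference: 1M images x 800 epochs = 800M samples seen.
    """
    total_samples = ref_images * ref_epochs
    return round(total_samples / dataset_images)

print(equivalent_epochs(1_000_000, 800, 4_000_000))  # 200
print(equivalent_epochs(1_000_000, 800, 3_000_000))  # 267
print(equivalent_epochs(1_000_000, 800, 2_000_000))  # 400
```

This is why the 2M-, 3M-, and 4M-image subsets are trained for 400, 267, and 200 epochs respectively: each schedule exposes the model to the same 800M samples as MillionAID at 800 epochs.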

AID and RESISC-45 are scene classification benchmarks (OA at TR=20%/50%); DIOR and DIOR-R are object detection benchmarks (mAP50); LoveDA and SpaceNetv1 are semantic segmentation benchmarks (mIoU and mF1).

| Dataset | Pretrained model | # Images | Epochs | AID OA (TR=20%/50%) | RESISC-45 OA (TR=20%/50%) | DIOR mAP50 | DIOR-R mAP50 | LoveDA mIoU | SpaceNetv1 mF1 |
|---|---|---|---|---|---|---|---|---|---|
| MillionAID | Weights | 1 million | 800 | 94.92/97.38 | 89.20/93.60 | 71.80 | 62.33 | 51.24 | 79.24 |
| RS-4M | Weights | 2 million | 400 | 96.64/98.10 | 91.80/94.31 | 73.90 | 65.95 | 52.86 | 79.37 |
| RS-4M | Weights | 3 million | 267 | 96.67/98.18 | 92.24/94.41 | 75.40 | 67.07 | 52.39 | 79.37 |
| RS-4M | Weights | 4 million | 200 | 96.10/98.03 | 92.38/94.30 | 74.70 | 66.26 | 52.75 | 79.23 |
| RS-4M | Weights | 4 million | 800 | 96.88/98.22 | 92.44/94.43 | 75.40 | 67.35 | 52.80 | 79.41 |

SelectiveMAE

⚙️ Installation

For installation instructions, please refer to INSTALL.md.

🚙 Pretraining

For usage of the pretraining code, please refer to PRETRAIN.md.

🚀 Results on downstream tasks

AID and RESISC-45 are scene classification benchmarks (OA at TR=20%/50%); DIOR and DIOR-R are object detection benchmarks (mAP50); LoveDA and SpaceNetv1 are semantic segmentation benchmarks (mIoU and mF1).

| Model | Publication | Backbone | AID OA (TR=20%/50%) | RESISC-45 OA (TR=20%/50%) | DIOR mAP50 | DIOR-R mAP50 | LoveDA mIoU | SpaceNetv1 mF1 |
|---|---|---|---|---|---|---|---|---|
| SeCo | ICCV'21 | ResNet-50 | 93.47/95.99 | 89.64/92.91 | - | - | 43.63 | 77.09 |
| GASSL | ICCV'21 | ResNet-50 | 93.55/95.92 | 90.86/93.06 | 67.40 | 65.65 | 48.76 | 78.51 |
| TOV | JSTARS'23 | ResNet-50 | 95.16/97.09 | 90.97/93.79 | 70.16 | 66.33 | 49.70 | - |
| CACo | CVPR'23 | ResNet-50 | 90.88/95.05 | 88.28/91.94 | 66.91 | 64.10 | 48.89 | 77.94 |
| SatMAE | NIPS'22 | ViT-L | 95.02/96.94 | 91.72/94.10 | 70.89 | 65.66 | - | 78.07 |
| ScaleMAE | ICCV'23 | ViT-L | 96.44/97.58 | 92.63/95.04 | 73.81 | 66.47 | - | - |
| SSL4EO | GRSM'23 | ViT-S | 91.06/94.74 | 87.60/91.27 | 64.82 | 61.23 | - | - |
| RingMo | TGRS'22 | Swin-B | 96.90/98.34 | 94.25/95.67 | 75.90 | - | - | - |
| SatLas | ICCV'23 | Swin-B | 94.96/97.38 | 92.16/94.70 | 74.10 | 67.59 | - | - |
| GFM | ICCV'23 | Swin-B | 95.47/97.09 | 92.73/94.64 | 72.84 | 67.67 | - | - |
| RVSA | TGRS'23 | ViT-B+RVSA | 97.03/98.50 | 93.93/95.69 | 75.80 | 68.06 | 51.95 | - |
| SelectiveMAE | - | ViT-B | 96.78/98.12 | 93.35/94.58 | 75.70 | 67.78 | 53.05 | 79.50 |
| SelectiveMAE | - | ViT-L | 97.25/98.48 | 94.57/95.77 | 77.80 | 70.31 | 54.31 | 79.46 |

License

This work is released under the Apache License, Version 2.0, while some specific operations in this codebase may be under other licenses. Please check LICENSE.md carefully if you are using our code for commercial purposes.

Acknowledgements
