The official code of 'ChangeViT: Unleashing Plain Vision Transformers for Change Detection'.

zhuduowang/ChangeViT

ChangeViT

Code and models for ChangeViT: Unleashing Plain Vision Transformers for Change Detection.

Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

[paper]

Update

  • [2024/6/24] All the code has been released, including training and inference. 😊
  • [2024/6/19] The core components of this paper have been released, including the detail-capture module and the feature injector.
  • [2024/6/18] The training code will be publicly available around 2024/7/5.

Abstract

In this paper, our study uncovers ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to improve performance on large-scale changes. This framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. The feature integration ensures that ChangeViT excels in both detecting large-scale changes and capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules.

Framework


Figure 1. Overview of the proposed $\textbf{ChangeViT}$. Bi-temporal images $I_{1}$ and $I_{2}$ are first fed into a shared ViT to extract high-level semantic features and into a detail-capture module to extract low-level detailed information. Subsequently, a feature injector injects the low-level details into the high-level features. Finally, a decoder predicts the change probability maps.

Performance

Table 1. Performance comparison of different change detection methods on LEVIR-CD, WHU-CD, and CLCD datasets, respectively. The best results are highlighted in bold and the second best results are underlined. All results of the three evaluation metrics are described as percentages (%).

| Method | #Params (M) | FLOPs (G) | LEVIR-CD F1 | LEVIR-CD IoU | LEVIR-CD OA | WHU-CD F1 | WHU-CD IoU | WHU-CD OA | CLCD F1 | CLCD IoU | CLCD OA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DTCDSCN | $41.07$ | $20.44$ | $87.43$ | $77.67$ | $98.75$ | $79.92$ | $66.56$ | $98.05$ | $57.47$ | $40.81$ | $94.59$ |
| SNUNet | $12.04$ | $54.82$ | $88.16$ | $78.83$ | $98.82$ | $83.22$ | $71.26$ | $98.44$ | $60.82$ | $43.63$ | $94.90$ |
| ChangeFormer | $41.03$ | $202.79$ | $90.40$ | $82.48$ | $99.04$ | $87.39$ | $77.61$ | $99.11$ | $61.31$ | $44.29$ | $94.98$ |
| BIT | $\textbf{3.55}$ | $\textbf{10.63}$ | $89.31$ | $80.68$ | $98.92$ | $83.98$ | $72.39$ | $98.52$ | $59.93$ | $42.12$ | $94.77$ |
| ICIFNet | $23.82$ | $25.36$ | $89.96$ | $81.75$ | $98.99$ | $88.32$ | $79.24$ | $98.96$ | $68.66$ | $52.27$ | $95.77$ |
| DMINet | $\underline{6.24}$ | $\underline{14.42}$ | $90.71$ | $82.99$ | $99.07$ | $88.69$ | $79.68$ | $98.97$ | $67.24$ | $50.65$ | $95.21$ |
| GASNet | $23.59$ | $23.52$ | $90.52$ | $83.48$ | $99.07$ | $91.75$ | $84.76$ | $99.34$ | $63.84$ | $46.89$ | $94.01$ |
| AMTNet | $24.67$ | $21.56$ | $90.76$ | $83.08$ | $98.96$ | $92.27$ | $85.64$ | $99.32$ | $75.10$ | $60.13$ | $96.45$ |
| EATDer | $6.61$ | $23.43$ | $91.20$ | $83.80$ | $98.75$ | $90.01$ | $81.97$ | $98.58$ | $72.01$ | $56.19$ | $96.11$ |
| ChangeViT-T (Ours) | $11.68$ | $27.15$ | $\underline{91.81}$ | $\underline{84.86}$ | $\underline{99.17}$ | $\underline{94.53}$ | $\underline{89.63}$ | $\underline{99.57}$ | $\underline{77.31}$ | $\underline{63.01}$ | $\underline{96.67}$ |
| ChangeViT-S (Ours) | $32.13$ | $38.80$ | $\textbf{91.98}$ | $\textbf{85.16}$ | $\textbf{99.19}$ | $\textbf{94.84}$ | $\textbf{90.18}$ | $\textbf{99.59}$ | $\textbf{77.57}$ | $\textbf{63.36}$ | $\textbf{96.79}$ |

Table 2. Performance comparison of different change detection methods on the OSCD dataset. The best results are highlighted in bold and the second best results are underlined. All results of the three evaluation metrics are described as percentages (%).

| Method | OSCD F1 | OSCD IoU | OSCD OA |
|---|---|---|---|
| DTCDSCN | $36.13$ | $22.05$ | $94.50$ |
| SNUNet | $27.02$ | $15.62$ | $93.81$ |
| ChangeFormer | $38.22$ | $23.62$ | $94.53$ |
| BIT | $29.58$ | $17.36$ | $90.15$ |
| ICIFNet | $23.03$ | $13.02$ | $94.61$ |
| DMINet | $42.23$ | $26.76$ | $95.00$ |
| GASNet | $10.71$ | $5.66$ | $91.52$ |
| AMTNet | $10.25$ | $5.40$ | $94.29$ |
| EATDer | $54.23$ | $36.98$ | $93.85$ |
| ChangeViT-T (Ours) | $\underline{55.13}$ | $\underline{38.06}$ | $\underline{95.01}$ |
| ChangeViT-S (Ours) | $\textbf{55.51}$ | $\textbf{38.42}$ | $\textbf{95.05}$ |
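
For readers unfamiliar with the three metrics in the tables above, the following is a minimal sketch of how F1, IoU, and OA are commonly computed for binary change detection. It assumes the metrics are taken over the "changed" class from pixel-level confusion counts; this README does not state the exact evaluation protocol, and the function names `confusion_counts` and `change_metrics` are hypothetical, not part of this repository.

```python
def confusion_counts(pred, label):
    """Count TP, FP, FN, TN for flat sequences of 0/1 pixels (1 = changed)."""
    tp = fp = fn = tn = 0
    for p, y in zip(pred, label):
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def change_metrics(tp, fp, fn, tn):
    """Return (F1, IoU, OA) in percent, using the standard definitions.

    Assumption: F1 and IoU are measured on the 'changed' class only;
    OA (overall accuracy) counts both classes.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    total = tp + fp + fn + tn
    oa = (tp + tn) / total if total else 0.0
    return 100 * f1, 100 * iou, 100 * oa


# Usage with a toy prediction and label of four pixels each.
counts = confusion_counts([1, 1, 0, 0], [1, 0, 1, 0])
f1, iou, oa = change_metrics(*counts)
```

Under these definitions IoU and F1 are tied by IoU = F1 / (2 - F1) (with both as fractions), while OA can stay high even when few changed pixels are found, because unchanged pixels usually dominate the image.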

Usage

Data Preparation

  • Download the LEVIR-CD, WHU-CD, CLCD, and OSCD datasets. (You can also download the processed WHU-CD dataset from here)

  • Crop each image in the dataset into 256x256 patches.

  • Prepare the dataset into the following structure and set its path in the config file.

    ├─Train
        ├─A          jpg/png
        ├─B          jpg/png
        └─label      jpg/png
    ├─Val
        ├─A 
        ├─B
        └─label
    ├─Test
        ├─A
        ├─B
        └─label
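
The cropping step above can be sketched as follows. This is only an illustration under stated assumptions: patches are non-overlapping, any remainder at the right or bottom edge that does not fill a full 256x256 patch is dropped, and the helper name `patch_boxes` is hypothetical rather than something provided by this repository.

```python
def patch_boxes(width, height, size=256):
    """Return (left, upper, right, lower) boxes that tile an image.

    Assumptions: tiles do not overlap, and partial tiles at the right
    and bottom edges are discarded.
    """
    boxes = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            boxes.append((left, top, left + size, top + size))
    return boxes


# Usage: a 1024x1024 image yields a 4x4 grid of 256x256 patches.
boxes = patch_boxes(1024, 1024)
```

Each box uses the same (left, upper, right, lower) convention as Pillow's `Image.crop`, so the same list of boxes can be applied to the matching images in `A`, `B`, and `label` to keep the three patches spatially aligned.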
    

Checkpoint

  • Download the pre-trained weights for ViT-T and ViT-S, then put them into the checkpoints folder.

  • Pre-trained ChangeViT models will be released soon.

Dependency

pip install -r requirements.txt

Training

python main.py --file_root LEVIR --max_steps 80000 --model_type small --batch_size 16 --lr 2e-4 --gpu_id 0

Inference

python eval.py --file_root LEVIR --max_steps 80000 --model_type small --batch_size 16 --lr 2e-4 --gpu_id 0

License

ChangeViT is released under the CC BY-NC-SA 4.0 license.

Acknowledgement

This repository is built upon DINOv2 and A2Net. Thanks to the authors for these well-organized codebases.

Citation

@article{zhu2024changevit,
  title={ChangeViT: Unleashing Plain Vision Transformers for Change Detection},
  author={Zhu, Duowang and Huang, Xiaohu and Huang, Haiyan and Shao, Zhenfeng and Cheng, Qimin},
  journal={arXiv preprint arXiv:2406.12847},
  year={2024}
}
