Zhengzhuo Xu, Ruikang Liu, Shuo Yang, Zenghao Chai and Chun Yuan
This repository is the official PyTorch implementation of the CVPR 2023 paper LiVT (Learning Imbalanced Data with Vision Transformers).
```
python == 3.7
pytorch >= 1.7.0
torchvision >= 0.8.1
timm == 0.3.2
tensorboardX >= 2.1
```
- We recommend installing PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2.
- If your PyTorch is 1.8.1+, a fix is needed for timm 0.3.2 to work; see the sketch below.
- See requirements.txt for detailed requirements. You don't have to match it exactly; it is only a reference.
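To our knowledge, the incompatibility is that timm 0.3.2 imports container_abcs from torch._six, which later PyTorch versions no longer provide. A minimal sketch of the commonly used patch to timm/models/layers/helpers.py, assuming that is the failing import:

```python
# timm/models/layers/helpers.py -- common patch for PyTorch 1.8.1+,
# replacing the original `from torch._six import container_abcs`.
import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])
if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs  # old PyTorch still provides this
else:
    import collections.abc as container_abcs  # drop-in replacement
```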
We adopt torchvision.datasets.ImageFolder to build our dataloaders. Hence, we reorganize all datasets (ImageNet-LT, iNat18, Places-LT, CIFAR) as follows:
```
/path/to/ImageNet-LT/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
You can follow prepare.py to construct your datasets.
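Since the layout above is exactly what torchvision.datasets.ImageFolder expects, a dataloader can be built directly from it. A minimal sketch, where the paths, transform, and batch size are placeholders rather than the repository's actual training settings:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder transform -- the repo's actual train/eval transforms differ.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder('/path/to/ImageNet-LT/train', transform=transform)
val_set = datasets.ImageFolder('/path/to/ImageNet-LT/val', transform=transform)

# Class indices follow the sorted subfolder names, e.g. class1 -> 0, class2 -> 1.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)
```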
The detailed information of these datasets is shown as follows:
- Please set the DATA_PATH and WORK_PATH in util/trainer.py (lines 6-7); a sketch is given after this list.
- Typically, make sure 4 or 8 GPUs with more than 12 GB of memory each are available.
- Keep the settings consistent with the following. You can see all args in the Trainer class in util/trainer.py.
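For reference, a sketch of those two lines with placeholder values (we assume WORK_PATH is the output directory; use your own paths):

```python
# util/trainer.py, lines 6-7 -- the values below are placeholders.
DATA_PATH = '/path/to/ImageNet-LT'  # dataset root containing train/ and val/
WORK_PATH = '/path/to/work_dir'     # assumed output root for logs and checkpoints
```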
Specifically, the commands for the different stages are:
```bash
# MGP stage
python script/pretrain.py
# BFT stage
python script/finetune.py
# evaluate stage
python script/evaluate.py
```
We release both the Balanced Fine-Tuned (BFT) models and the Masked Generative Pretrained (MGP) models. All numbers are top-1 accuracy (%); Many/Med./Few denote the many-, medium-, and few-shot class subsets.
| Dataset | Resolution | Many | Med. | Few | Acc | args | log | ckpt | MGP ckpt |
|---|---|---|---|---|---|---|---|---|---|
| ImageNet-LT | 224×224 | 73.6 | 56.4 | 41.0 | 60.9 | download | download | download | Res_224 |
| ImageNet-LT | 384×384 | 76.4 | 59.7 | 42.7 | 63.8 | download | download | download | |
| iNat18 | 224×224 | 78.9 | 76.5 | 74.8 | 76.1 | download | download | download | Res_128 |
| iNat18 | 384×384 | 83.2 | 81.5 | 79.7 | 81.0 | download | download | download | |
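A hypothetical sketch of loading a downloaded checkpoint into a timm ViT for inspection; the model variant ('vit_base_patch16_224'), the file name, and the 'model' key are assumptions, so check the released args files for the actual settings:

```python
import torch
import timm

# Hypothetical file and model names -- consult the released args for the real ones.
model = timm.create_model('vit_base_patch16_224', num_classes=1000)
ckpt = torch.load('livt_imagenet_lt_224.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # weights are often nested under a 'model' key
msg = model.load_state_dict(state_dict, strict=False)
print('missing:', msg.missing_keys)
print('unexpected:', msg.unexpected_keys)
```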
If you find our idea or code inspiring, please cite our paper:
```bibtex
@inproceedings{LiVT,
    title={Learning Imbalanced Data with Vision Transformers},
    author={Xu, Zhengzhuo and Liu, Ruikang and Yang, Shuo and Chai, Zenghao and Yuan, Chun},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023}
}
```
This code is partially based on Prior-LT. If you use our code, please also cite:
```bibtex
@inproceedings{PriorLT,
    title={Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective},
    author={Xu, Zhengzhuo and Chai, Zenghao and Yuan, Chun},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021}
}
```
This project is largely based on DeiT and MAE.
The CIFAR code is based on LDAM and Prior-LT.
The loss implementations are based on CB, LDAM, LADE, Prior-LT, and MiSLAS.