Zhengzhuo Xu, Ruikang Liu, Shuo Yang, Zenghao Chai and Chun Yuan
This repository is the official PyTorch implementation of the CVPR 2023 paper LiVT (Learning Imbalanced Data with Vision Transformers).
```
python == 3.7
pytorch >= 1.7.0
torchvision >= 0.8.1
timm == 0.3.2
tensorboardX >= 2.1
```
- We recommend installing PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2.
- If your PyTorch is 1.8.1+, a fix is needed for timm 0.3.2 to work; see the sketch below.
- See requirements.txt for detailed requirements. You don't have to match it exactly; it is only a reference.
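To our knowledge, the incompatibility is that timm 0.3.2 imports container_abcs from torch._six, which later PyTorch versions no longer provide. A minimal sketch of the commonly used patch to timm/models/layers/helpers.py, assuming that is the failing import:

```python
# timm/models/layers/helpers.py -- common patch for PyTorch 1.8.1+,
# replacing the original `from torch._six import container_abcs`.
import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])
if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs  # old PyTorch still provides this
else:
    import collections.abc as container_abcs  # drop-in replacement
```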
We adopt torchvision.datasets.ImageFolder to build our dataloaders. Hence, we reorganize all datasets (ImageNet-LT, iNat18, Places-LT, CIFAR) as follows:
```
/path/to/ImageNet-LT/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
You can follow prepare.py to construct your datasets.
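Since the layout above is exactly what torchvision.datasets.ImageFolder expects, a dataloader can be built directly from it. A minimal sketch, where the paths, transform, and batch size are placeholders rather than the repository's actual training settings:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder transform -- the repo's actual train/eval transforms differ.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder('/path/to/ImageNet-LT/train', transform=transform)
val_set = datasets.ImageFolder('/path/to/ImageNet-LT/val', transform=transform)

# Class indices follow the sorted subfolder names, e.g. class1 -> 0, class2 -> 1.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)
```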
The detailed information of these datasets is shown as follows:
- Please set the DATA_PATH and WORK_PATH in util/trainer.py (lines 6-7); a sketch is given after this list.
- Typically, make sure 4 or 8 GPUs with more than 12 GB of memory each are available.
- Keep the settings consistent with the following. You can see all args in the Trainer class in util/trainer.py.
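For reference, a sketch of those two lines with placeholder values (we assume WORK_PATH is the output directory; use your own paths):

```python
# util/trainer.py, lines 6-7 -- the values below are placeholders.
DATA_PATH = '/path/to/ImageNet-LT'  # dataset root containing train/ and val/
WORK_PATH = '/path/to/work_dir'     # assumed output root for logs and checkpoints
```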
Specifically, the commands for the different stages are:
```bash
# MGP stage
python script/pretrain.py
# BFT stage
python script/finetune.py
# evaluate stage
python script/evaluate.py
```
We release both the Balanced Fine-Tuned (BFT) models and the Masked Generative Pretrained (MGP) models. All numbers are top-1 accuracy (%); Many/Med./Few denote the many-, medium-, and few-shot class subsets.
| Dataset | Resolution | Many | Med. | Few | Acc | args | log | ckpt | MGP ckpt |
|---|---|---|---|---|---|---|---|---|---|
| ImageNet-LT | 224×224 | 73.6 | 56.4 | 41.0 | 60.9 | download | download | download | Res_224 |
| ImageNet-LT | 384×384 | 76.4 | 59.7 | 42.7 | 63.8 | download | download | download | |
| iNat18 | 224×224 | 78.9 | 76.5 | 74.8 | 76.1 | download | download | download | Res_128 |
| iNat18 | 384×384 | 83.2 | 81.5 | 79.7 | 81.0 | download | download | download | |
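A hypothetical sketch of loading a downloaded checkpoint into a timm ViT for inspection; the model variant ('vit_base_patch16_224'), the file name, and the 'model' key are assumptions, so check the released args files for the actual settings:

```python
import torch
import timm

# Hypothetical file and model names -- consult the released args for the real ones.
model = timm.create_model('vit_base_patch16_224', num_classes=1000)
ckpt = torch.load('livt_imagenet_lt_224.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # weights are often nested under a 'model' key
msg = model.load_state_dict(state_dict, strict=False)
print('missing:', msg.missing_keys)
print('unexpected:', msg.unexpected_keys)
```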
If you find our idea or code inspiring, please cite our paper:
```bibtex
@inproceedings{LiVT,
    title={Learning Imbalanced Data with Vision Transformers},
    author={Xu, Zhengzhuo and Liu, Ruikang and Yang, Shuo and Chai, Zenghao and Yuan, Chun},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023}
}
```
This code is partially based on Prior-LT. If you use our code, please also cite:
```bibtex
@inproceedings{PriorLT,
    title={Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective},
    author={Xu, Zhengzhuo and Chai, Zenghao and Yuan, Chun},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021}
}
```
This project is largely based on DeiT and MAE.
The CIFAR code is based on LDAM and Prior-LT.
The loss implementations are based on CB, LDAM, LADE, Prior-LT, and MiSLAS.