LoMaR (Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction)

This is a PyTorch/GPU implementation of the paper Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction:

  • This repo is a modification of MAE. Installation and preparation follow that repo.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.

  • The relative position encoding follows iRPE. To enable iRPE with CUDA support, build its custom operators:

cd rpe_ops/
python setup.py install --user
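
About the timm fix mentioned above: timm 0.3.2 imports `container_abcs` from `torch._six`, which newer PyTorch releases no longer provide. The patch commonly shared for MAE edits `timm/models/layers/helpers.py` so that, from PyTorch 1.8 on, it imports `collections.abc` instead. Below is a minimal sketch of that version check, written as a pure function so it can be tried without PyTorch installed; the name `use_legacy_container_abcs` is hypothetical and not part of timm.

```python
# Sketch of the version check commonly patched into timm==0.3.2's
# timm/models/layers/helpers.py. The function name is hypothetical.


def use_legacy_container_abcs(torch_version: str) -> bool:
    """True if the old `from torch._six import container_abcs` should be kept."""
    # "1.8.1+cu111" -> major=1, minor=8; local build suffixes come after the minor part
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return major == 1 and minor < 8


# Inside helpers.py, the patched import would then read roughly:
#
#   import torch
#   if use_legacy_container_abcs(torch.__version__):
#       from torch._six import container_abcs
#   else:
#       import collections.abc as container_abcs
```

For example, `use_legacy_container_abcs("1.7.1")` is `True`, while `"1.8.1+cu111"` and `"2.1.0"` both give `False` and fall through to `collections.abc`.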

Main Results on ImageNet-1K

Backbone   Method   Pretrain Epochs   Pretrained Weights   Pretrain Logs   Finetune Logs
ViT-B/16   LoMaR    1600              download             download        download

Pre-training

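Pretrain the model on 4 GPUs on a single node, with ${IMAGENET_DIR} pointing to the ImageNet-1K dataset and ${LOG_DIR} to the output and log directory: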

python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 \
--master_addr=127.0.0.1 --master_port=29517 main_pretrain_lomar.py \
    --batch_size 256 \
    --accum_iter 4 \
    --output_dir ${LOG_DIR} \
    --log_dir ${LOG_DIR} \
    --model mae_vit_base_patch16 \
    --norm_pix_loss \
    --distributed \
    --epochs 400 \
    --warmup_epochs 20 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --window_size 7 \
    --num_window 4 \
    --mask_ratio 0.8 \
    --data_path ${IMAGENET_DIR}
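
The --window_size, --num_window and --mask_ratio flags control LoMaR's local masking: per the paper, masked reconstruction is performed within small windows of 7×7 patches rather than over the whole image. The sketch below illustrates one way to sample such windows and split each window's patches into visible and masked sets. It assumes a 224×224 input with 16×16 patches (a 14×14 patch grid), windows placed uniformly at random, and MAE's rule for how many patches to keep; `sample_local_windows` is a hypothetical helper, and the repo's own sampling code may differ.

```python
import random
from typing import List, Optional, Tuple


def sample_local_windows(
    grid_size: int = 14,      # assumed: 224 / 16 patches per side
    window_size: int = 7,     # mirrors --window_size
    num_window: int = 4,      # mirrors --num_window
    mask_ratio: float = 0.8,  # mirrors --mask_ratio
    seed: Optional[int] = None,
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical sketch: (visible, masked) flat patch indices per window."""
    rng = random.Random(seed)
    windows = []
    for _ in range(num_window):
        # Top-left corner chosen uniformly so the window stays inside the grid.
        top = rng.randint(0, grid_size - window_size)
        left = rng.randint(0, grid_size - window_size)
        # Row-major indices (on the full grid) of every patch in this window.
        idx = [(top + r) * grid_size + (left + c)
               for r in range(window_size) for c in range(window_size)]
        rng.shuffle(idx)
        # MAE-style keep count: a (1 - mask_ratio) fraction stays visible.
        num_keep = int(len(idx) * (1 - mask_ratio))
        windows.append((sorted(idx[:num_keep]), sorted(idx[num_keep:])))
    return windows
```

For example, `sample_local_windows(seed=0)` returns four windows, each with its 49 patch indices split between a small visible list and a larger masked list. In this reading of the method, the encoder would see only a window's visible patches and the model would reconstruct that window's masked patches, independently of the other windows.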

Fine-tuning

Fine-tune the pretrained checkpoint ${PRETRAIN_CHKPT}, again on 4 GPUs on a single node:

python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 \
--master_addr=127.0.0.1 --master_port=29510 main_finetune_lomar.py \
    --batch_size 256 \
    --accum_iter 1 \
    --model vit_base_patch16 \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 100 \
    --log_dir ${LOG_DIR} \
    --blr 5e-4 --layer_decay 0.65 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path ${IMAGENET_DIR}
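
Both commands assume 4 GPUs (--nproc_per_node=4). If LoMaR keeps MAE's conventions (likely, since it is a modification of MAE, but an assumption here), the effective batch size is batch_size × accum_iter × number of GPUs, the absolute learning rate is blr × effective batch / 256, and fine-tuning further scales each layer's learning rate by layer_decay raised to the layer's distance from the head. A small sketch of these calculations; all function names are hypothetical.

```python
from typing import List


def effective_batch(batch_size: int, accum_iter: int, num_gpus: int) -> int:
    # Samples contributing to one optimizer step.
    return batch_size * accum_iter * num_gpus


def absolute_lr(blr: float, eff_batch: int) -> float:
    # MAE-style linear scaling rule, assumed to carry over to LoMaR.
    return blr * eff_batch / 256


def layer_lr_scales(num_blocks: int = 12, layer_decay: float = 0.65) -> List[float]:
    """Per-layer lr multipliers in the style of MAE's layer-wise decay.

    Index 0: patch embedding, cls token and position embedding;
    1..num_blocks: Transformer blocks; num_blocks + 1: final norm and head.
    """
    num_layers = num_blocks + 1
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]


pre_bs = effective_batch(256, 4, 4)  # pre-training command: 4096
ft_bs = effective_batch(256, 1, 4)   # fine-tuning command: 1024
print("pretrain:", pre_bs, absolute_lr(1.5e-4, pre_bs))
print("finetune:", ft_bs, absolute_lr(5e-4, ft_bs))
print("head scale:", layer_lr_scales()[-1], "embedding scale:", layer_lr_scales()[0])
```

With a different GPU count, --accum_iter would need adjusting to keep the same effective batch size, and hence the same learning rate, under this rule.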

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Citation

@article{chen2022efficient,
  title={Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction},
  author={Chen, Jun and Hu, Ming and Li, Boyang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2206.00790},
  year={2022}
}
