LV ViT

  • Paper: Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet

  • Origin Repo: zihangJiang/TokenLabeling

  • Code: lvvit.py

  • Evaluate Transforms:

    # backend: pil
    # input_size: 224x224
    # Note: T is assumed to be paddle.vision.transforms (consistent with the 'pil' backend option above)
    import paddle.vision.transforms as T

    transforms = T.Compose([
        T.Resize(248, interpolation='bicubic'),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    # backend: pil
    # input_size: 384x384
    transforms = T.Compose([
        T.Resize(384, interpolation='bicubic'),
        T.CenterCrop(384),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    # backend: pil
    # input_size: 448x448
    transforms = T.Compose([
        T.Resize(448, interpolation='bicubic'),
        T.CenterCrop(448),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
  • Model Details:

    | Model        | Model Name  | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
    | ------------ | ----------- | ---------- | --------- | --------- | --------- | ---------------- |
    | LV-ViT-S     | lvvit_s     | 26.2       | 6.6       | 83.17     | 95.87     | Download         |
    | LV-ViT-M     | lvvit_m     | 55.8       | 16.0      | 83.88     | 96.05     | Download         |
    | LV-ViT-S-384 | lvvit_s_384 | 26.3       | 22.2      | 84.56     | 96.39     | Download         |
    | LV-ViT-M-384 | lvvit_m_384 | 56.0       | 42.2      | 85.34     | 96.72     | Download         |
    | LV-ViT-M-448 | lvvit_m_448 | 56.1       | 61.0      | 85.47     | 96.82     | Download         |
    | LV-ViT-L-448 | lvvit_l_448 | 150.5      | 157.2     | 86.09     | 96.85     | Download         |
  • Citation:

    @article{jiang2021token,
      title={Token Labeling: Training a 85.5% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet},
      author={Jiang, Zihang and Hou, Qibin and Yuan, Li and Zhou, Daquan and Jin, Xiaojie and Wang, Anran and Feng, Jiashi},
      journal={arXiv preprint arXiv:2104.10858},
      year={2021}
    }