# SPT-LSA ViT: Training ViT on Small-Size Datasets

This is an unofficial PyTorch implementation of the paper Vision Transformer for Small-Size Datasets.

The provided configuration has been trained on CIFAR-10 and shows promising results.

The main components of the paper are:

The ViT architecture:


The Shifted Patch Tokenizer (which increases the locality inductive bias):

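Shifted Patch Tokenization can be sketched as follows: the input image is shifted diagonally by half a patch size in four directions, the shifted copies are concatenated with the original along the channel axis, and the result is patchified, normalized, and linearly projected. This is an illustrative sketch, not the repo's `models.py` code; the class name and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftedPatchTokenizer(nn.Module):
    """Sketch of Shifted Patch Tokenization: four diagonal half-patch
    shifts concatenated channel-wise before patch embedding."""

    def __init__(self, in_channels=3, patch_size=4, dim=192):
        super().__init__()
        self.patch_size = patch_size
        # original image + 4 shifted copies, flattened per patch
        patch_dim = in_channels * 5 * patch_size ** 2
        self.norm = nn.LayerNorm(patch_dim)
        self.proj = nn.Linear(patch_dim, dim)

    def forward(self, x):
        s = self.patch_size // 2
        # diagonal shifts realized as pad-and-crop (negative pad crops),
        # so spatial size is preserved: (left, right, top, bottom)
        pads = [(s, -s, s, -s), (-s, s, s, -s), (s, -s, -s, s), (-s, s, -s, s)]
        shifted = [F.pad(x, p) for p in pads]
        x = torch.cat([x] + shifted, dim=1)  # (B, 5C, H, W)
        # non-overlapping patches: (B, 5C*p*p, N) -> (B, N, 5C*p*p)
        patches = F.unfold(x, self.patch_size, stride=self.patch_size).transpose(1, 2)
        return self.proj(self.norm(patches))  # (B, N, dim)
```

For a 32x32 CIFAR-10 image with `patch_size=4`, this yields 64 tokens of dimension `dim`.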

The Locality Self-Attention:

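Locality Self-Attention combines two changes to standard self-attention: the fixed 1/sqrt(d) scaling is replaced by a learnable temperature, and each token's similarity with itself is masked to -inf before the softmax, pushing attention toward other tokens. The sketch below assumes a single unbatched mask over all tokens; names and hyperparameters are illustrative, not the repo's API.

```python
import torch
import torch.nn as nn

class LocalitySelfAttention(nn.Module):
    """Sketch of Locality Self-Attention (LSA): learnable softmax
    temperature + diagonal (self-token) masking."""

    def __init__(self, dim=192, heads=8):
        super().__init__()
        self.heads = heads
        head_dim = dim // heads
        # learnable temperature, initialized to the usual 1/sqrt(d_head)
        self.temperature = nn.Parameter(torch.tensor(head_dim ** -0.5))
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        qkv = self.to_qkv(x).reshape(b, n, 3, self.heads, d // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # mask each token's similarity with itself so attention is
        # redistributed to the other tokens
        diag = torch.eye(n, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(diag, float('-inf'))
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```

Note that rebuilding the boolean mask every forward pass is what the `register_buffer` item in the Todo list below would avoid.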

These components can be found in `models.py`.

## Todo

- Use `register_buffer` for the -inf mask in the Locality Self-Attention
- Use learning-rate warmup
- Visualize attention layers
- Track the scaling coefficient in attention using TensorBoard
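For the first item, a minimal sketch of what `register_buffer` would look like (the class and parameter names here are hypothetical, for illustration only): a buffer is not a trainable parameter, but it moves with `.to(device)` and is saved in the `state_dict`, so the mask no longer needs to be rebuilt on every forward pass.

```python
import torch
import torch.nn as nn

class MaskedAttentionStub(nn.Module):
    """Illustrative stub: pre-register the diagonal -inf mask as a buffer
    instead of recreating it in forward()."""

    def __init__(self, num_tokens):
        super().__init__()
        # non-trainable, device-aware, persisted in state_dict
        self.register_buffer('diag_mask', torch.eye(num_tokens, dtype=torch.bool))

    def forward(self, attn_logits):
        # attn_logits: (..., num_tokens, num_tokens); broadcast the mask
        return attn_logits.masked_fill(self.diag_mask, float('-inf'))
```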