Implementation from scratch of SPT-LSA: training ViT for small-size datasets

valentin-fngr/SPT-LSA-ViT

SPT-LSA ViT: training ViT for small-size datasets

This is an unofficial PyTorch implementation of the paper Vision Transformer for Small-Size Datasets.

The default configuration has been trained on CIFAR-10 and shows promising results.

The main components of the paper are:

The ViT architecture:

(figure: overall ViT architecture)

The Shifted Patch Tokenization (which increases the locality inductive bias):

(figure: Shifted Patch Tokenization)
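As a rough illustration of the idea, Shifted Patch Tokenization can be sketched as follows in PyTorch: the image is concatenated with four diagonally shifted copies of itself before patch splitting, so each token sees its spatial neighborhood. The class name, half-patch shift amount, and hyperparameters below are assumptions for the sketch, not necessarily what `models.py` does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftedPatchTokenization(nn.Module):
    """Sketch of SPT: concatenate the image with four diagonally shifted
    copies, split the result into patches, then LayerNorm + linear project."""

    def __init__(self, img_size=32, patch_size=4, in_chans=3, embed_dim=192):
        super().__init__()
        self.patch_size = patch_size
        patch_dim = in_chans * 5 * patch_size ** 2  # original + 4 shifted copies
        self.norm = nn.LayerNorm(patch_dim)
        self.proj = nn.Linear(patch_dim, embed_dim)

    def forward(self, x):  # x: (B, C, H, W)
        s = self.patch_size // 2  # shift by half a patch (assumption)
        # Diagonal shifts: pad on one side, crop the other (negative pad)
        pads = [(s, -s, s, -s), (-s, s, s, -s), (s, -s, -s, s), (-s, s, -s, s)]
        shifted = [F.pad(x, p) for p in pads]
        x = torch.cat([x] + shifted, dim=1)              # (B, 5C, H, W)
        patches = F.unfold(x, self.patch_size, stride=self.patch_size)
        patches = patches.transpose(1, 2)                # (B, num_patches, patch_dim)
        return self.proj(self.norm(patches))             # (B, num_patches, embed_dim)
```

With `img_size=32` and `patch_size=4`, this produces 64 tokens of dimension `embed_dim` per image.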

The Locality Self-Attention:

(figure: Locality Self-Attention)
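Locality Self-Attention modifies standard multi-head self-attention in two ways: the fixed 1/sqrt(d) scale becomes a learnable temperature, and the diagonal of the attention logits is masked with -inf so a token cannot attend to itself. A minimal sketch, assuming a multiplicative parameterization of the temperature (the paper describes it as a learnable divisor; the two are equivalent reparameterizations) and illustrative names:

```python
import torch
import torch.nn as nn

class LocalitySelfAttention(nn.Module):
    """Sketch of LSA: multi-head self-attention with a learnable temperature
    and diagonal masking of the attention logits."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        # Learnable temperature, initialized at the usual 1/sqrt(d) scale
        self.temperature = nn.Parameter(torch.tensor(head_dim ** -0.5))
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, D)
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, d // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # Diagonal masking: set self-token logits to -inf before the softmax
        diag = torch.eye(n, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(diag, float("-inf"))
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)
```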

These components can be found in models.py.

Todo

  • Use register_buffer for the -inf mask in the Locality Self-Attention
  • Use learning-rate warmup
  • Visualize the attention maps
  • Track the learnable scaling coefficient of the attention in TensorBoard
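The first Todo item could look like the stub below: build the boolean diagonal mask once in `__init__` and register it as a buffer, so it moves with the module on `.to(device)` instead of being recreated every forward pass (`persistent=False` additionally keeps it out of the state_dict). The class name and token count are illustrative.

```python
import torch
import torch.nn as nn

class DiagonalMask(nn.Module):
    """Illustrative stub: hold LSA's -inf diagonal mask as a registered buffer."""

    def __init__(self, num_tokens):
        super().__init__()
        # persistent=False: excluded from state_dict, but still follows
        # the module across .to(device) / .cuda() calls
        self.register_buffer("diag_mask",
                             torch.eye(num_tokens, dtype=torch.bool),
                             persistent=False)

    def forward(self, attn_logits):  # attn_logits: (..., N, N)
        return attn_logits.masked_fill(self.diag_mask, float("-inf"))
```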
