EE340-Project1-Vision-Transformer

Introduction

SUSTech EE340 Project 1: MNIST classification.

I implemented a simple Vision Transformer (ViT) model with PyTorch.

The ViT architecture was proposed by Dosovitskiy et al. in the paper "An image is worth 16x16 words: Transformers for image recognition at scale" (2020).
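For readers new to ViT, here is a minimal PyTorch sketch of the architecture applied to 28x28 MNIST digits. It is not the code in `main.py`. The class names `PatchEmbedding` and `ViT` and all hyperparameters (7x7 patches, which give 16 patches per image, embedding size 64, 4 encoder layers, 4 attention heads) are illustrative assumptions. The sketch cuts each image into non-overlapping patches (the "words" of the paper title) and linearly projects each patch. It then prepends a learnable class token, adds learnable position embeddings, and classifies the class-token output of a Transformer encoder.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to embed_dim."""

    def __init__(self, img_size=28, patch_size=7, in_chans=1, embed_dim=64):
        super().__init__()
        assert img_size % patch_size == 0, "image size must be divisible by patch size"
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel_size == stride == patch_size is equivalent to
        # flattening each patch and applying one shared linear layer.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, N, D), N = number of patches


class ViT(nn.Module):
    # Hyperparameter defaults are illustrative assumptions, not the project's settings.
    def __init__(self, img_size=28, patch_size=7, in_chans=1, num_classes=10,
                 embed_dim=64, depth=4, num_heads=4, mlp_ratio=2.0, dropout=0.1):
        super().__init__()
        self.patch_embed = PatchEmbedding(img_size, patch_size, in_chans, embed_dim)
        n = self.patch_embed.num_patches
        # Learnable [class] token and position embeddings (one extra slot for the token).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n + 1, embed_dim))
        nn.init.trunc_normal_(self.cls_token, std=0.02)
        nn.init.trunc_normal_(self.pos_embed, std=0.02)
        self.dropout = nn.Dropout(dropout)
        # Pre-norm encoder layers with GELU, using PyTorch's built-in encoder
        # instead of a hand-written attention block.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=int(embed_dim * mlp_ratio), dropout=dropout,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)                        # (B, N, D)
        cls = self.cls_token.expand(x.shape[0], -1, -1)  # (B, 1, D)
        x = torch.cat([cls, x], dim=1) + self.pos_embed  # (B, N + 1, D)
        x = self.encoder(self.dropout(x))
        return self.head(self.norm(x[:, 0]))           # classify from the class token


if __name__ == "__main__":
    model = ViT()
    logits = model(torch.randn(8, 1, 28, 28))  # a random batch of 8 "MNIST" images
    print(logits.shape)  # torch.Size([8, 10])
```

With these settings each 28x28 image becomes a sequence of 16 patch tokens plus one class token. The final logits have one entry per digit class. A larger `patch_size` shortens the sequence and speeds up attention but gives each token a coarser view of the digit.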

Usage

python main.py

References

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.