Releases: naver-ai/vidt
ViDT+ Optimized
ViDT+ models were trained for 150 epochs with all of the proposed components enabled.
ViDT+ models
We trained ViDT+ models for 50 epochs.
ViDT models trained with distillation
We trained ViDT models with distillation (token matching) for 50 epochs.
ViDT models trained for 50 and 150 epochs
Pre-trained ViDT models are available for 50 and 150 epochs in different model sizes (from nano to base).
We activated the auxiliary decoding loss and iterative box refinement.
Swin-nano pre-trained on ImageNet-1K
This is the Swin-nano backbone pre-trained on ImageNet-1K; it reaches 74.9% accuracy when trained for 300 epochs.