
✝️EVA: An Open Billion-Scale Vision Foundation Model

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Yuxin Fang²,¹, Wen Wang³,¹, Binhui Xie⁴,¹, Quan Sun¹, Ledell Wu¹, Xinggang Wang², Tiejun Huang¹, Xinlong Wang¹, Yue Cao¹

¹BAAI, ²HUST, ³ZJU, ⁴BIT

CVPR 2023, 🌟highlight🌟

We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data and academic resources. EVA is a vanilla ViT pre-trained to reconstruct the masked-out image-text aligned vision features (i.e., CLIP features) conditioned on the visible image patches. This pretext task lets us efficiently scale EVA up to one billion parameters, and the resulting model sets new records on a broad range of representative vision downstream tasks.
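The following plain-Python sketch illustrates this masked feature prediction objective. It is not EVA's training code. The patch grid (a 224-pixel input split into 14-pixel patches), the 0.4 mask ratio, uniform random masking and the mean (1 - cosine similarity) loss over masked patches are all assumptions made for illustration. Random vectors stand in for both the frozen CLIP target features and the ViT's predictions. The helper names `random_patch_mask` and `masked_feature_loss` are hypothetical.

```python
import math
import random


def random_patch_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> list[bool]:
    """Mark round(num_patches * mask_ratio) patches as masked (True).

    Assumption: uniform random masking, chosen here for simplicity.
    """
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norms, 1e-12)  # guard against zero vectors


def masked_feature_loss(
    pred: list[list[float]], target: list[list[float]], mask: list[bool]
) -> float:
    """Mean (1 - cosine similarity) between predicted and target features,
    averaged over masked patches only; visible patches contribute nothing.

    Assumption: a cosine-similarity regression objective.
    """
    terms = [1.0 - cosine_similarity(p, t) for p, t, m in zip(pred, target, mask) if m]
    if not terms:
        raise ValueError("the mask selects no patches")
    return sum(terms) / len(terms)


# Toy usage with hypothetical sizes.
rng = random.Random(0)
image_size, patch_size = 224, 14
num_patches = (image_size // patch_size) ** 2  # 16 x 16 = 256 patches
mask = random_patch_mask(num_patches, mask_ratio=0.4, rng=rng)  # 102 patches masked
feature_dim = 8  # toy feature width, not EVA's
# Stand-in for the frozen CLIP features the model must reconstruct.
target = [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)] for _ in range(num_patches)]
# Stand-in for the ViT's predictions (random, i.e. an untrained model).
pred = [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)] for _ in range(num_patches)]

print(masked_feature_loss(pred, target, mask))    # random predictions: near 1.0 in expectation
print(masked_feature_loss(target, target, mask))  # perfect predictions: ~0.0
```

Because the loss is restricted to masked positions, the model is scored only on features it was not shown directly. This is what makes the task a reconstruction problem rather than an identity mapping.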

EVA is the first open-sourced billion-scale vision foundation model that achieves state-of-the-art performance on a broad range of downstream tasks.

News

Mar 21, 2023: EVA is selected as a 🌟highlight🌟 at CVPR 2023!
Mar 21, 2023: If you like EVA, you might also like EVA-02, the next-gen EVA.
Feb 28, 2023: EVA is accepted to CVPR 2023!
Jan 31, 2023: Strong visual representations also enable powerful VL foundation models. By leveraging EVA-CLIP, BLIP-2 (paper, code) achieves SoTA performance on various VL tasks!
Dec 12, 2022: EVA and EVA-L model weights have been added to the awesome timm library, thanks to @rwightman!
Dec 07, 2022: Launch EVA-L, the best ViT-L (304M) to date, which reaches up to 89.2% top-1 accuracy on IN-1K (weights & logs) by leveraging vision features from EVA-CLIP.
Nov 25, 2022: Release EVA-CLIP zero-shot evaluation results on 35 benchmarks.
Nov 22, 2022: Release code & models for object detection and instance segmentation.
Nov 21, 2022: Release code & models for video classification, semantic segmentation, and EVA-CLIP.
Nov 20, 2022: Release code & models for pre-training and image classification.
Nov 18, 2022: Release wandb logs & statistics of the 1.1B EVA-CLIP training.

Get Started

All EVA model checkpoints are now available at 🤗 Hugging Face Models and BAAI ModelHub (EVA & EVA-CLIP). Try them out!

Summary of EVA's performance

image & video classification

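Image classification: IN-1K (end-to-end fine-tuning, linear probing, zero-shot) and the average zero-shot accuracy over 12 benchmarks. Video classification: Kinetics-400/600/700. All numbers are top-1 accuracy (%).

| model | #param. | IN-1K, e2e ft | IN-1K, linear | IN-1K, zero-shot | 12-benchmark avg., zero-shot | K400 | K600 | K700 |
|---|---|---|---|---|---|---|---|---|
| EVA or EVA-CLIP | 1.0B | 89.7 | 86.5 | 78.5 | 75.7 | 89.7 | 89.8 | 82.9 |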

object detection & segmentation

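Object detection (det) is reported as box AP and instance segmentation (seg) as mask AP on COCO and LVIS; semantic segmentation on COCO-Stuff and ADE20K is reported as mIoU.

| model | #param. | COCO det (test) | COCO det (val) | COCO seg (test) | COCO seg (val) | LVIS det | LVIS seg | COCO-Stuff | ADE20K |
|---|---|---|---|---|---|---|---|---|---|
| EVA | 1.0B | 64.7 | 64.5 | 55.5 | 55.0 | 62.2 | 55.0 | 53.4 | 62.3 |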

BibTeX & Citation

```bibtex
@article{EVA,
  title={EVA: Exploring the Limits of Masked Visual Representation Learning at Scale},
  author={Fang, Yuxin and Wang, Wen and Xie, Binhui and Sun, Quan and Wu, Ledell and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2211.07636},
  year={2022}
}
```

Contact

For help with EVA-related issues, or to report a bug, please open a GitHub issue with the label EVA-01. Let's build a better & stronger EVA together :)
We are hiring at all levels at the BAAI Vision Team, including full-time researchers, engineers and interns. If you are interested in working with us on foundation models, self-supervised learning and multimodal learning, please contact Yue Cao (caoyue@baai.ac.cn) and Xinlong Wang (wangxinlong@baai.ac.cn).
