- Learning Transferable Visual Models From Natural Language Supervision Paper Code
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision Paper
- Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation Paper
- CLIP-Adapter: Better Vision-Language Models with Feature Adapters Paper Code
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model Paper Code
- Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Paper Code
- CLIP4Caption: CLIP for Video Caption Paper