Cluster-Adapter: Tuning Vision-Language Models with Multiple Prototypes Clustering

Official implementation of Cluster-Adapter: Tuning Vision-Language Models with Multiple Prototypes Clustering.

Abstract

Benefiting from advances in large-scale pre-training, foundation models have demonstrated remarkable capability in natural language processing, computer vision, and other fields. However, to achieve expert-level performance in specific applications, such models often need to be fine-tuned with domain-specific knowledge. In this paper, we focus on enabling vision-language models to unleash more of their potential for visual understanding tasks under few-shot tuning. Specifically, we propose a novel adapter, dubbed Cluster-Adapter, which is based on a trainable multiple-prototype clustering algorithm, for tuning the CLIP model. It not only alleviates catastrophic forgetting in foundation models by introducing anchors that inherit common knowledge, but also makes more efficient use of the few annotated samples by bringing in clustering and domain priors, thereby improving few-shot tuning performance. We have conducted extensive experiments on 11 common classification benchmarks. The results show that our method significantly surpasses the original CLIP and achieves state-of-the-art (SOTA) performance on all benchmarks and settings. For example, under the 16-shot setting, our method improves over the original CLIP by 19.6% and also surpasses Tip-Adapter and GraphAdapter by 2.7% and 2.2%, respectively, in terms of average accuracy across the 11 benchmarks. Code will be made publicly available.
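The abstract does not spell out the exact formulation, so the NumPy sketch below is only a rough illustration of the general idea, not the repository's code. It assumes that each class's few-shot image features are grouped into K prototypes with spherical k-means, that a query scores each class by its best-matching prototype, and that CLIP's zero-shot text logits act as the fixed anchor to which the prototype score is added. The function names, the k-means initialization, the max-over-prototypes rule, the residual weight `alpha` and the scale of 100 (mirroring the logit scale commonly used with CLIP) are all hypothetical choices; the real Cluster-Adapter trains its prototypes and may combine them differently. Random unit vectors stand in for CLIP embeddings.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kmeans_prototypes(feats, k, iters=20, seed=0):
    """Spherical k-means: cluster unit-norm features into k unit-norm centers."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples (fancy indexing copies, so feats is untouched).
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature to the center with the highest cosine similarity.
        assign = np.argmax(feats @ centers.T, axis=1)
        for j in range(k):
            members = feats[assign == j]
            if len(members) > 0:  # keep the previous center if a cluster empties
                centers[j] = l2_normalize(members.mean(axis=0))
    return centers


def build_prototypes(support_feats, support_labels, num_classes, k):
    """Cluster each class's few-shot features separately -> shape (C, k, D)."""
    return np.stack([
        kmeans_prototypes(support_feats[support_labels == c], k, seed=c)
        for c in range(num_classes)
    ])


def adapter_logits(query_feats, text_feats, prototypes, alpha=1.0, scale=100.0):
    """Zero-shot CLIP logits (the fixed anchor) plus a prototype-matching residual."""
    zero_shot = scale * query_feats @ text_feats.T            # (N, C)
    sims = np.einsum("nd,ckd->nck", query_feats, prototypes)  # (N, C, k)
    # Score each class by its best-matching prototype.
    return zero_shot + alpha * scale * sims.max(axis=2)


if __name__ == "__main__":
    # Random unit vectors stand in for CLIP text and image embeddings.
    rng = np.random.default_rng(0)
    num_classes, shots, k, dim = 3, 16, 4, 512
    text_feats = l2_normalize(rng.normal(size=(num_classes, dim)))
    labels = np.repeat(np.arange(num_classes), shots)
    noise = 0.02 * rng.normal(size=(len(labels), dim))
    support = l2_normalize(text_feats[labels] + noise)
    protos = build_prototypes(support, labels, num_classes, k)
    preds = adapter_logits(support, text_feats, protos).argmax(axis=1)
    print("prototypes:", protos.shape, "support accuracy:", (preds == labels).mean())
```

Setting `alpha=0` recovers plain zero-shot CLIP, which is one way to read the "anchor" role described above: the few-shot prototypes only add a correction on top of CLIP's original predictions.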

Requirements

Installation

Create a conda environment and install dependencies:

git clone https://github.com/uyzhang/Cluster-Adapter.git
cd Cluster-Adapter

conda create -n cluster_adapter python=3.7
conda activate cluster_adapter

pip install -r requirements.txt

# Install the versions of torch and torchvision that match your CUDA setup
conda install pytorch torchvision cudatoolkit -c pytorch
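Before running experiments, it can help to confirm that the environment is usable. The snippet below is a hypothetical helper, not part of the repository. It uses only the standard library to report the Python version and whether torch and torchvision are importable, and, when torch is present, whether CUDA is visible to it.

```python
import importlib
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision")):
    """Report which packages are importable and whether CUDA is usable."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        # find_spec locates a top-level package without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    if report.get("torch"):
        torch = importlib.import_module("torch")
        report["cuda"] = bool(torch.cuda.is_available())
        report["torch_version"] = torch.__version__
    else:
        report["cuda"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```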

Dataset

Follow DATASET.md to download and prepare the datasets; the dataset setup follows Tip-Adapter.

Get Started

Running

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/stanford_cars.yaml
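The command above runs a single benchmark. To sweep every benchmark, a small launcher such as the one below could be used. It is a hypothetical helper, not part of the repository, and assumes that each benchmark has its own `*.yaml` file in `configs/` and that `main.py` accepts the `--config` flag exactly as in the command above.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_commands(config_dir="configs", python=sys.executable):
    """Return one `main.py --config <file>` command per YAML file, sorted by name."""
    configs = sorted(Path(config_dir).glob("*.yaml"))
    return [[python, "main.py", "--config", str(cfg)] for cfg in configs]


def run_all(config_dir="configs", gpu="0"):
    """Run each config sequentially on one GPU, stopping at the first failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for cmd in build_commands(config_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    run_all()
```

Running the configs one after another on a single GPU keeps the setup identical to the single-benchmark command; `check=True` makes the sweep stop as soon as one run fails instead of silently continuing.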

Acknowledgement

This repo benefits from CLIP, CoOp, CLIP-Adapter and Tip-Adapter. Thanks for their wonderful work.
