Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen

Updates

[2024/06] Our paper Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language is released.

Introduction

Auto Cherry-Picker (ACP) is an innovative framework designed to synthesize training samples for both perception and multi-modal reasoning tasks from a simple object list in natural language. It employs a newly designed metric, CLIS, to ensure the quality of the synthetic data.

Main Results

Long-tailed Instance Segmentation Benchmark

Method	Backbone	$AP_r^{mask}$	$AP^{mask}$
Mask R-CNN	ResNet-50	9.3	21.7
Mask R-CNN w. ACP	ResNet-50	14.5(+5.2)	22.8(+1.1)
CenterNet2 w. Copy-Paste	Swin-B	29.3	39.3
CenterNet2 w. ACP	Swin-B	30.7(+1.4)	39.6(+0.3)

Open-vocabulary Object Detection Benchmark

Dataset	Method	Backbone	$AP_{novel}^{box}$	$AP^{box}$
LVIS	Grounding-DINO	Swin-T	31.7	48.7
LVIS	Grounding-DINO w. ACP	Swin-T	33.0(+1.3)	49.2
COCO	Grounding-DINO	Swin-T	60.4	57.1
COCO	Grounding-DINO w. ACP	Swin-T	60.8(+0.4)	56.9

Multi-modal Image-based Benchmarks

Method	LLM Backbone	MME	GQA
LLaVA-1.5	Vicuna-7B	1434.4	58.9
LLaVA-1.5	Vicuna-13B	1438.3	60.7
LLaVA-1.5	Llama-3-8B	1445.3	60.1
LLaVA-1.5 w. ACP	Vicuna-7B	1514.5(+80.1)	59.3(+0.4)

Installation

Requirements

Python 3.10

PyTorch 2.3.0

Conda Environment Setup

pip install -r requirements.txt

Prepare Scene Graph Generator

Download Qwen1.5-14B-Chat

git clone https://huggingface.co/Qwen/Qwen1.5-14B-Chat

You can use a different LLM as the Scene Graph Generator by adding it to config/model_config.json.

Prepare Image Generator

Step 1: Download InstanceDiffusion

git clone https://github.com/frank-xwang/InstanceDiffusion.git

Step 2: Download model weights

Download the pretrained InstanceDiffusion weights (from Hugging Face or Google Drive) and the SD1.5 weights, and place them in the InstanceDiffusion/pretrained folder.

Then create a soft link in the ACP folder.

ln -s InstanceDiffusion/pretrained ./pretrained

Step 3: Download CLIP

git clone https://huggingface.co/openai/clip-vit-large-patch14

Step 4: Download SDXL Refiner (Optional)

git clone https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0

To disable the SDXL refiner, set args.cascade_strength in infer_image.py to 0.
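As a rough illustration of what a zero strength means, the sketch below gates a refinement stage on a strength value. It is not the code in infer_image.py: `maybe_refine` and the `refiner` callable are hypothetical names, and the only assumption carried over from the note above is that a strength of 0 skips the refiner and leaves the base image unchanged.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def maybe_refine(image: T, cascade_strength: float, refiner: Callable[[T, float], T]) -> T:
    """Apply `refiner` only when the strength is positive (hypothetical helper)."""
    if cascade_strength <= 0:
        # Assumption: a strength of 0 means the refiner stage is skipped entirely.
        return image
    # Otherwise hand the image and the strength to the refiner stage.
    return refiner(image, cascade_strength)
```

With this kind of gating, the optional SDXL Refiner download in Step 4 would only matter when the strength is above 0.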

Prepare Image Filter

Download Qwen-VL-Chat

git clone https://huggingface.co/Qwen/Qwen-VL-Chat

Prepare Layout Filter

Construct the example pool for CLIS-L.

Download sim_map.json and relations_one_to_one.json and place them under config/eval/.

Prepare Segmentor

Download the SAM model weights from GitHub.

mkdir sam
cd sam
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
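Since the installation spans several downloads, a small pre-flight check can help before running anything. The sketch below is not part of the repository: `missing_assets` and `EXPECTED_ASSETS` are hypothetical names, and it assumes every clone and download above was made from the ACP folder, so that the paths resolve relative to it. The optional SDXL Refiner is left out.

```python
from pathlib import Path

# Paths named in the installation steps above. Their location relative to
# the ACP folder is an assumption (everything cloned or downloaded there).
EXPECTED_ASSETS = [
    "Qwen1.5-14B-Chat",                       # Scene Graph Generator
    "InstanceDiffusion",                      # Image Generator code
    "pretrained",                             # soft link to InstanceDiffusion/pretrained
    "clip-vit-large-patch14",                 # CLIP
    "Qwen-VL-Chat",                           # Image Filter
    "config/eval/sim_map.json",               # Layout Filter
    "config/eval/relations_one_to_one.json",  # Layout Filter
    "sam/sam_vit_h_4b8939.pth",               # Segmentor weights
]


def missing_assets(root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    # Path.exists() follows symlinks, so a dangling soft link counts as missing.
    return [rel for rel in EXPECTED_ASSETS if not (base / rel).exists()]
```

Calling `missing_assets()` from the ACP folder returns the entries that still need to be downloaded; an empty list means all of the listed paths are present.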

Quick Start

python inference_single_data.py

You can customize the object list in inputs/demo.json. The generated images are saved under images/, and the synthesized training samples are saved under syn_data/.
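To confirm that a run produced something, the two output folders can be inspected. The sketch below is a hypothetical helper (`count_outputs` is not part of the repository) that assumes the script was run from the ACP folder and makes no assumption about the file formats written there.

```python
from pathlib import Path


def count_outputs(root: str = ".") -> dict[str, int]:
    """Count the files under images/ and syn_data/ (0 if a folder is absent)."""
    counts = {}
    for folder in ("images", "syn_data"):
        path = Path(root) / folder
        if path.is_dir():
            # Walk the folder recursively and count regular files only.
            counts[folder] = sum(1 for p in path.rglob("*") if p.is_file())
        else:
            counts[folder] = 0
    return counts
```

A count of 0 for either folder after running inference_single_data.py suggests that the run did not complete or wrote its outputs elsewhere.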
