Training-Free_Zero-Shot_Semantic_Segmentation_with_LLM_Refinement

This repository contains the official implementation of the paper "Training-Free Zero-Shot Semantic Segmentation with LLM Refinement" (BMVC 2024).

Project Page: https://sky24h.github.io/websites/bmvc2024_training-free-semseg-with-LLM/

Hugging Face Demo: https://huggingface.co/spaces/sky24h/Training-Free_Zero-Shot_Semantic_Segmentation_with_LLM_Refinement

Dependencies

Python >= 3.9 (3.11.8 recommended)

pip install -r requirements.txt

Usage

1. Download Pretrained Model

All pre-trained models are downloaded automatically the first time you run the code. However, downloading the Llama3-8b model from Hugging Face may require authorization.

Either log in to Hugging Face with the following command, or download the model manually to your local machine and modify "utils/llms_utils.py" to load it from the local directory.

huggingface-cli login
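
For the manual-download route, one possible approach is to prefer a local directory when it looks like a downloaded checkpoint and fall back to the Hub ID otherwise. The sketch below is an assumption, not the code in "utils/llms_utils.py": the function name, the local path, and the Hub model ID are hypothetical placeholders that you would replace with the values the repository actually uses.

```python
from pathlib import Path

# Hypothetical values for illustration; the repository may use different ones.
DEFAULT_HUB_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed Hub ID
DEFAULT_LOCAL_DIR = "./pretrained-models/llama3-8b"      # assumed local path


def resolve_model_source(local_dir: str = DEFAULT_LOCAL_DIR,
                         hub_id: str = DEFAULT_HUB_ID) -> str:
    """Return the local directory if it holds a config.json, else the Hub ID."""
    path = Path(local_dir)
    # A downloaded Hugging Face checkpoint normally ships a config.json file.
    if (path / "config.json").is_file():
        return str(path)
    return hub_id
```

If the loading code relies on the transformers library, the returned string can usually be passed directly to a `from_pretrained` call, which accepts either a Hub ID or a local directory.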

2. Inference on Single Image

python inference_single.py --config ./configs/DRAM.yaml --input_path ./sources/DRAM_eg.jpg
python inference_single.py --config ./configs/Cityscapes.yaml --input_path ./sources/Cityscapes_eg.jpg
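
To process several images with the same configuration, the single-image command can be wrapped in a small loop. This is a minimal sketch that only uses the `--config` and `--input_path` flags shown above; the helper names, the `dry_run` switch, and the restriction to `.jpg` files are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_single_command(config: str, image: Path) -> list[str]:
    """Build the argv for one inference_single.py call."""
    return [sys.executable, "inference_single.py",
            "--config", config, "--input_path", str(image)]


def run_on_folder(config: str, folder: str, dry_run: bool = True) -> list[list[str]]:
    """Build one command per .jpg in `folder` (sorted by name).

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [build_single_command(config, p)
                for p in sorted(Path(folder).glob("*.jpg"))]
    if not dry_run:
        for cmd in commands:
            # check=True stops the loop on the first failing image.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_on_folder("./configs/DRAM.yaml", "./sources")` lists the commands first; pass `dry_run=False` from the repository root to execute them.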

3. Inference on Dataset

See the configuration files in the "configs" directory for more details on the dataset and model settings.

CUDA_VISIBLE_DEVICES=0 python inference_dataset.py --config ./configs/VOC2012.yaml --reset --draw_bbox --debug
CUDA_VISIBLE_DEVICES=0 python inference_dataset.py --config ./configs/COCO-81.yaml --reset --draw_bbox --debug

Flag	Description
--reset	Removes the previous results
--draw_bbox	Visualizes the bounding box of the detected objects
--debug	Runs only the first 5% of the dataset
--use_lower_vram	Reduces the memory requirement of the model
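
All four flags behave as on/off switches. The sketch below shows how such switches are commonly declared with `argparse`; it is a hypothetical parser that mirrors the table, not the parser actually used in inference_dataset.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the table above."""
    parser = argparse.ArgumentParser(description="Dataset inference (sketch)")
    parser.add_argument("--config", required=True, help="Path to a YAML config")
    # store_true flags default to False and become True when present.
    parser.add_argument("--reset", action="store_true", help="Remove previous results")
    parser.add_argument("--draw_bbox", action="store_true", help="Draw detected boxes")
    parser.add_argument("--debug", action="store_true", help="Run on a small subset")
    parser.add_argument("--use_lower_vram", action="store_true", help="Reduce GPU memory use")
    return parser


args = build_parser().parse_args(["--config", "./configs/VOC2012.yaml", "--reset", "--debug"])
print(args.reset, args.draw_bbox, args.debug, args.use_lower_vram)
# prints: True False True False
```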

Model Variant	GPU Memory Requirement
Llama-3-8B without --use_lower_vram	30 GB
Llama-3-8B with --use_lower_vram	24 GB
OpenAI API without --use_lower_vram	16 GB
OpenAI API with --use_lower_vram	12 GB
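
The memory table can be turned into a quick lookup for choosing a setup that fits a given GPU. The numbers below are copied from the table; the function name and the preference order (local Llama before the OpenAI API, default mode before the lower-VRAM mode) are assumptions for illustration.

```python
# GPU memory requirements in GB, copied from the table above.
# Keys are (model variant, whether --use_lower_vram is set).
REQUIREMENTS_GB = {
    ("llama", False): 30,
    ("llama", True): 24,
    ("openai", False): 16,
    ("openai", True): 12,
}


def suggest_setup(available_gb: float):
    """Return (variant, use_lower_vram) for the first setup that fits, or None.

    Entries are checked in insertion order, i.e. from the most to the
    least memory-hungry setup.
    """
    for (variant, lower_vram), need in REQUIREMENTS_GB.items():
        if available_gb >= need:
            return variant, lower_vram
    return None


print(suggest_setup(24))  # prints: ('llama', True)
print(suggest_setup(8))   # prints: None
```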

Citation

If you find this work useful, please consider citing the following paper:

@inproceedings{Huang2024SemSegLLM,
  author = {Huang, Yuantian and Iizuka, Satoshi and Fukui, Kazuhiro},
  booktitle = {The British Machine Vision Conference (BMVC) 2024},
  title = {Training-Free Zero-Shot Semantic Segmentation with LLM Refinement},
  year = {2024},
}
