Vision-LLM Alignemnt Training (SFT+PPO/DPO)

Vision-LLM-Alignment is a project designed to implement alignment training for visual large language models (LLMs). This includes SFT training, reward model training, and PPO/DPO training. If additional alignment algorithms need to be supported, please raise them in an issue.

Changelog

[2024/07/18] We provide a large-scale vision feedback dataset. It is a combination of the following high-quality vision feedback datasets. The dataset can be found in wangclnlp/vision-feedback-mix-binarized and wangclnlp/vision-feedback-mix-binarized-cleaned.
[2024/07/10] We support the direct loading of a LLaVA model in all training stages, including SFT training, RM training, and PPO/DPO training.
[2024/07/07] We support the direct loading of a LLaVA model during the SFT training phase. You just need to set the model_architecture parameter to "llava" and specify the LLaVA model path with from_checkpoint. Support for this functionality during the DPO, RM training, and PPO junction phases will be introduced soon.

Installation

You can use anaconda/miniconda to install packages needed for this project.

pip install -r requirements.txt

Preparing Models and Datasets

Models

Vision-LLM requires both a vision encoder and a language model. Its architecture is depicted in the figure.

Datasets

We have tentatively implemented all alignment training based on this LLaVA dataset format. Some samples can be found in the data folder.

Training Models

Supervised Fine-tuning (SFT)

bash run_sft.sh

Reward Model Training

bash run_rm_training.sh

Direct Pereference Optimization (DPO)

bash run_dpo_training.sh

Reinforcement Learning from Human Feedback (RLHF)

bash run_ppo_training.sh

Evaluation

bash run_predict.sh

Supported Models

LLM	Model size
LLaMA-2	7B/13B/70B
LLaMA-3	8B/70B

Vision Model
clip-vit-large-patch14
clip-vit-large-patch14-336

Note: Other LLMs with the same architecture as LLaMA-2/3 are also supported. You can also add arbitrary model architectures by modifying this training/utils/model/build_model.py.

Acknowledgement

We commence by utilizing the exceptional codebase provided by DeepSpeed-VisualChat 🌹🌹🌹.

We would like to thank Yifu Huo and Yang Gan for their contributions to this work.

We thank the following papers:

[1] Ouyang, Long, et al. "Training language models to follow instructions with human feedback." Advances in neural information processing systems 35 (2022): 27730-27744.
[2] Rafailov, Rafael, et al. "Direct preference optimization: Your language model is secretly a reward model." Advances in Neural Information Processing Systems 36 (2024).
[3] Liu, Haotian, et al. "Visual instruction tuning." Advances in neural information processing systems 36 (2024).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vision-LLM Alignemnt Training (SFT+PPO/DPO)

Changelog

Installation

Preparing Models and Datasets

Models

Datasets

Training Models

Supervised Fine-tuning (SFT)

Reward Model Training

Direct Pereference Optimization (DPO)

Reinforcement Learning from Human Feedback (RLHF)

Evaluation

Supported Models

Acknowledgement

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
chat		chat
data		data
eval		eval
training		training
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
run_dpo_training.sh		run_dpo_training.sh
run_ppo_training.sh		run_ppo_training.sh
run_pre_training.sh		run_pre_training.sh
run_predict.sh		run_predict.sh
run_rm_training.sh		run_rm_training.sh
run_sft.sh		run_sft.sh

wangclnlp/Vision-LLM-Alignment

Folders and files

Latest commit

History

Repository files navigation

Vision-LLM Alignemnt Training (SFT+PPO/DPO)

Changelog

Installation

Preparing Models and Datasets

Models

Datasets

Training Models

Supervised Fine-tuning (SFT)

Reward Model Training

Direct Pereference Optimization (DPO)

Reinforcement Learning from Human Feedback (RLHF)

Evaluation

Supported Models

Acknowledgement

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages