
VEnhancer: Generative Space-Time Enhancement for Video Generation

Jingwen He,  Tianfan Xue,  Dongyang Liu,  Xinqi Lin,  Peng Gao,  Dahua Lin,  Yu Qiao,  Wanli Ouyang,  Ziwei Liu
The Chinese University of Hong Kong, Shanghai Artificial Intelligence Laboratory,
S-Lab, Nanyang Technological University

VEnhancer is a generative space-time enhancement framework that improves existing text-to-video (T2V) results. This repository contains the official code and pretrained model.

VideoCrafter2 + VEnhancer

📖 For more visual results, check out our project page.


🔥 Update

  • [2024.07.28] Inference code and the pretrained video enhancement model are released.
  • [2024.07.10] This repo was created.

🎬 Overview

The architecture of VEnhancer. It follows ControlNet: the architectures and weights of the multi-frame encoder and the middle block of a pretrained video diffusion model are copied to build a trainable condition network. This video ControlNet accepts low-resolution key frames as well as full frames of noisy latents as inputs. In addition to the timestep $t$ and the prompt $c_{text}$, the noise augmentation level $\sigma$ and the downscaling factor $s$ serve as further network conditions. (Figure: overall structure.)
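To make the conditioning scheme concrete, here is a minimal PyTorch sketch. It is illustrative only: the class VideoControlNetSketch, the layer choices, and the dimensions are assumptions, not the repository's actual modules. It shows the pattern described above: low-resolution key frames concatenated with noisy latents, and $t$, $\sigma$, and $s$ embedded into a shared conditioning vector.

import math
import torch
import torch.nn as nn


def scalar_embedding(x: torch.Tensor, dim: int) -> torch.Tensor:
    # Sinusoidal embedding shared by all scalar conditions (t, sigma, s).
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, dtype=torch.float32) / (half - 1)
    )
    args = x.float().unsqueeze(-1) * freqs
    return torch.cat([args.sin(), args.cos()], dim=-1)


class VideoControlNetSketch(nn.Module):
    """Hypothetical ControlNet-style condition branch. The real model copies the
    pretrained multi-frame encoder and middle block; plain Conv3d stands in here."""

    def __init__(self, latent_ch: int = 4, dim: int = 128):
        super().__init__()
        self.dim = dim
        self.encoder = nn.Conv3d(latent_ch * 2, dim, kernel_size=3, padding=1)
        self.middle = nn.Conv3d(dim, dim, kernel_size=3, padding=1)
        self.cond_mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, noisy_latents, lr_keyframes, t, sigma, s):
        # Full frames of noisy latents and low-res key frames (assumed already
        # resized to the same spatial size) are concatenated channel-wise.
        x = torch.cat([noisy_latents, lr_keyframes], dim=1)
        # t, sigma, and s are each embedded, then summed into one condition
        # vector. (Text-prompt conditioning via cross-attention is omitted.)
        cond = self.cond_mlp(
            scalar_embedding(t, self.dim)
            + scalar_embedding(sigma, self.dim)
            + scalar_embedding(s, self.dim)
        )
        h = self.encoder(x) + cond[:, :, None, None, None]  # broadcast over (T, H, W)
        return self.middle(h)  # features injected into the frozen diffusion U-Net


# Shapes: (batch, channels, frames, height, width)
net = VideoControlNetSketch()
z = torch.randn(1, 4, 8, 32, 32)
ref = torch.randn(1, 4, 8, 32, 32)
out = net(z, ref, t=torch.tensor([500.0]), sigma=torch.tensor([0.1]), s=torch.tensor([4.0]))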

⚙️ Installation

# clone this repo
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer

# create environment
conda create -n venhancer python=3.10
conda activate venhancer
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt

Note that the ffmpeg command must be available. If you have sudo access, you can install it with:

sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
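After installing, a quick environment check can save debugging time later. This is a minimal sketch using only the standard library and torch; the expected version follows the pip command above.

import shutil
import torch

print("torch:", torch.__version__)                      # expect 2.0.1
print("CUDA available:", torch.cuda.is_available())     # GPU needed for inference
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)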

🧬 Pretrained Models

| Model Name | Description | HuggingFace | BaiduNetdisk |
| --- | --- | --- | --- |
| venhancer_paper.pth | video enhancement model (paper version) | download | download |

💫 Inference

  1. Download the CLIP model via open clip, Stable Diffusion's VAE via sd2.1, and the VEnhancer model. Then put these three checkpoints in the VEnhancer/ckpts directory (a quick check is sketched below, after step 2).
  2. Run the following command:
  bash run_VEnhancer.sh
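As referenced in step 1, here is a minimal sketch to confirm the checkpoints are in place. Only venhancer_paper.pth is a filename known from the table above; the CLIP and VAE filenames depend on the downloads, so the sketch simply lists the directory.

from pathlib import Path

ckpt_dir = Path("ckpts")
assert (ckpt_dir / "venhancer_paper.pth").exists(), "VEnhancer checkpoint missing"
# CLIP and VAE checkpoint names depend on the downloaded files;
# list the directory to confirm all three checkpoints are present.
for path in sorted(ckpt_dir.iterdir()):
    print(path.name)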

BibTeX

If you use our work in your research, please cite our publication:

@article{he2024venhancer,
  title={VEnhancer: Generative Space-Time Enhancement for Video Generation},
  author={He, Jingwen and Xue, Tianfan and Liu, Dongyang and Lin, Xinqi and Gao, Peng and Lin, Dahua and Qiao, Yu and Ouyang, Wanli and Liu, Ziwei},
  journal={arXiv preprint arXiv:2407.07667},
  year={2024}
}

🤗 Acknowledgements

Our codebase builds on modelscope. Thanks to the authors for sharing their awesome codebase!

📧 Contact

If you have any questions, please feel free to reach us at hejingwenhejingwen@outlook.com.