
Input video instead of image for inference. #45

Open
loic-combis opened this issue Feb 22, 2024 · 0 comments

Comments

@loic-combis

Hi there,

I'm trying to test StyleAvatar to correct the lip-sync generation from Wav2Lip.

(I currently have an issue with the preprocessing in FaceVerse: LizhenWangT/FaceVerse#36)

But regardless, we aim to do the following:

  • (1) Train StyleAvatar on the original video/face.
  • (2) Run Wav2Lip on the original video with new audio, producing a new video with poor-quality lip sync.
  • Question for this step: can we use the fine-tuned StyleAvatar model from step 1 to correct the lip movement in the video generated in step 2?

Note that we want to avoid inputting a single image and regenerating the whole speech on the avatar; rather, we just want to correct the existing video. A rough sketch of the pipeline we have in mind follows below.
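For concreteness, here is a minimal sketch of what we're trying to do. The Wav2Lip invocation follows its README; the StyleAvatar refinement step (`refine.py` and its flags) is purely a hypothetical placeholder, since whether such a correction pass is possible is exactly what this issue is asking:

```python
# Rough pipeline sketch. Wav2Lip's inference.py flags are taken from its
# README; the StyleAvatar "refine.py" step is a hypothetical placeholder.
import subprocess

ORIGINAL_VIDEO = "original.mp4"   # footage StyleAvatar was fine-tuned on (step 1)
NEW_AUDIO = "new_speech.wav"      # replacement speech track

# Step 2: re-dub the original video with Wav2Lip (flags per its README).
subprocess.run([
    "python", "inference.py",
    "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
    "--face", ORIGINAL_VIDEO,
    "--audio", NEW_AUDIO,
    "--outfile", "wav2lip_result.mp4",
], check=True)

# Step 3 (the open question): pass the Wav2Lip output through the
# fine-tuned StyleAvatar model to refine the mouth region only.
# "refine.py" and its flags do not exist in the StyleAvatar repo;
# they stand in for whatever the correct entry point would be.
subprocess.run([
    "python", "refine.py",
    "--checkpoint", "checkpoints/styleavatar_finetuned.pth",
    "--input", "wav2lip_result.mp4",
    "--output", "refined_result.mp4",
], check=True)
```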

Thanks for your help and your work!!
