[Bug] Hoarseness in Higher-Pitched Female Voices with xtts-v2 after finetune #3774
Comments
I'm seeing the same thing after fine-tuning on 900 hours of Chinese data; at around 40,000 steps the model is prone to this. What is your data? Which languages? How many steps?
Has anyone found a solution to this particular problem, or at least a workaround?
Can you provide the code for fine-tuning XTTS-v2, please?
Describe the bug
When generating higher-pitched female voices after fine-tuning the xtts-v2 model, there is a noticeable hoarseness, resembling the strain one might experience when trying to reach high musical notes.
abnormal example:
https://mork.ro/NQjFi
normal example:
https://mork.ro/3iZ8Q#
Two voices generated from the same model, using different audio prompts.
To Reproduce
Run inference with the fine-tuned model, using a higher-pitched female voice as the audio prompt.
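The report does not include the inference code. The following is a minimal sketch of what the reproduction step might look like, assuming the Coqui TTS Python package and a fine-tuned XTTS-v2 checkpoint. The function name `synthesize_with_prompt` is hypothetical, and every path, the text, and the language code are placeholders supplied by the caller, not values from the report.

```python
def synthesize_with_prompt(config_path, checkpoint_path, vocab_path,
                           prompt_wav, text, language, out_path):
    """Generate speech from a fine-tuned XTTS-v2 checkpoint using one audio prompt.

    All arguments are caller-supplied placeholders (assumptions, not from the report).
    """
    # Heavy dependencies are imported lazily so this file loads without them.
    import torch
    import torchaudio
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    # Load the fine-tuned model from its config, checkpoint, and tokenizer vocab.
    config = XttsConfig()
    config.load_json(config_path)
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_path=checkpoint_path,
                          vocab_path=vocab_path, use_deepspeed=False)
    if torch.cuda.is_available():
        model.cuda()

    # Condition on the reference audio; the issue varies only this prompt.
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
        audio_path=[prompt_wav]
    )
    out = model.inference(text, language, gpt_cond_latent, speaker_embedding)

    # XTTS-v2 produces 24 kHz audio.
    torchaudio.save(out_path, torch.tensor(out["wav"]).unsqueeze(0), 24000)
    return out_path
```

Calling this twice with the same checkpoint and text, once with a higher-pitched prompt and once with a lower-pitched one, would give a pair of outputs comparable to the abnormal and normal examples above.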
Expected behavior
No response
Logs
No response
Environment
Additional context
No response