[Bug] Demo Inference Produces Distorted Audio Output #3799

Open
Heshamtr opened this issue Jun 25, 2024 · 0 comments
Labels: bug (Something isn't working)

Describe the bug

I followed the demo code provided by Coqui to create a simple dataset and fine-tune a model through the Gradio interface. However, when I load the fine-tuned model and run inference, the output audio is heavily distorted and sounds like the buzz of an electric hair clipper.

You can listen to the output at the following link: Distorted Audio Output.

Steps to Reproduce:

1. Create Dataset: Followed the instructions to create a simple dataset using the demo code.
2. Fine-Tune Model: Used the Gradio interface as provided in the demo to fine-tune the model.
3. Load Model and Inference: Loaded the fine-tuned model and performed inference in the same Gradio interface.

All three steps (dataset creation, fine-tuning, and inference) were run from the Gradio demo launched with:

py TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Expected Result:

The model should produce clear, intelligible speech corresponding to the input text.

Actual Result:

The output audio is distorted and unintelligible. You can hear the output here: Distorted Audio Output.
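
A quick way to narrow down this kind of buzzing distortion is to check the generated file's header and sample values before suspecting the model itself. The following is a minimal sketch, not part of the Coqui demo: `inspect_wav` and `write_test_tone` are hypothetical helper names, the code assumes the output was saved as 16-bit PCM WAV, and the 24000 Hz default is an assumption about the expected output rate that should be checked against the model's config.

```python
import array
import math
import os
import sys
import tempfile
import wave


def inspect_wav(path):
    """Return header fields, peak level and clipping ratio of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        sample_rate = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples pinned at full scale suggest clipping or a bad float-to-int cast.
    clipped = sum(1 for s in samples if abs(s) >= 32767)
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": n_frames / sample_rate if sample_rate else 0.0,
        "peak": peak,
        "clipped_ratio": clipped / len(samples) if samples else 0.0,
    }


def write_test_tone(path, sample_rate=24000, gain=0.5, seconds=0.1, freq=440.0):
    """Write a mono sine tone; gain > 1.0 is hard-clipped to imitate distortion."""
    samples = array.array("h")
    for i in range(int(sample_rate * seconds)):
        v = gain * math.sin(2.0 * math.pi * freq * i / sample_rate)
        v = max(-1.0, min(1.0, v))
        samples.append(int(v * 32767))
    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(samples.tobytes())


if __name__ == "__main__":
    # Demo on a synthetic file; replace the path with the real output WAV.
    tmp = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_test_tone(tmp, gain=4.0)
    print(inspect_wav(tmp))
```

If the reported sample rate differs from what the player or the saving code assumes, or a large share of samples sits at full scale, the distortion may come from how the waveform is written or played back rather than from the fine-tuned weights.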

Additional Information:

- I verified that CUDA and the NVIDIA drivers are correctly installed and operational.
- The nvidia-smi command confirms that the GPU is recognized and utilized by the system.
- Other models and libraries utilizing CUDA work as expected.
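
Since nvidia-smi only shows what the driver sees, it can also help to record what the PyTorch build in the active Python environment reports. The snippet below is a small sketch for that purpose; `collect_env` is a hypothetical helper name, and it relies only on the standard `torch.__version__`, `torch.version.cuda`, and `torch.cuda` calls.

```python
import importlib.util
import platform


def collect_env():
    """Collect Python, OS and (if installed) PyTorch/CUDA details as a dict."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
    }
    # Skip the PyTorch checks when it is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting this output into the report makes it easy to compare the installed PyTorch version against the one listed in the repository's requirements file.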
Logs and Error Messages:

No explicit error messages were encountered during the execution. The process completes without any exceptions.

Request:

Could you please advise how to resolve this issue, or whether any specific configuration is required to avoid such distortion in the output?

Thank you for your assistance.

To Reproduce

py TTS/TTS/demos/xtts_ft_demo/xtts_demo.py

Expected behavior

No response

Logs

No response

Environment

- Operating System: Windows 11
- Python Version: 3.10.4
- CUDA Version: 11.5
- PyTorch Version: 1.11.0+cu115
- coqui-ai Version: latest update on GitHub

Additional context

No response

Heshamtr added the "bug" label on Jun 25, 2024