TacoBERTron

TP-GST-BERT-Tacotron2

TP-GST-BERT Tacotron2 is a voice synthesis model based on Tacotron2 GST that can predict the style embedding from text alone, using a BERT sentence embedding. It is an implementation of the model proposed by the SberDevices team, extended by me with a faster TP-GST module. The model was trained on a Russian-language dataset.

The model contains:

  • Tacotron2 encoder + decoder
  • Global Style Tokens (GST) module
  • 3 text-predicting style embedding models
  • BERT model
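
At inference time the text-predicted style embedding replaces the reference-audio path of GST, so no audio prompt is needed. A minimal PyTorch sketch of that idea (all names and dimensions below are illustrative, not this repository's actual identifiers):

```python
import torch
import torch.nn as nn

class TextPredictedGST(nn.Module):
    """Toy TP-GST head: predicts GST combination weights from a
    BERT sentence embedding (dimensions are placeholders)."""

    def __init__(self, bert_dim=768, num_tokens=10, token_dim=256):
        super().__init__()
        # Learned global style tokens, as in GST-Tacotron
        self.style_tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        # Map the sentence embedding to one weight per style token
        self.to_weights = nn.Linear(bert_dim, num_tokens)

    def forward(self, sentence_embedding):
        # sentence_embedding: (batch, bert_dim)
        weights = torch.softmax(self.to_weights(sentence_embedding), dim=-1)
        # Style embedding = weighted sum of the tokens: (batch, token_dim)
        return weights @ self.style_tokens
```

The resulting style embedding is then combined with the Tacotron2 encoder outputs, just as a reference-derived GST embedding would be.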

Pre-requisites

  1. NVIDIA GPU + CUDA + cuDNN

Setup

  1. Clone this repo: git clone https://github.com/lightbooster/TP-GST-BERT-Tacotron2.git
  2. Enter the repo: cd TP-GST-BERT-Tacotron2
  3. Initialize the submodule: git submodule init; git git submodule update
  4. Install PyTorch
  5. Install Apex
  6. Install the Python requirements (pip install -r requirements.txt) or build the Docker image
      NOTE: a detailed setup walkthrough is provided in the notebook demo.ipynb; a quick environment check is sketched below
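
After installation, a quick sanity check that PyTorch sees the GPU and that Apex imports (a minimal sketch, not part of the repository):

```python
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    # Apex provides the mixed-precision utilities used by fp16_run=True
    from apex import amp  # noqa: F401
    print("Apex: OK")
except ImportError:
    print("Apex: missing (needed for fp16_run=True)")
```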

Prepare BERT

  1. Download a BERT checkpoint (I used RuBERT from deeppavlov.ai)
  2. Move the BERT checkpoint, config, and vocabulary into the /bert folder, or set the corresponding paths in hparams.py
  3. Adjust the BERT hyperparameters in hparams.py if needed
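
For orientation, one common way to turn a BERT checkpoint into a sentence embedding is mean pooling of the last hidden states with the transformers library. This is only an illustration, assuming a transformers-compatible checkpoint; the actual wiring is defined by hparams.py in this repository:

```python
import torch
from transformers import BertModel, BertTokenizer

# "bert/" is a placeholder for the folder holding checkpoint, config, vocab
tokenizer = BertTokenizer.from_pretrained("bert/")
bert = BertModel.from_pretrained("bert/").eval()

inputs = tokenizer("Пример предложения.", return_tensors="pt")  # Russian input
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
sentence_embedding = hidden.mean(dim=1)         # (1, 768), mean-pooled
```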

Training

  1. Update the filelists inside the filelists folder to point to your data (the expected line format is shown after this list)
  2. python train.py --output_directory=outdir --log_directory=logdir
  3. (OPTIONAL) tensorboard --logdir=outdir/logdir
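
This repository is laid out like NVIDIA's Tacotron2, where each filelist line is typically an audio path and its transcript separated by a pipe; the paths and text below are placeholders:

```
/data/ruslan/wavs/0001.wav|Текст первого примера.
/data/ruslan/wavs/0002.wav|Текст второго примера.
```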

Training using a pre-trained model

Training from a pre-trained model can lead to faster convergence.
By default, the speaker embedding layer is ignored.

  1. Download my pre-trained model checkpoint, trained on a Russian-language dataset. NOTE: the checkpoint does not contain the BERT model weights; use a separate checkpoint for them.
  2. python train.py --output_directory=outdir --log_directory=logdir -c {PATH_TO_CHECKPOINT} --warm_start
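
For reference, in NVIDIA's Tacotron2 (which this flag follows) --warm_start loads the checkpoint while skipping the layers named in ignore_layers in hparams.py; a simplified sketch of that logic:

```python
import torch

def warm_start_model(checkpoint_path, model, ignore_layers):
    """Load pretrained weights; ignored layers keep their fresh init."""
    state = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
    if ignore_layers:
        state = {k: v for k, v in state.items() if k not in ignore_layers}
        full = model.state_dict()
        full.update(state)   # merge: pretrained weights where available
        state = full
    model.load_state_dict(state)
    return model
```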

Multi-GPU (distributed) and Automatic Mixed Precision Training

  1. python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True

Training and Inference demo

M-AILABS data preprocessing, training configuration, and inference demos are provided in the notebook demo.ipynb
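
As a taste of what the notebook covers, here is a condensed version of the usual Tacotron2 + WaveGlow inference flow. Module names follow NVIDIA's Tacotron2 layout and the checkpoint paths are placeholders; the exact calls (including how the BERT embedding is fed in) are in demo.ipynb:

```python
import torch
from hparams import create_hparams   # assumed layout, as in NVIDIA's repo
from model import Tacotron2
from text import text_to_sequence

hparams = create_hparams()
model = Tacotron2(hparams).cuda().eval()
model.load_state_dict(torch.load("tacotron2_ckpt.pt")["state_dict"])

waveglow = torch.load("waveglow.pt")["model"].cuda().eval()

# TP-GST predicts the style embedding from this same text,
# so no reference audio is needed at inference time.
seq = text_to_sequence("Привет, мир!", ["basic_cleaners"])
seq = torch.LongTensor(seq).cuda().unsqueeze(0)
with torch.no_grad():
    _, mel_postnet, _, _ = model.inference(seq)
    audio = waveglow.infer(mel_postnet)
```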

Related repos

WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis.
