Repeating something you say in a different voice
My project for Insight AI: a three-week project exploring applications of style transfer to speech data.
Libraries: tensorflow, numpy, matplotlib
The reservoir sampling models can run on a CPU with no setup and no long training. The encoders and the classifier take prohibitively long to train on a CPU, so a GPU is recommended.
There are three models, each requiring different data and setup.
Run reservoir_transfer.py, replacing the argument with your content and style files. No pretraining is required.
This model requires many speech samples from both the content and style classes for training. Training on a GPU is recommended.
The model does a train/validation/test split, trains on the training data, checkpoints regularly, and outputs a few examples of inference on the test data.
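The split step can be sketched as follows. This is a minimal illustration, not the code used in this repository: the function name `split_indices`, the 80/10/10 fractions, and the fixed seed are all assumptions made for the example.

```python
import random


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/val/test lists.

    The fractions and seed are illustrative assumptions, not values
    taken from this project.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # deterministic shuffle for a given seed

    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))

    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]  # everything left over is training data
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(100)
```

Splitting indices rather than the audio arrays themselves keeps the three sets disjoint and makes the split reproducible across runs with the same seed.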
Run autoencoder_transfer.py, replacing the argument with the directory containing your audio files. It currently supports two directories of .wav files (male and female in the provided example).
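As a rough sketch of how two class directories of .wav files might be gathered into labelled examples: the helper `list_labelled_wavs` is hypothetical, and the layout (one subdirectory per class under a common root) is an assumption, not a description of what autoencoder_transfer.py does internally.

```python
from pathlib import Path


def list_labelled_wavs(root, class_names=("male", "female")):
    """Return (path, label) pairs for every .wav file in each class directory.

    Assumes a layout like root/male/*.wav and root/female/*.wav; the label
    is the position of the class name in `class_names`.
    """
    root = Path(root)
    pairs = []
    for label, name in enumerate(class_names):
        # sorted() keeps the ordering stable between runs
        for path in sorted((root / name).glob("*.wav")):
            pairs.append((path, label))
    return pairs
```

Calling `list_labelled_wavs("data")` would then yield paths under `data/male` labelled 0 and paths under `data/female` labelled 1, ignoring any non-.wav files.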
This approach has not produced satisfying results; use at your own risk.
Train a classifier (example supplied in train_classifier.py) and then use it in pretrained_transfer.py.
DAPS (Device and Produced Speech) Dataset
Alice in Wonderland Audiobook
A Neural Algorithm of Artistic Style
Texture Synthesis Using Shallow Convolutional Networks with Random Filters
Discrete Variational Autoencoders
Inspiration for speech classifier