hundredblocks/insight-accent

insight-accent

Repeating something you say in a different voice

Summary

My project for Insight AI: a three-week exploration of applications of style transfer to speech data.

Dependencies

Libraries: tensorflow, numpy, matplotlib

Reservoir computing models can be run on a CPU with no setup or long training. The autoencoder and classifier take prohibitively long to train on a CPU, so a GPU is recommended.

Usage

There are three models, each requiring different data and setup.

Reservoir Computing

Run reservoir_transfer.py, replacing the arguments with your content file and style file. No pretraining is required.
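For readers new to reservoir computing, here is a minimal NumPy sketch of the general idea: a fixed, randomly initialized recurrent network that is never trained and is only used to turn input frames into features. It is not the code in reservoir_transfer.py. The function names, the tanh nonlinearity, the spectral radius of 0.9, and the toy random input are all assumptions made for illustration.

```python
import numpy as np

def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Build fixed random input and recurrent weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_units, n_inputs))
    w_res = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius,
    # which keeps the recurrent dynamics from blowing up.
    radius = np.max(np.abs(np.linalg.eigvals(w_res)))
    w_res *= spectral_radius / radius
    return w_in, w_res

def run_reservoir(frames, w_in, w_res):
    """Drive the reservoir with a (time, n_inputs) array.

    Returns a (time, n_units) array of reservoir states.
    """
    state = np.zeros(w_res.shape[0])
    states = np.empty((frames.shape[0], w_res.shape[0]))
    for t, frame in enumerate(frames):
        # The weights stay fixed; only the state evolves over time.
        state = np.tanh(w_in @ frame + w_res @ state)
        states[t] = state
    return states

# Toy input standing in for spectrogram frames (assumption: 50 frames, 8 bins).
frames = np.random.default_rng(1).standard_normal((50, 8))
w_in, w_res = make_reservoir(n_inputs=8, n_units=32)
states = run_reservoir(frames, w_in, w_res)
```

Because nothing is learned, a sketch like this runs in well under a second on a CPU, which matches the "no pretraining required" note above.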

Variational Autoencoder

Requires many speech samples from both the content and style classes for training. Training on a GPU is recommended.

The model performs a train/validation/test split, trains on the training data, checkpoints regularly, and outputs a few examples of inference on the test data.
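The split step could look roughly like the sketch below. The function name `split_dataset`, the 80/10/10 proportions, and the file names are hypothetical. The actual proportions used by autoencoder_transfer.py are not stated here.

```python
import random

def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    items = list(items)
    # A seeded shuffle makes the split reproducible across runs.
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# Placeholder file names, for illustration only.
files = [f"clip_{i:03d}.wav" for i in range(100)]
train, val, test = split_dataset(files)
```

Fixing the seed keeps the test set stable, so inference examples from different checkpoints can be compared on the same clips.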

Run autoencoder_transfer.py, replacing the argument with the directory containing your audio files. Currently supports two directories of .wav files (male and female in the provided example).
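As an illustration of the two-directory layout, the sketch below gathers .wav paths from a list of class directories and labels each file by the index of its directory. The helper `list_wav_files` and the directory names in the usage comment are assumptions. How autoencoder_transfer.py actually reads its input is not shown here.

```python
from pathlib import Path

def list_wav_files(class_dirs):
    """Return (path, label) pairs, where label is the index of the directory."""
    samples = []
    for label, directory in enumerate(class_dirs):
        # Sorting keeps the order stable across runs and file systems.
        for path in sorted(Path(directory).glob("*.wav")):
            samples.append((path, label))
    return samples

# Example usage with hypothetical directory names:
# samples = list_wav_files(["data/male", "data/female"])
```

Files without a lowercase .wav extension are skipped by the glob pattern, so other files in the same directories are ignored.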

Pretrained classifier

This approach has not produced satisfying results; use at your own risk.

Train a classifier (example supplied in train_classifier.py) and then use it in pretrained_transfer.py.

Data source

DAPS (Device and Produced Speech) Dataset

Alice in Wonderland Audiobook

References/Inspiration

Papers

A Neural Algorithm of Artistic Style

Texture Synthesis Using Shallow Convolutional Networks with Random Filters

Discrete Variational Autoencoders

Online resources

FF Labs Autoencoders

VAE for video style transfer

Paperspace Autoencoders

Inspiration for speech classifier

Using Reservoir Computing for Audio Style
