Jenarthanan14/Unified-Voice-Embedding-Using-Multi-task-Learning

Hive-MTL: Unified Voice Embedding through Multi-task Learning.

Speech technology has been one of the fastest-evolving and most in-demand areas of the past few decades, driven by progress in machine learning. The past decade in particular has brought major advances, including the introduction of conversational agents. In this work we describe a multi-task deep metric learning system that learns a single unified audio embedding, which can be used to power multiple audio- and speaker-specific tasks. The solution lets us train for multiple application objectives in a single deep neural network architecture. It also exploits correlated information across the combined training data of all applications to produce a unified embedding that outperforms the specialized embeddings previously deployed for each audio- or speaker-specific task.
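To make "multi-task deep metric learning" more concrete, here is a minimal sketch of the kind of objective such a system might optimize. It is not taken from this repository's code. It assumes a cosine-similarity triplet loss in the style of the Deep Speaker paper listed under References, and it assumes the per-task losses are combined as a weighted sum. The function names, the task names ("speaker", "gender"), the weights, and the margin value are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet loss on cosine similarity.

    The loss is zero once the anchor is more similar to the positive
    than to the negative by at least `margin` (an assumed value).
    """
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(0.0, s_an - s_ap + margin)


def multitask_loss(task_losses, task_weights):
    """Weighted sum of per-task losses (assumed combination rule)."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


# Hypothetical example: two tasks sharing one embedding space.
losses = {
    "speaker": triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),
    "gender": triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]),
}
total = multitask_loss(losses, {"speaker": 1.0, "gender": 0.5})
```

In a real model the embeddings would come from a shared network and the total loss would be minimized by gradient descent; plain Python lists are used here only to keep the arithmetic visible.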

Architecture Diagram

Getting started

Install dependencies

Requirements

  • tensorflow>=2.0
  • keras>=2.3.1
  • python>=3.6

pip install -r requirements.txt

If you see the error "libsndfile not found", install the library with: sudo apt-get install libsndfile-dev

Training

The code for training is available in this repository.

sudo chmod -R 777 hive-mtl/ # Give write permission to hive-mtl
pip uninstall -y tensorflow && pip install tensorflow-gpu # Swap the default TensorFlow package for the GPU package
./hive-mtl download_librispeech # Download Librispeech dataset
./hive-mtl build_mfcc
./hive-mtl build_model_inputs
./hive-mtl train_mtl
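
The four ./hive-mtl steps above must run in order, and each one depends on the output of the previous step. As a convenience, they could be chained from a small Python script that stops at the first failure. This is only a sketch: the helper names build_commands and run_pipeline are hypothetical, and it assumes the script is run from the repository root where ./hive-mtl lives.

```python
import subprocess

# Step names copied from the commands above, in the order they are listed.
STEPS = ["download_librispeech", "build_mfcc", "build_model_inputs", "train_mtl"]


def build_commands(cli="./hive-mtl", steps=STEPS):
    """Return one argument list per pipeline step."""
    return [[cli, step] for step in steps]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit code, so later steps are
    skipped when an earlier one fails.
    """
    for cmd in commands:
        print("Running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling run_pipeline(build_commands()) from the repository root would then execute the full sequence. The runner parameter exists only so the function can be exercised without launching the real commands.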

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Authors

  • Rajenthiran Jenarthanan
  • Lakshikka Sithamparanathan
  • Saranya Uthayakumar

See also the list of contributors who participated in this project.

References

  • Deep Speaker: An End-to-End Neural Speaker Embedding System by Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li.

Acknowledgments

  • Ketharan Suntharam
  • Sathiyakugan Balakirshnan
