Speech Commands Recognition using end-to-end deep learning models in pytorch

jarfo/gcommands

Speech Commands Recognition

Training deep learning models on the Google Speech Commands Dataset, implemented in PyTorch.

Features

  • Training and testing basic ConvNets and TDNNs.
  • Standard Train, Test, Valid folders for the Google Speech Commands Dataset v0.02.
  • Dataset loader for standard Kaldi speech data folders (files and pipes).
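In a Kaldi data folder, each line of wav.scp maps an utterance ID either to an audio file path or to a shell command ending in "|" whose standard output is the audio. The following is a minimal sketch of how such entries could be told apart and read. It is not the loader shipped in this repository; the names WavEntry, parse_wav_scp_line and read_wav_bytes are hypothetical.

```python
import subprocess
from typing import NamedTuple


class WavEntry(NamedTuple):
    """One wav.scp line (hypothetical helper type, not from this repository)."""
    utt_id: str
    source: str    # file path, or shell command when is_pipe is True
    is_pipe: bool


def parse_wav_scp_line(line: str) -> WavEntry:
    """Split '<utt_id> <path>' or '<utt_id> <command> |' into its parts."""
    utt_id, rest = line.strip().split(maxsplit=1)
    rest = rest.strip()
    if rest.endswith("|"):
        # Kaldi convention: a trailing '|' marks a command whose stdout is the audio.
        return WavEntry(utt_id, rest[:-1].strip(), True)
    return WavEntry(utt_id, rest, False)


def read_wav_bytes(entry: WavEntry) -> bytes:
    """Return the raw audio bytes for a file entry or a pipe entry."""
    if entry.is_pipe:
        result = subprocess.run(entry.source, shell=True, check=True,
                                stdout=subprocess.PIPE)
        return result.stdout
    with open(entry.source, "rb") as f:
        return f.read()
```

Pipe entries are what make it possible to apply effects such as speed perturbation on the fly, without writing the modified audio to disk.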

Requirements

SoX is required. To install it on macOS with Homebrew:

brew install sox

On Debian or Ubuntu Linux:

sudo apt-get install sox

Usage

Google Speech Commands Dataset (v0.02)

To download and extract the Google Speech Commands Dataset, run the following command:

./download_audio.sh
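The Speech Commands archive ships with validation_list.txt and testing_list.txt, which list the relative paths of the files belonging to the validation and test sets; every other command recording is treated as training data. The sketch below shows one way a recording could be assigned to a split from those lists. It is an assumption about how the standard folders might be built, not a description of what download_audio.sh does; load_list and assign_split are hypothetical names.

```python
from pathlib import Path


def load_list(path):
    """Read a split list file into a set of relative paths such as 'yes/abc_nohash_0.wav'."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def assign_split(rel_path, valid_files, test_files):
    """Return 'valid', 'test' or 'train' for one recording (hypothetical helper)."""
    if rel_path in valid_files:
        return "valid"
    if rel_path in test_files:
        return "test"
    return "train"
```

For example, with valid_files = load_list("validation_list.txt") and test_files = load_list("testing_list.txt"), each relative path can be routed to its split before copying or linking the file.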

Training

The command below trains with the VGG16 architecture. Run python3 run.py --help for the full list of parameters and options.

python3 run.py --arc VGG16 --checkpoint VGG16 --num_workers 10

Results (Isolated word recognition, Speech Commands v0.02, 36 words)

Accuracy on the validation and test sets, using the default parameters (VGG16) and using speed-perturbation data augmentation (VGG16 + sp):

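| Model      | Validation accuracy | Test accuracy | Parameters and options               |
|------------|---------------------|---------------|--------------------------------------|
| VGG16      | 96.3%               | 96.4%         | default                              |
| VGG16 + sp | 96.6%               | 96.7%         | --train_path data/train_training_sp  |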
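A 0.3-point gain in accuracy is easier to judge as a relative reduction in error rate. The short calculation below uses only the test accuracies reported above; relative_error_reduction is a hypothetical helper name.

```python
def relative_error_reduction(base_acc, new_acc):
    """Fraction of the baseline error removed, with accuracies given in percent."""
    base_err = 100.0 - base_acc
    new_err = 100.0 - new_acc
    return (base_err - new_err) / base_err


# Test set: error falls from 3.6% to 3.3%.
print(f"{relative_error_reduction(96.4, 96.7):.1%}")  # prints 8.3%
```

On the test set, speed perturbation therefore removes roughly 8% of the baseline errors.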

The augmented training dataset train_training_sp is a speed-perturbed version of the train_training dataset. It was obtained using the Kaldi script perturb_data_dir_speed_3way.sh.
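Kaldi's 3-way speed perturbation combines the original data with copies played at 0.9x and 1.1x speed, implemented through the SoX speed effect. The sketch below builds equivalent SoX command lines for a single file. It illustrates the idea rather than reproducing the Kaldi script, and the output naming scheme (a _sp&lt;factor&gt; suffix) is an assumption made for this example.

```python
from pathlib import Path

# Speed factors for the two perturbed copies; the original file is kept as the third.
PERTURB_FACTORS = (0.9, 1.1)


def perturbed_name(src, factor):
    """Hypothetical naming scheme: 'a.wav' becomes 'a_sp0.9.wav'."""
    p = Path(src)
    return p.with_name(f"{p.stem}_sp{factor}{p.suffix}").as_posix()


def sox_speed_command(src, dst, factor):
    """Argument list for SoX's speed effect, which changes tempo and pitch together."""
    return ["sox", src, dst, "speed", str(factor)]


def three_way_commands(src):
    """Commands that create the 0.9x and 1.1x copies of one recording."""
    return [sox_speed_command(src, perturbed_name(src, f), f)
            for f in PERTURB_FACTORS]
```

Each returned list can be passed to subprocess.run(cmd, check=True) once SoX is installed.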
