Training deep learning models on the Google Speech Commands Dataset, implemented in PyTorch.
- Training and testing basic ConvNets and TDNNs.
- Standard Train, Test, Valid folders for the Google Speech Commands Dataset v0.02.
- Dataset loader for standard Kaldi speech data folders (files and pipes).
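A Kaldi data folder lists its audio in a `wav.scp` file, one `<utt-id> <rxfilename>` entry per line, where the second field is either a file path or a shell command ending in `|` whose stdout is the audio. The repository's own loader is not reproduced here. What follows is a minimal sketch of that convention, assuming uncompressed WAV audio readable by Python's `wave` module; `read_wav_scp` and `load_audio` are hypothetical names.

```python
import io
import subprocess
import wave


def read_wav_scp(path):
    """Parse a Kaldi-style wav.scp into {utt_id: rxfilename}."""
    entries = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # First token is the utterance id; the rest is a path or a piped command.
            utt_id, rxfile = line.split(maxsplit=1)
            entries[utt_id] = rxfile
    return entries


def load_audio(rxfile):
    """Return (sample_rate, raw PCM bytes) for a file path or a command ending in '|'."""
    if rxfile.endswith("|"):
        # Pipe entry: run the command through the shell and read WAV data from stdout.
        data = subprocess.run(rxfile[:-1], shell=True, check=True,
                              stdout=subprocess.PIPE).stdout
        source = io.BytesIO(data)
    else:
        # Plain entry: the field is a path to a WAV file.
        source = rxfile
    with wave.open(source, "rb") as w:
        return w.getframerate(), w.readframes(w.getnframes())
```

Treating both entry kinds through one `load_audio` call keeps the dataset class unaware of whether audio comes from disk or from a preprocessing pipe such as a SoX command.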
To install SoX on macOS with Homebrew:

```
brew install sox
```

On Linux (Debian/Ubuntu):

```
sudo apt-get install sox
```
To download and extract the Google Speech Commands Dataset, run the following command:

```
./download_audio.sh
```
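The Speech Commands v0.02 archive ships `validation_list.txt` and `testing_list.txt`, which list relative paths such as `right/<speaker>_nohash_0.wav`; every other command clip belongs to the training set. The standard Train, Valid and Test folders presumably follow this rule, but how `download_audio.sh` builds them is not shown here. A minimal sketch of the rule, assuming the extracted archive sits in `data_dir`; `assign_splits` is a hypothetical helper.

```python
import os


def assign_splits(data_dir):
    """Map each relative .wav path ('label/file.wav') to 'train', 'valid' or 'test'."""
    def read_list(name):
        with open(os.path.join(data_dir, name)) as f:
            return {line.strip() for line in f if line.strip()}

    valid = read_list("validation_list.txt")
    test = read_list("testing_list.txt")
    splits = {}
    for label in sorted(os.listdir(data_dir)):
        label_dir = os.path.join(data_dir, label)
        # Skip plain files and the folder of long background-noise recordings.
        if not os.path.isdir(label_dir) or label == "_background_noise_":
            continue
        for fname in sorted(os.listdir(label_dir)):
            if not fname.endswith(".wav"):
                continue
            rel = f"{label}/{fname}"
            if rel in valid:
                splits[rel] = "valid"
            elif rel in test:
                splits[rel] = "test"
            else:
                splits[rel] = "train"
    return splits
```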
Use `python3 run.py --help` to list all parameters and options. Example run with the VGG16 architecture:

```
python3 run.py --arc VGG16 --checkpoint VGG16 --num_workers 10
```
Accuracy on the validation and test sets with the default parameters (VGG16) and with speed-perturbation data augmentation (VGG16 + sp):
| Model | Valid acc. | Test acc. | Parameters and options |
|---|---|---|---|
| VGG16 | 96.3% | 96.4% | default |
| VGG16 + sp | 96.6% | 96.7% | `--train_path data/train_training_sp` |
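The accuracy gains look small in absolute terms, so it can help to express them as error rates. This sketch takes the accuracies from the results table and computes the fraction of the baseline error removed by augmentation; `relative_error_reduction` is a hypothetical helper, not part of the repository.

```python
# Accuracies (%) copied from the results table.
results = {
    "VGG16": {"valid": 96.3, "test": 96.4},
    "VGG16 + sp": {"valid": 96.6, "test": 96.7},
}


def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline error rate (100 - accuracy) removed by the new model."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


for split in ("valid", "test"):
    r = relative_error_reduction(results["VGG16"][split], results["VGG16 + sp"][split])
    print(f"{split}: {100 * r:.1f}% relative error reduction")
```

It prints 8.1% for the validation set (error 3.7% to 3.4%) and 8.3% for the test set (error 3.6% to 3.3%).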
The augmented training set `train_training_sp` is a speed-perturbed version of the `train_training` dataset, obtained with the Kaldi script `perturb_data_dir_speed_3way.sh`.
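Kaldi's 3-way speed perturbation keeps the original data and adds copies played at 0.9x and 1.1x speed, using SoX's `speed` effect, which changes tempo and pitch together, much like resampling. The sketch below illustrates that idea in NumPy; it is not the Kaldi script, and plain linear interpolation stands in for SoX's higher-quality resampler. `speed_perturb` and `three_way` are hypothetical names.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Resample `signal` so it plays `factor` times faster (shorter and higher-pitched if factor > 1).

    Linear interpolation is an approximation of SoX's resampler.
    """
    n = len(signal)
    # Read the original waveform at positions 0, factor, 2*factor, ...
    positions = np.arange(0, n, factor)
    return np.interp(positions, np.arange(n), signal).astype(signal.dtype)


def three_way(signal, factors=(0.9, 1.0, 1.1)):
    """Return one copy of `signal` per speed factor, as in a 3-way perturbed training set."""
    return {f: speed_perturb(signal, f) for f in factors}
```

A one-second clip at 16 kHz becomes about 17,778 samples at factor 0.9 and about 14,546 at factor 1.1, so each utterance yields three training examples of different durations.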