Adding User Feedback To Enhance CB-Whisper

This repository contains the code needed to reproduce all experiments from the paper "Adding User Feedback To Enhance CB-Whisper".

Setup

Create a conda environment and activate it

  conda create -n biasing-whisper python=3.10
  conda activate biasing-whisper

Install ffmpeg and the necessary requirements

  conda install 'ffmpeg<5'
  pip install -r requirements.txt

Build Datasets

The following bash scripts create the folder structures expected by the scripts in this repository and perform all the pre-processing needed to train and evaluate the KWS classifier.

Aishell-KWS and Aishell hotwords subsets

To build the Aishell-KWS dataset, download the data_aishell.tgz file from here and place it in the directory where the dataset will be built.

In the project directory, do as follows

cd datasets/aishell/

Activate the conda environment

conda activate biasing-whisper

And run the bash script

bash build.sh

You will be asked to provide the path to the tgz file. The build takes several hours, so open a tmux session and let it run uninterrupted.
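
For example, the build can be launched inside a tmux session and left running (the session name here is arbitrary):

tmux new -s aishell-build
conda activate biasing-whisper
bash build.sh
# detach with Ctrl-b d; reattach later with: tmux attach -t aishell-build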

MLS-KWS

To build the MLS-KWS dataset, download the zip files from here for the English, German, French, Spanish, Portuguese and Polish languages. Everything else is equivalent to what was done with the Aishell-KWS dataset.
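
Assuming the MLS build scripts live in a datasets/mls/ folder analogous to the Aishell one (the folder name is an assumption; check the datasets/ directory), the steps would be:

cd datasets/mls/
conda activate biasing-whisper
bash build.sh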

ACL6060

To build the ACL6060 dataset, download the zip file from here. Everything else is equivalent to what was done with the Aishell-KWS dataset.
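
Likewise, assuming an analogous datasets/acl6060/ folder (again an assumption; check the datasets/ directory):

cd datasets/acl6060/
conda activate biasing-whisper
bash build.sh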

The KWS Classifier

The CNN classifier for KWS is inspired by the one originally proposed in CB-Whisper. This repository contains additional features that make it possible to reproduce the KWS experiments from the aforementioned paper:

  • Training using features derived from either TTS-generated or natural-speech audio for the keywords, or a mixture of both;
  • DANN and DANNCE implementation;
  • Validation of checkpoints using more than one dataset (for domain generalization analysis).

Training

In the project directory, do as follows

cd src/

And run the following command

python3 run_CLI.py fit --config configs/train.yaml

In the train.yaml file, you can set the hyperparameters, the paths to the dataset folders, the datasets on which to validate the model every epoch, the logger, and other training details. Settings that must be filled in are capitalized and enclosed in square brackets.
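
Before launching training, a quick sanity check (not part of the repository's tooling) is to confirm that no bracketed placeholder was left unfilled:

grep -n '\[.*\]' configs/train.yaml
# any remaining capitalized entries in square brackets still need to be set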

Evaluation

The following can be used to evaluate the precision, recall and F1 scores of the KWS classifier on the test sets of the different datasets.

In the project directory, do as follows

cd src/

And run the following command

python3 kws.py test --config configs/kws-***.yaml

For ease of use, there is one config YAML file per dataset. Do not forget to set the paths to the dataset folders and the checkpoint to evaluate. Settings that must be filled in are capitalized and enclosed in square brackets.
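
For example, evaluating on the Aishell-KWS test set might look like the following (the exact config filename is an assumption; check the configs/ folder for the available kws-*.yaml files):

python3 kws.py test --config configs/kws-aishell.yaml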

Evaluate CB-Whisper with PBAWhisper

The following can be used to evaluate the entity recall of the CB-Whisper model on the test sets of the different datasets, using the KWS classifiers developed with these scripts. These results were not reported in the paper "Adding User Feedback To Enhance CB-Whisper". This version of CB-Whisper uses a wrapped version of Hugging Face's WhisperForConditionalGeneration, known as PBAWhisper, which performs long-form transcription jointly with keyword spotting on the fly.

In the project directory, do as follows

cd src/

And run the following command

python3 cb-whisper.py test --config configs/cb-whisper-***.yaml

For ease of use, there is one config YAML file per dataset. Do not forget to set the paths to the dataset folders and the checkpoint to evaluate. Settings that must be filled in are capitalized and enclosed in square brackets.
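
As an illustration, an evaluation on the MLS-KWS test set might be invoked as follows (again, the config filename is an assumption; check the configs/ folder):

python3 cb-whisper.py test --config configs/cb-whisper-mls.yaml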

License

See the LICENSE.md file for details.

Citation

If you use any of the resources in this repository, please cite the following paper:

Citation will be added in the future.