ASR for Low-Resource Languages

Overview

This project performs automatic speech recognition (ASR) for low-resource languages. To do so, we fine-tune wav2vec2-xls-r (Babu et al., 2021) on labeled speech data from several languages in the Mozilla Common Voice dataset (Ardila et al., 2020).
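
As a minimal sketch of the data-loading step (assuming the Hugging Face datasets library and one of the Common Voice releases on the Hub; the notebooks may pin a different release, and Common Voice on the Hub requires an access token):

```python
from datasets import load_dataset, Audio

# Load the Italian training split of Common Voice from the Hugging Face Hub.
# The dataset version is an assumption; the notebooks may use another release.
common_voice_it = load_dataset(
    "mozilla-foundation/common_voice_11_0", "it", split="train"
)

# wav2vec2-xls-r was pre-trained on 16 kHz audio, so resample the audio column.
common_voice_it = common_voice_it.cast_column("audio", Audio(sampling_rate=16_000))

print(common_voice_it[0]["sentence"])  # the transcript paired with the audio clip
```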

The scripts in this repo allow you to:

  • fine-tune wav2vec2-xls-r on labeled speech data from one language
  • fine-tune wav2vec2-xls-r jointly on two different languages (see the data sketch after this list)
  • test the fine-tuned model
  • test the fine-tuned model with an n-gram language model
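
For the joint bilingual setup, a plausible data-side sketch is below; the exact recipe lives in the repo's notebook, and interleave_datasets is only one assumed way of mixing the two languages:

```python
from datasets import load_dataset, interleave_datasets, Audio

# Load two Common Voice languages (the dataset version is an assumption).
it = load_dataset("mozilla-foundation/common_voice_11_0", "it", split="train")
gl = load_dataset("mozilla-foundation/common_voice_11_0", "gl", split="train")

# Keep only the columns both languages share, so their schemas match.
keep = ["audio", "sentence"]
it = it.remove_columns([c for c in it.column_names if c not in keep])
gl = gl.remove_columns([c for c in gl.column_names if c not in keep])

# Alternate examples from the two languages into one training stream,
# then resample to the 16 kHz input rate expected by wav2vec2-xls-r.
bilingual = interleave_datasets([it, gl])
bilingual = bilingual.cast_column("audio", Audio(sampling_rate=16_000))
```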

For our purpose we use characters as speech units. The tokenizers contain the vocabularies for the following languages:

  • Italian (it)
  • Arabic (ar)
  • Galician (gl)
  • Romansh Vallader (rm-vallader)

To fine-tune wav2vec2-xls-r, we use the language's tokenizer to define the model's output (decoder) layer during training.
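
In transformers terms, this amounts to building a character-level CTC tokenizer from the language's vocabulary and attaching a freshly initialized CTC output layer of matching size to the pre-trained checkpoint. A minimal sketch follows; the "vocab.json" path and the 300M-parameter checkpoint are assumptions, since the repo ships its own tokenizers and may use another checkpoint size:

```python
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Character-level tokenizer built from the target language's vocabulary file
# ("vocab.json" is a placeholder for one of the repo's tokenizers).
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)

# Feature extractor for raw 16 kHz waveforms.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(
    feature_extractor=feature_extractor, tokenizer=tokenizer
)

# Pre-trained xls-r encoder with a new CTC output layer sized to the
# language's character vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",  # checkpoint size is an assumption
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
```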

Fine-tuning wav2vec2-xls-r

  • To fine-tune the pre-trained wav2vec2-xls-r model on a target language, see the notebook "Notebook_fine_tuning_wav2vec2_xls_r"
  • To create a bilingual model by fine-tuning the pre-trained model, refer to the notebook "Notebook_bilingual_fine_tuning"
  • To compute inferences with the fine-tuned model, use the notebook "Notebook_inference.ipynb"
  • The second part of that notebook explains how to compute inferences using a language model (see the sketch after this list)
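
As a hedged sketch of plain greedy (argmax) CTC inference with a fine-tuned checkpoint; "path/to/finetuned-model" is a placeholder, and the dummy waveform stands in for real 16 kHz audio:

```python
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Load the fine-tuned checkpoint and its processor (placeholder path).
processor = Wav2Vec2Processor.from_pretrained("path/to/finetuned-model")
model = Wav2Vec2ForCTC.from_pretrained("path/to/finetuned-model").eval()

# Stand-in input: one second of silence; replace with real 16 kHz samples.
waveform = np.zeros(16_000, dtype=np.float32)
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely character per frame; batch_decode
# collapses repeated characters and strips the CTC blank/pad token.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```

For the language-model variant, the same logits can instead be decoded with a beam search over an n-gram LM; in the transformers ecosystem this is typically done with Wav2Vec2ProcessorWithLM, which wraps pyctcdecode, though the notebook documents the repo's own procedure.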
