Skip to content

Latest commit

 

History

History
104 lines (101 loc) · 4.74 KB

TODO.org

File metadata and controls

104 lines (101 loc) · 4.74 KB

documentation

  • [ ] have more detailed command description on ‘abkhazia <command> –help’. Assume the user doesn’t know abkhazia or kaldi.
  • [-] improve the online documentation
    • [X] install and configure
    • [ ] preparators for new corpora
    • [ ] ASR: LM, AM, features, alignment, etc
    • [ ] ABX: item files generation from alignment
  • [ ] add READMEs in intermediate directories of the project to explain the sources tree (or have a ‘project organization’ page)

corpus [1/4]

  • [X] split with (p_train + p_test) < 1 (by now only =1 supported)
  • [ ] des méthodes de stats sur les corpus (duration, per spks, etc)
  • [ ] need of a structure to append data to an existing corpus (eg attach a LM, features or manual alignments)
  • [ ] From the Thomas’s notes we want… Optionally, the corpus directory may contains:
    • Time-alignments (The previous files in the list are sufficient for training ASR models with kaldi, however for ABX experiments we also need time-alignments for each phone or each word. If they are not provided directly, a surrogate will be obtained through forced-alignment using the kaldi-trained ASR models)
    • A language model, either in OpenFST text or binary format or in ARPA-MIT N-gram format
    • Syllabification of each utterance

preparators [2/5]

  • [X] GlobalPhone & CSJ : Laisser la possibilité de choisir de garder ou non le ‘+’ (pour l’instant on l’enlève)
  • [X] Chercher les transcriptions manquantes pour GlobalPhone (pour l’instant hard codé dans préparateur)
  • [ ] Faire un dossier GlobalPhone et un dossier CSJ pour les différents cas possible pour chaque corpus
  • [ ] Choisir CLAIREMENT la façon de traiter les parenthèses dans CSJ (pour l’instant :
    • (W ..1;..2) : on garde ..1
    • (?..) : on remplace le SUW par “?”
    • (R xxx) : on remplace par “?”
    • reste : on enlève caractère + parenthèse et on garde tout
  • [ ] On enlève les IPU lorsqu’il y a un caractère problèmatique (e.g. “(?” ou “(W” ou autre) dans un des SUW ?

acoustic [0/2]

  • [ ] –retrain option it should be possible to retrain a trained model on a new corpus (for instance, specifically retrain silence models, or retrain on a bunch of new corpus)
  • [ ] questions vs data-driven option (is it for tree building ?)

prepare_lang branch [1/2]

AM does not depend on LM [3/3]

  • [X] write a prepare_lang.sh standalone wrapper
  • [X] independant call to prepare_lang in AM classes
  • [X] update the command line API in consequence

fix decode and align [3/7]

  • [X] pass all the tests
  • [X] refactor core -> am now exports the lang folder
  • [X] optionally disable the scoring in decode forward –skip-scoring from kaldi to abkhazia
  • [ ] check for phonesets compatibility between AM/LM
  • [ ] add tests for cross use of AMs/LMs
  • [ ] add tests for wpd decoding
  • [ ] update the command line API in consequence

notes

required files by mkgraph.sh are

  • AM $lang/L.fst
  • LM $lang/G.fst
  • AM/LM $lang/phones.txt
  • AM? $lang/words.txt
  • AM/LM? $lang/phones/silence.csl
  • AM/LM? $lang/phones/disambig.int
  • AM $model/final.mdl
  • AM $model/tree

Minor changes [0/8]

upgrade the bootphon/kaldi repo

add cuda support in the install_kaldi.sh script

in export use symlinks to save some place

(file in output-dir, link in recipe-dir)

updating abkhazia.cfg

  • Need of an automated way to update new versions of the installed configuration file in the ./configure script.

adjust log level by detecting WARNING and ERROR from Kaldi messages

Those messages are actually logged in debug level, should be smater to be warnong/error 2016-10-12 16:33:58,372 - DEBUG - ERROR (apply-cmvn:Write():kaldi-matrix.cc:1229) Failed to write matrix to stream 2016-10-12 16:33:58,373 - DEBUG - WARNING (apply-cmvn:Write():util/kaldi-holder-inl.h:54) Exception caught writing Table object: ERROR (apply-cmvn:Write():kaldi-matrix.cc:1229) Failed to write matrix to stream

Have completion setup during installation (or configuration?)

Have a test_commands module for testing command line interface

New specifications (0.4)

corpus = BuckeyeCorpusPreparator('./buckeye').prepare()
corpus.speakers()
utt = corpus.utterances()

train, _ = corpus.split(train_prop=0.5, by_speakers=True)
train.save2h5('train.h5', group='corpus', wavs=True)
corpus = Corpus.read('train.h5', group='corpus')

lm = LanguageModelProcessor(order=3, level='word').compute(corpus)
lm.save('lm.fst')
lm.save2h5('train.h5', group='word-trigram')
assert lm.order == 3
assert lm.level == 'word'

features = FeaturesProcessor('mfcc', delta=2, pitch=True).compute(corpus)
f = features[utt[0]]  # np.array
features.write2h5('train.h5', 'features')
features.write2ark('/somewhere')