This project is an entry in a Kaggle competition whose goal is to automate the development of text normalization grammars with machine learning, focusing on the English language. The competition is funded by Google's Text Normalization Research Group, which conducts research and develops tools for the identification, normalization and denormalization of non-standard terms such as abbreviations, numbers, currency expressions, measurement phrases, addresses and dates, each of which represents a distinct, semantically constrained class of entity. One of the biggest challenges in developing a text-to-speech (TTS) or automatic speech recognition (ASR) system for a new language is writing and testing the grammar for all of these rules. The competition therefore gives the community a large corpus of written text aligned to its normalized spoken form and asks for models that learn the mapping. We applied three different approaches to predict the normalized text: XGBoost, Sequence-to-Sequence, and Long Short-Term Memory (LSTM). Our models take a very long time to train and evaluate. We present experiments and results for each model under various parameter settings. Our best result came from XGBoost, which achieved 99.52% accuracy.
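To make the task concrete, the sketch below shows a simple majority-vote lookup baseline over (written, spoken) token pairs, together with the token-level accuracy used to score predictions. It is not one of the three models in this repository. The toy pairs and the helper names `fit_lookup`, `normalize` and `accuracy` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def fit_lookup(pairs):
    """Map each written token to its most frequent spoken form."""
    counts = defaultdict(Counter)
    for before, after in pairs:
        counts[before][after] += 1
    return {before: c.most_common(1)[0][0] for before, c in counts.items()}


def normalize(tokens, lookup):
    """Normalize tokens; tokens never seen in training pass through unchanged."""
    return [lookup.get(tok, tok) for tok in tokens]


def accuracy(predicted, expected):
    """Fraction of tokens whose predicted spoken form matches exactly."""
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Invented toy data (assumption): written token paired with its spoken form.
train_pairs = [
    ("Dr.", "doctor"),
    ("Dr.", "doctor"),
    ("Dr.", "drive"),  # ambiguous token: the majority form wins
    ("3", "three"),
    ("km", "kilometers"),
    ("walked", "walked"),
]

lookup = fit_lookup(train_pairs)
tokens = ["Dr.", "Smith", "walked", "3", "km"]
expected = ["doctor", "Smith", "walked", "three", "kilometers"]
predicted = normalize(tokens, lookup)
score = accuracy(predicted, expected)
```

A lookup like this cannot resolve context-dependent tokens (here `Dr.` always becomes `doctor`), which is one reason to use learned models such as the ones named above.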
TahaniFennir/Text-Normalization-Challenge---English-Language