Text-Normalization-Challenge---English-Language

This is a Kaggle competition challenge to automate the development of text normalization grammars via machine learning, focusing on the English language. It is funded by Google’s Text Normalization Research Group, which conducts research and develops tools for the identification, normalization, and denormalization of non-standard terms such as abbreviations, numbers, currency expressions, measure phrases, addresses, and dates, which represent unique entities that are semantically limited. One of the biggest challenges when developing a TTS or ASR system for a new language is developing and testing the grammar for all of these rules. The competition poses this problem to the community, providing a large corpus of written text aligned to its normalized spoken form.

We applied three different approaches to predict the normalized text: XGBoost, Sequence-to-Sequence, and Long Short-Term Memory (LSTM). Our models take a very long time to train and evaluate. We present the model experiments and their results under various parameter settings. XGBoost achieved an excellent accuracy score of 99.52%.
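To make the task and the accuracy score concrete, here is a minimal sketch of token-level accuracy on written tokens aligned to their spoken forms. The example tokens, the function names `token_accuracy` and `identity_baseline`, and the identity baseline itself are assumptions made for illustration; they are not taken from this repository or from the competition's scoring code.

```python
from typing import List


def token_accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of tokens whose predicted spoken form matches exactly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def identity_baseline(tokens: List[str]) -> List[str]:
    """Hypothetical baseline: leave every written token unchanged."""
    return list(tokens)


# Hypothetical aligned example: written tokens and their spoken forms.
written = ["the", "3", "km"]
spoken = ["the", "three", "kilometers"]

score = token_accuracy(identity_baseline(written), spoken)
print(f"identity baseline accuracy: {score:.2%}")
```

A baseline like this gets only the tokens that need no change right, which is why a model has to learn the rewrites for numbers, measures, and abbreviations to reach a high score.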

Folders and files:
Neural_Network
References
XGBoosting
data_analysis
input
README.md

TahaniFennir/Text-Normalization-Challenge---English-Language
