Text_to_Speech

A combination of Deep Learning and Google Translate to convert handwritten text to audio output.

This project takes printed document text and handwritten text as input and produces translated audio output in any of 108 supported languages. The backbone of the project is the handwritten text detection model, which is trained using transfer learning on ResNet50.
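A transfer-learning setup of this kind can be sketched in Keras roughly as follows. This is only an illustration under stated assumptions, not the repository's actual code: the function name build_classifier, the single dense softmax head, and the num_classes parameter are hypothetical, since the README does not state the head layout or the number of character classes. ResNet50 with ImageNet weights expects three-channel input, so grayscale images are assumed to be expanded to three channels beforehand.

```python
import tensorflow as tf


def build_classifier(num_classes, weights="imagenet", input_shape=(32, 32, 3)):
    """Attach a new softmax head to a ResNet50 backbone (hypothetical layout)."""
    backbone = tf.keras.applications.ResNet50(
        include_top=False,        # drop the original 1000-class ImageNet head
        weights=weights,          # pretrained weights are the "transfer" part
        input_shape=input_shape,  # 32x32 is the smallest size ResNet50 accepts
        pooling="avg",            # global average pooling: one vector per image
    )
    # New classification head; num_classes is an assumption, not from the README.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(backbone.output)
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)
```

Passing weights=None builds the same architecture without downloading pretrained weights, which is handy for a quick shape check.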

The training data combines a dataset available on Kaggle with the MNIST dataset, with all images resized to (32, 32). The model was trained on a total of 442,451 images.

The model was trained for 50 epochs with the SGD optimizer, reaching a training accuracy of 96.53% and a validation accuracy of 96.81%.
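A training call matching that description might be written as below. Only the optimizer (SGD) and the epoch count (50) come from this README; the helper name compile_and_train, the learning rate, the batch size, and the sparse categorical cross-entropy loss are assumptions chosen for illustration.

```python
import tensorflow as tf


def compile_and_train(model, x_train, y_train, x_val, y_val,
                      epochs=50, learning_rate=0.01, batch_size=128):
    """Compile `model` with SGD and fit it, returning the Keras History object."""
    model.compile(
        # learning_rate is an assumed value; the README only names the optimizer.
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        # Assumes integer class labels; use categorical_crossentropy for one-hot.
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
        verbose=2,
    )
```

The returned history.history dictionary holds the per-epoch accuracy and val_accuracy values from which a training plot can be drawn.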

The classification report for every character is in Classification Report.PNG.
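A per-character report of this kind can be produced with scikit-learn, as in the sketch below. Using scikit-learn is an assumption (this README does not say how the report was generated), and the helper name per_class_report is hypothetical.

```python
from sklearn.metrics import classification_report


def per_class_report(y_true, y_pred, class_names=None):
    """Return precision, recall, F1 and support per class as a nested dict."""
    return classification_report(
        y_true,
        y_pred,
        target_names=class_names,  # optional readable names, one per class
        output_dict=True,          # dict instead of a formatted string
        zero_division=0,           # report 0 instead of warning on empty classes
    )
```

With predicted labels taken as the argmax of the model output, per_class_report(y_true, y_pred, class_names)["A"] would hold the metrics for a class named "A".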

The model was trained with TensorFlow 2.1.0 and OpenCV 4.2.0.

The trained model file is available at https://github.com/sanskar-hasija/Text_to_Speech/blob/main/Trained%20Model/model.h5
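Once downloaded, an .h5 file like this one can typically be loaded with Keras and used for prediction, as sketched below. The helper names load_classifier and predict_class are hypothetical, the input is assumed to be a single preprocessed image of shape (32, 32, 3), and the mapping from class index to character is not given in this README. A file saved with TensorFlow 2.1.0 may need a compatible TensorFlow version to load.

```python
import numpy as np
import tensorflow as tf


def load_classifier(path="model.h5"):
    """Load a saved Keras model from a local .h5 file."""
    return tf.keras.models.load_model(path)


def predict_class(model, image):
    """Return (class_index, confidence) for one image of shape (32, 32, 3)."""
    # Add a batch dimension, run the model, and take the first (only) row.
    probs = model.predict(image[np.newaxis, ...], verbose=0)[0]
    index = int(np.argmax(probs))
    return index, float(probs[index])
```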

In addition, with the help of the Pytesseract library, printed document text is extracted and then translated. One example of document text detection is as follows:
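Extracting text from a document image with Pytesseract can be sketched as follows. The helper names image_to_text and clean_ocr_text are hypothetical, and the grayscale conversion and whitespace cleanup are assumed preprocessing choices, not steps confirmed by this README. The Tesseract engine itself must be installed separately for pytesseract to work.

```python
import re


def clean_ocr_text(raw):
    """Collapse the whitespace runs OCR tends to leave around line breaks."""
    return re.sub(r"\s+", " ", raw).strip()


def image_to_text(image_path, lang="eng"):
    """Run Tesseract OCR on an image file and return cleaned text."""
    # Imported lazily so clean_ocr_text stays usable without Tesseract installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # assumed preprocessing step
    return clean_ocr_text(pytesseract.image_to_string(gray, lang=lang))
```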

The translated output for the above image is:
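A translation and speech step of the kind that produces such output might look like the sketch below. This README only says that Google Translate is used, so the choice of the deep_translator and gTTS packages is an assumption, as are the helper names and the file naming pattern. The saved_sounds folder name mirrors a directory in the repository. Both packages call Google services over the network, and their language codes do not always match, so a code accepted by one may need adjusting for the other.

```python
from pathlib import Path


def audio_path(out_dir, lang_code):
    """Build the output file path for a language (hypothetical naming scheme)."""
    return Path(out_dir) / f"translated_{lang_code}.mp3"


def translate_and_speak(text, lang_code, out_dir="saved_sounds"):
    """Translate `text` to `lang_code` and save the spoken result as an MP3."""
    # Lazy imports: both packages are assumptions and both need network access.
    from deep_translator import GoogleTranslator
    from gtts import gTTS

    translated = GoogleTranslator(source="auto", target=lang_code).translate(text)
    target = audio_path(out_dir, lang_code)
    target.parent.mkdir(parents=True, exist_ok=True)
    gTTS(text=translated, lang=lang_code).save(str(target))
    return translated, target
```

For example, translate_and_speak("Hello", "hi") would be expected to write saved_sounds/translated_hi.mp3 and return the translated string along with that path.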

BhargavMaganti/Text-to-speech

Folders and files

Latest commit

History

Repository files navigation

Text_to_Speech

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages