Likitham

This repo contains scripts and datasets for processing Telugu language data.

Scripts

Check out the module docstring of each script for usage instructions.

Models

te.pyrnn.gz - Telugu language model (LSTM + CTC) trained with ocropy.

Dataset

Sample training data. You can use the scripts to generate customized training data.
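
When generating your own training data, it can help to check that every line image has a matching transcription before training. The sketch below assumes the data follows the usual ocropy convention of pairing each `NAME.bin.png` line image with a `NAME.gt.txt` ground-truth file; this repo does not state its dataset layout, so treat that as an assumption. `find_unpaired` is a hypothetical helper, not one of the scripts in this repo.

```python
from pathlib import Path

IMAGE_SUFFIX = ".bin.png"  # assumed ocropy-style line image suffix
TEXT_SUFFIX = ".gt.txt"    # assumed ocropy-style ground-truth suffix


def find_unpaired(root):
    """Return (images_without_text, texts_without_image) found under root.

    Each entry is the sample path with its suffix stripped, so the two
    lists can be compared directly.
    """
    root = Path(root)
    # Collect sample base names (full path minus the suffix) for each kind.
    images = {str(p)[: -len(IMAGE_SUFFIX)] for p in root.rglob("*" + IMAGE_SUFFIX)}
    texts = {str(p)[: -len(TEXT_SUFFIX)] for p in root.rglob("*" + TEXT_SUFFIX)}
    # Set differences give the samples missing their counterpart.
    return sorted(images - texts), sorted(texts - images)
```

Calling `find_unpaired("path/to/training_data")` returns two lists; if both are empty, every image has a transcription and vice versa. The directory name here is a placeholder.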

Useful links

Telugu fonts

Telugu POS tagger

Isolated Handwritten Telugu Character Dataset

Telugu and other South Asian language data

Corpus search engine

tessaract-te - Tesseract Open Source OCR Engine

banti_telugu_ocr - End to end OCR system for Telugu. Based on Convolutional Neural Networks.

Chamanti_ocr - Telugu OCR framework using RNN, CTC in Theano & Python3.

http://docs.cltk.org/en/latest/telugu.html

http://www.tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=264&lang=en

http://www.tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1892&lang=en

http://ildc.in/Telugu/htm/lin_ocr_spell.htm