This repository extracts text from images using tesseract-ocr and pytesseract, translates the text into other languages using textblob, and provides a web app built with Flask.
- Convert the image to grayscale.
- Detect the edges in the image.
- Find the contours of the image (the regions where text is present).
- Extract the text region from the image.
- Run pytesseract on the extracted region to get its text.
To translate text, we use the textblob module. It auto-detects the input language of the text and translates it into the desired language. This repo translates text into three languages: English, Hindi, and Punjabi.
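A small sketch of how the three target languages could be mapped to language codes and passed to a translator. The names `LANGUAGE_CODES`, `translate_text`, and `textblob_translator` are hypothetical. The textblob call assumes an older textblob release that still ships `TextBlob.translate`; that method was deprecated and later removed, and it needs network access when it is available.

```python
# ISO 639-1 codes for the three languages this repo targets.
LANGUAGE_CODES = {"English": "en", "Hindi": "hi", "Punjabi": "pa"}


def translate_text(text, language, translator):
    """Translate text into the named language using the given callable.

    translator takes (text, language_code) and returns the translated string.
    """
    code = LANGUAGE_CODES[language]  # KeyError for unsupported languages
    return translator(text, code)


def textblob_translator(text, code):
    """Translator backed by textblob (assumes TextBlob.translate exists)."""
    from textblob import TextBlob  # imported lazily; optional dependency

    # The source language is auto-detected by default.
    return str(TextBlob(text).translate(to=code))
```

With a suitable textblob version installed, `translate_text("Good morning", "Hindi", textblob_translator)` would request a Hindi translation. Passing the translator as an argument keeps the language mapping usable with another backend.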
We use Ubuntu 20.04 (Focal Fossa) here. To run this repo on your system, you need to install some required modules. On Ubuntu, install Tesseract with:
sudo apt install tesseract-ocr
On Windows, you need to download Tesseract OCR from
https://github.com/UB-Mannheim/tesseract/wiki
and add this line to methods.py:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
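If the same methods.py should run on both Ubuntu and Windows, the path could be set only when needed. The sketch below is one possible approach; `tesseract_command` and `configure_pytesseract` are hypothetical helper names, and the Windows path assumes the default install location shown above.

```python
import platform

# Default install location used by the Windows installer (assumed).
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def tesseract_command(system=None):
    """Return the Tesseract executable to use for the given platform."""
    system = system or platform.system()
    # On Linux the apt package puts `tesseract` on PATH, so the bare name works.
    return WINDOWS_TESSERACT if system == "Windows" else "tesseract"


def configure_pytesseract():
    """Point pytesseract at the right executable for this machine."""
    import pytesseract  # imported lazily so the helper above has no dependency

    pytesseract.pytesseract.tesseract_cmd = tesseract_command()
```

Calling `configure_pytesseract()` once at the top of methods.py would replace the hard-coded line.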
The required Python modules are:
- pytesseract
- opencv
- textblob
- numpy
- flask
- werkzeug
- urllib
- os
- time