A Python app that lets you snip a region of the screen, automatically recognizes the Chinese text in it, and translates it.
You grab a screen region as in any other snipping app.
But instead of showing a picture, the app recognizes Chinese and English text and outputs it to a text box, with Pinyin and an English translation alongside.
The OCR quality is reasonably good. It is powered by CnOCR with the densenet_lite_136-gru (recognition) and db_resnet34 (detection) models.
Offline translation is powered by Argos Translate. For Chinese, its quality is worse than Google Translate's, but it is enough to get the gist of the text.
I tested it, and now use it, on a single machine with the following setup:
- Windows 10 operating system
- PyTorch 2.1.1
- Python 3.11 64-bit (at the time of development, PyTorch didn't support Python 3.12)
- CUDA 12.1 - not relevant if you switch to the CPU OCR models or if you have an AMD GPU (see the CnOCR documentation)
- NVIDIA GeForce RTX 2060 GPU - not relevant if you use CPU OCR.
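To compare your own machine against the setup above, a small check like the following may help. It is only a sketch: the function name `describe_environment` is hypothetical and not part of the app, and it relies only on `torch.__version__` and `torch.cuda.is_available()` from PyTorch.

```python
import sys


def describe_environment() -> dict:
    """Collect the Python version, PyTorch version, and CUDA availability."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,            # stays None if PyTorch is not installed
        "cuda_available": False,  # stays False without PyTorch or a usable GPU
    }
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back as `False`, the CPU OCR models are probably the safer choice.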
It's a small, OS-agnostic application, and I expect it to run on all operating systems. However, if you have a multi-monitor configuration, I may have miscalculated the region offsets on Linux and macOS. I saw some other options in Tkinter instantiation, but they apply to X11 only.
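To illustrate what the region offsets refer to, here is a minimal sketch of turning a selection made on one monitor into virtual-desktop coordinates. It is not the app's actual code: the helper `to_absolute_region` is hypothetical, and it assumes monitors are described by dictionaries with `left`, `top`, `width`, and `height` keys, which is the shape MSS uses.

```python
from typing import Mapping


def to_absolute_region(
    monitor: Mapping[str, int], x: int, y: int, width: int, height: int
) -> dict:
    """Shift a selection relative to one monitor by that monitor's offset."""
    return {
        "left": monitor["left"] + x,
        "top": monitor["top"] + y,
        "width": width,
        "height": height,
    }


# A secondary monitor placed to the left of the primary one has a negative
# "left" offset in the virtual desktop.
secondary = {"left": -1920, "top": 0, "width": 1920, "height": 1080}
region = to_absolute_region(secondary, x=100, y=50, width=400, height=200)
print(region)  # {'left': -1820, 'top': 50, 'width': 400, 'height': 200}
```

A dictionary of this shape can then be passed to `mss.mss().grab(region)`. If the snipped image is shifted on a secondary screen, the monitor offsets are the first thing to check.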
- Read the PyTorch manual, because the versions mentioned here may have changed by the time you read this
- Install Python 3.11
- Install CUDA if you have an NVIDIA GPU
- Install PyTorch
- Clone the repo
- Read the CnOCR documentation and change the models in the recognition.py file if you don't like mine
- Sync the Python project requirements
- Run main.py
The first start can take around 30 minutes because CnOCR and Argos download their models.
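If you would rather fetch the Argos Chinese-to-English package ahead of time, something along these lines should work. This is a sketch, not what the app itself does: `pick_package` and `install_zh_en` are hypothetical helper names, while the `argostranslate.package` calls are the ones shown in the Argos Translate documentation.

```python
def pick_package(packages, from_code="zh", to_code="en"):
    """Return the first package translating from_code -> to_code, or None."""
    return next(
        (p for p in packages if p.from_code == from_code and p.to_code == to_code),
        None,
    )


def install_zh_en():
    """Download and install the zh -> en package from the Argos index."""
    # Imported lazily so pick_package can be used without Argos installed.
    import argostranslate.package as ap

    ap.update_package_index()
    package = pick_package(ap.get_available_packages())
    if package is None:
        raise RuntimeError("No zh -> en package found in the Argos index")
    ap.install_from_path(package.download())
```

Calling `install_zh_en()` once before the first run moves the Argos download out of the app's start-up. The CnOCR models are still downloaded separately.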
- I took snipping-tool as the base for the snipping UX. The UI is done with Qt/Tkinter.
- MSS for screen region grabbing, because the PIL grabber doesn't work for secondary screens.
- CnOCR for Chinese/English OCR - the core feature of the app.
- pinyin to convert Chinese characters to Pinyin.
- Argos Translate for Chinese to English offline translation.
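The way these components fit together can be sketched roughly as follows. This is an illustration under assumptions, not the code in recognition.py: `annotate_lines` and `recognize_and_annotate` are hypothetical names, and I assume a CnOCR version whose `ocr()` returns a list of dictionaries with a `text` key.

```python
from typing import Callable, Iterable


def annotate_lines(
    lines: Iterable[str],
    to_pinyin: Callable[[str], str],
    to_english: Callable[[str], str],
) -> list[tuple[str, str, str]]:
    """Pair each non-empty recognized line with its Pinyin and translation."""
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank OCR lines
        result.append((text, to_pinyin(text), to_english(text)))
    return result


def recognize_and_annotate(image_path: str) -> list[tuple[str, str, str]]:
    """Run OCR on an image file, then add Pinyin and English to each line."""
    # Lazy imports: these libraries are heavy and download models on first use.
    from cnocr import CnOcr
    import pinyin
    import argostranslate.translate

    ocr = CnOcr(rec_model_name="densenet_lite_136-gru", det_model_name="db_resnet34")
    # Assumption: each OCR result is a dict with a "text" key.
    lines = [item["text"] for item in ocr.ocr(image_path)]
    return annotate_lines(
        lines,
        to_pinyin=lambda s: pinyin.get(s, delimiter=" "),
        to_english=lambda s: argostranslate.translate.translate(s, "zh", "en"),
    )
```

Passing the converters in as callables keeps `annotate_lines` free of heavy dependencies, so a different translator or Pinyin library can be swapped in without touching the rest.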
It can recognize Chinese, though sometimes it fails. And the English translation is not optimal, to say the least. But it saves me a lot of time anyway.