Text Recognition

Example of training a text recognition model as described in Shi et al. (2015), "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" (CRNN).

Usage

  • Navigate (cd) to the root of the toolbox, [YOUR_SIGHTSEQ_ROOT].
  • Prepare the dataset.

It is recommended to symlink the dataset root to [YOUR_SIGHTSEQ_ROOT]/data-bin; an example command follows the layout below.

.
├── data-bin
│   └── [DATA]
│       ├── images
│       ├── train.txt
│       ├── valid.txt
│       ├── test.txt
│       └── dict.txt (generated by the Preprocess Strategy step below)
├── examples
└── sightseq
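
For example, assuming your datasets live at /path/to/datasets (a hypothetical path), the layout above can be produced with:

ln -s /path/to/datasets [YOUR_SIGHTSEQ_ROOT]/data-bin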

Annotation File Format

Each line in the annotation file has the format:

img_path char1 char2 char3 char4 char5 ...

where each char is a single character of the image's text.

For example, for an image named "00120_00091.jpg" in the folder [DATA]/images whose content is "hello world", there should be a corresponding line in [DATA]/train.txt or [DATA]/valid.txt:

00120_00091.jpg h e l l o w o r l d
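
A minimal sketch (not part of sightseq; the file names and transcripts below are hypothetical) that writes annotation lines in this format from (image_name, text) pairs:

# Hypothetical helper: write sightseq-style annotation lines.
samples = [
    ("00120_00091.jpg", "hello world"),
    ("00120_00092.jpg", "foo bar"),
]
with open("train.txt", "w", encoding="utf-8") as f:
    for img_name, text in samples:
        # One space-separated token per character; whitespace is dropped,
        # matching the example above, which shows no explicit space token.
        chars = " ".join(ch for ch in text if not ch.isspace())
        f.write(f"{img_name} {chars}\n")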

Preprocess Strategy

Generate dict.txt:

python -m sightseq.preprocess --task text_recognition \
    --trainpref [DATA]/train.txt \
    --destdir [DATA] --padding-factor 1
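
Since sightseq builds on fairseq, dict.txt can be expected to follow fairseq's dictionary format: one "character count" pair per line, sorted by frequency. The characters and counts below are illustrative only:

e 10231
l 9876
o 9540
...

--padding-factor 1 keeps the dictionary at its natural size instead of padding it to a multiple of eight, which fairseq otherwise does purely as a GPU-efficiency optimization.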

Training

Training strategy (Attention):

python -m sightseq.train [DATA] \
    --task text_recognition --arch decoder_attention \
    --decoder-layers 2 --batch-size 16 --dropout 0.0 \
    --max-epoch 100 --criterion cross_entropy --num-workers 4 \
    --optimizer adam --adam-eps 1e-04 --lr 0.001 --min-lr 1e-09 \
    --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \
    --no-token-crf --save-interval 1

Training strategy (Transformer):

python -m sightseq.train [DATA] \
    --task text_recognition --arch decoder_transformer \
    --batch-size 16 --dropout 0.0 --max-epoch 100 \
    --criterion cross_entropy \
    --num-workers 4 --optimizer adam --decoder-layers 2 \
    --adam-eps 1e-04 --lr 0.001 --min-lr 1e-09 \
    --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --weight-decay 0.0 --no-token-crf --no-token-rnn \
    --save-interval 1 --encoder-normalize-before

Training strategy (CRNN):

python -m sightseq.train [DATA] \
    --task text_recognition --arch decoder_crnn \
    --decoder-layers 2 --batch-size 16 \
    --max-epoch 50 --criterion ctc_loss --num-workers 4 \
    --optimizer adam --adam-eps 1e-04 --lr 0.001 --min-lr 1e-09 \
    --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \
    --save-interval 1
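
Unlike the attention and transformer recipes, CRNN trains with CTC, which marginalizes over all frame-to-character alignments, so no per-frame labels are required. A minimal sketch of the objective using PyTorch's built-in nn.CTCLoss (shapes are illustrative; sightseq's ctc_loss criterion is assumed to compute an equivalent loss):

import torch
import torch.nn as nn

T, N, C = 32, 16, 40  # time steps, batch size, dict size incl. blank (illustrative)
log_probs = torch.randn(T, N, C).log_softmax(2)           # per-frame log-probabilities
targets = torch.randint(1, C, (N, 10), dtype=torch.long)  # character labels (0 = blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)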

Testing

Use the trained model to test (Attention):

python -m sightseq.generate_ocr [DATA] \
    --arch decoder_attention --path [CHECKPOINTS_DIR] \
    --task text_recognition \
    --buffer-size 16 --num-workers 4 --gen-subset valid \
    --beam 5 --batch-size 16 --quiet

Use the trained model to test (Transformer):

python -m sightseq.generate_ocr [DATA] \
    --arch decoder_transformer --path [CHECKPOINTS_DIR] \
    --task text_recognition \
    --buffer-size 16 --num-workers 4 --gen-subset valid \
    --batch-size 16 --beam 5 --quiet

Use the trained model to test (CRNN):

python -m sightseq.generate_ocr [DATA] \
    --arch decoder_crnn --path [CHECKPOINTS_DIR] \
    --task text_recognition --criterion ctc_loss \
    --sacrebleu \
    --buffer-size 16 --num-workers 4 --gen-subset valid \
    --batch-size 16 --quiet