
Scene Text Image Super-resolution based on Text-conditional Diffusion Models

arXiv: https://arxiv.org/abs/2311.09759

This is the official repository of the WACV2024 paper "Scene Text Image Super-resolution based on Text-conditional Diffusion Models". This repository is based on openai/improved-diffusion.

Pre-trained models for DiMSS and GT-DiMSS

We plan to release checkpoints of the models and the two generated datasets, SynTZ and SynSTR, used for the main results in the paper.

Model Training

Requirements

To get started, install the required Python packages using the following command:

pip install -e .

Dataset

Download the TextZoom dataset from

https://github.com/JasonBoy1/TextZoom

The structure of the dataset directory is

dataset
`-- TextZoom
    |-- test
    |   |-- easy
    |   |   |-- data.mdb
    |   |   `-- lock.mdb
    |   |-- hard
    |   |   |-- data.mdb
    |   |   `-- lock.mdb
    |   `-- medium
    |       |-- data.mdb
    |       `-- lock.mdb
    |-- train1
    |   |-- data.mdb
    |   `-- lock.mdb
    `-- train2
        |-- data.mdb
        `-- lock.mdb
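
Once the dataset is in place, a short Python snippet can confirm that the LMDB files are readable. The sketch below is not part of this repository and assumes the usual TextZoom key layout (num-samples, label-%09d, image_hr-%09d, image_lr-%09d, indexed from 1); adjust the keys if your copy differs.

import io
import lmdb
from PIL import Image

# Open one split read-only and inspect the first LR-HR pair.
env = lmdb.open("dataset/TextZoom/test/easy", readonly=True, lock=False)
with env.begin() as txn:
    num_samples = int(txn.get(b"num-samples"))   # total pairs in this split (assumed key)
    label = txn.get(b"label-%09d" % 1).decode()  # text label of sample 1
    hr = Image.open(io.BytesIO(txn.get(b"image_hr-%09d" % 1)))
    lr = Image.open(io.BytesIO(txn.get(b"image_lr-%09d" % 1)))
    print(num_samples, label, hr.size, lr.size)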

Pretrained recognizers

Download the pretrained recognizers (CRNN, ASTER, MORAN) from the following repositories.

CRNN:

https://github.com/meijieru/crnn.pytorch

ASTER:

https://github.com/ayumiymk/aster.pytorch  

MORAN:

https://github.com/Canjie-Luo/MORAN_v2
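
As a quick check that the CRNN checkpoint loads, the hedged sketch below follows the demo in meijieru/crnn.pytorch; it assumes that repository is on your PYTHONPATH and that the weights are saved as crnn.pth. ASTER and MORAN ship their own loading code in their repositories.

import torch
from models.crnn import CRNN  # module layout of meijieru/crnn.pytorch

# 32-pixel-high grayscale input, 37 output classes, 256 hidden units,
# matching the configuration used in that repository's demo.
model = CRNN(32, 1, 37, 256)
model.load_state_dict(torch.load("crnn.pth", map_location="cpu"))
model.eval()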

Training

To train DiMSS on the TextZoom dataset, run the script via

bash train_dimss_textzoom.sh

Also, use the following script to train GT-DiMSS

bash train_gt_dimss_textzoom.sh

Inference

To generate SR images from the LR images of TextZoom with a trained DiMSS, run the script via

bash eval_dimss_textzoom.sh

Also, use the following script to generate SR images of TextZoom with a trained GT-DiMSS

bash eval_gt_dimss_textzoom.sh
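
To spot-check the generated SR images against the HR ground truth, a simple PSNR computation such as the one below can be used. This is only an illustrative sketch with placeholder file names, not the evaluation code used for the paper.

import numpy as np
from PIL import Image

def psnr(sr_path, hr_path):
    # Both images are assumed to have the same size after super-resolution.
    sr = np.asarray(Image.open(sr_path).convert("RGB"), dtype=np.float64)
    hr = np.asarray(Image.open(hr_path).convert("RGB"), dtype=np.float64)
    mse = np.mean((sr - hr) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

print(psnr("sr_000001.png", "hr_000001.png"))  # hypothetical file names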

LR-HR Paired Text Image Synthesis

Training Synthesizer

1. Dataset

To train Synthesizer, the preprocessed STR dataset is required in addition to the TextZoom dataset. Download the preprocessed STR dataset from

https://github.com/ku21fan/STR-Fewer-Labels

The structure of the dataset directory is

dataset
`-- data_CVPR2021
    `-- training
        `-- label
            `-- real
                |-- 1.SVT
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 10.MLT19
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 11.ReCTS
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 2.IIIT
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 3.IC13
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 4.IC15
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 5.COCO
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 6.RCTW17
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 7.Uber
                |   |-- data.mdb
                |   `-- lock.mdb
                |-- 8.ArT
                |   |-- data.mdb
                |   `-- lock.mdb
                `-- 9.LSVT
                    |-- data.mdb
                    `-- lock.mdb

To perform the preprocessing for the Synthesizer training, run the script via

python preprocessing_STR.py

When the preprocessing is complete, the preprocessed text images and the corresponding text labels are placed in dataset/STR/img and dataset/STR/word, respectively.
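
A small sanity check such as the following can confirm that images and labels were written in matching pairs; it assumes the files in dataset/STR/img and dataset/STR/word share their file stems, which may not match the actual naming scheme used by preprocessing_STR.py.

from pathlib import Path

img_stems = sorted(p.stem for p in Path("dataset/STR/img").iterdir())
word_stems = sorted(p.stem for p in Path("dataset/STR/word").iterdir())
print(len(img_stems), "images,", len(word_stems), "labels,",
      "stems match:", img_stems == word_stems)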

2. Training

To train Synthesizer, run the script via

bash train_synthesizer.sh

Training Super-resolver

Super-resolver is identical to GT-DiMSS trained on TextZoom; see the GT-DiMSS training described earlier.

Training Degrader

Degrader is trained on TextZoom only. To train Degrader, run the script via:

bash train_degrader.sh

Synthesizing Text Images

1. Synthesizer

To run Synthesizer, run the script via:

bash run_synthesizer.sh

The generated text images and the corresponding text labels are placed in ./diff_samples/mr_samples.

2. Postprocessing

To perform the postprocessing of the generated text images, run the script via:

python postprocessing_text_images.py

The postprocessed text images are placed in ./diff_samples/mr_samples/postprocessed.

Generating LR and HR text images

1. Super-resolver

To run Super-resolver, run the script via:

bash run_super_resolver.sh

The generated HR text images are placed in ./diff_samples/hr_samples.

2. Degrader

To run Degrader, run the script via:

bash run_degrader.sh

The generated LR text images are placed in ./diff_samples/lr_samples.
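
To reuse the generated LR-HR pairs as a TextZoom-style training set, something like the sketch below packs them into an LMDB. It is not provided by this repository; it assumes the LR and HR images share file names, uses the TextZoom key convention, and omits the text labels produced by Synthesizer because their on-disk format is not documented here.

from pathlib import Path
import lmdb

lr_dir = Path("diff_samples/lr_samples")
hr_dir = Path("diff_samples/hr_samples")

env = lmdb.open("dataset/SynTZ", map_size=1 << 32)  # 4 GiB; enlarge for bigger sets
with env.begin(write=True) as txn:
    lr_paths = sorted(lr_dir.glob("*.png"))  # assumed file extension
    for i, lr_path in enumerate(lr_paths, start=1):
        txn.put(b"image_lr-%09d" % i, lr_path.read_bytes())
        txn.put(b"image_hr-%09d" % i, (hr_dir / lr_path.name).read_bytes())
    txn.put(b"num-samples", str(len(lr_paths)).encode())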

Citation

@article{noguchi2023scene,
  title={Scene Text Image Super-resolution based on Text-conditional Diffusion Models},
  author={Noguchi, Chihiro and Fukuda, Shun and Yamanaka, Masao},
  journal={arXiv preprint arXiv:2311.09759},
  year={2023}
}

License

The code will be released under the MIT license.
