Add training guide and align text detection model training with recognition model #8

robertknight · 2024-01-28T18:07:51Z

Add a guide with steps to train the text detection and recognition models from scratch. Writing it required several improvements to the text detection training tool and the project setup:

- Remove torch and torchvision from the Pipfile and install them separately. This is needed because the dependencies vary by platform and GPU.
- Fix warning about the antialias setting in the Resize transform by setting it explicitly
- Move option for exporting the text detection model to the training script
- Fix a sporadic failure during text detection training due to target masks having values slightly outside the [0, 1] range after applying transforms
- Add Weights and Biases integration to the text detection training script
- Speed up the testing/validation phase of text detection training by optimizing box intersection tests when computing metrics

Fixes #6

The explicit `antialias` value for the `Resize` transform matches the current default behavior, but PyTorch warns that the default will change in a future release. Setting `antialias=True` might produce better results, but it currently leads to out-of-range errors during loss computation, which need to be resolved first.

Align the text detection model training script with the recognition model training by:
- Adding a `--export` option to export a checkpoint to ONNX after loading it
- Adding a `--max-epochs` flag to trigger automatic termination of the training process after a fixed number of epochs
- Adding wandb integration to allow tracking training progress
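As a rough illustration of how the two flags could be wired up, here is a minimal `argparse` sketch. Only the flag names `--export` and `--max-epochs` come from this PR; the function names `parse_args` and `should_stop`, and the assumption that `--export` takes an output path, are hypothetical and may not match the actual training script.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two options described above (hypothetical CLI layout)."""
    parser = argparse.ArgumentParser(description="Sketch of a detection training CLI")
    # Assumption: --export takes the path of the ONNX file to write.
    parser.add_argument("--export", type=str, help="Export the loaded checkpoint to ONNX")
    parser.add_argument("--max-epochs", type=int, help="Stop after this many epochs")
    return parser.parse_args(argv)


def should_stop(completed_epochs: int, max_epochs: Optional[int]) -> bool:
    """Return True once the optional epoch limit has been reached."""
    return max_epochs is not None and completed_epochs >= max_epochs
```

With no `--max-epochs` given, `should_stop` never returns True, so training would continue until interrupted manually.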


robertknight added 8 commits January 28, 2024 08:18

Add onnx dependency

f739982

This is needed for ONNX export to work.

Remove --export option from eval_detection script

e260789

Model export is now implemented in `train_detection` instead.

Clamp text detection mask targets to [0, 1]

fdae209

When using `antialias=True` with the `Resize` transform on target masks, the resulting values could sometimes be slightly above 1.0. The same thing happened when training with CUDA, even without `antialias=True`.
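A minimal sketch of the clamping step, assuming the target mask is a PyTorch tensor. The helper name `clamp_mask` and the sample values are illustrative and are not taken from the training script.

```python
import torch


def clamp_mask(mask: torch.Tensor) -> torch.Tensor:
    """Clamp mask values into [0, 1] (hypothetical helper name)."""
    return mask.clamp(0.0, 1.0)


# Sample values that have drifted slightly out of range,
# as could happen after interpolation in a resize.
mask = torch.tensor([[-0.0001, 0.5], [1.0, 1.0001]])
clamped = clamp_mask(mask)
```

Values already inside the range are left unchanged, so the clamp only affects the small overshoots.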

Make a couple of small optimizations to detection GPU training

6d3bdd8

- Use `pin_memory` for data loaders. This was already used in the recognition training script.
- Move prediction / target masks to CPU once per batch, instead of separately per item
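The two optimizations can be sketched roughly as follows. This assumes a standard PyTorch `DataLoader`. The synthetic dataset, the `torch.sigmoid` stand-in for the model, and all variable names are illustrative rather than taken from the detection training script.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 8 "images" and 8 target masks.
images = torch.rand(8, 1, 16, 16)
masks = torch.rand(8, 1, 16, 16)
dataset = TensorDataset(images, masks)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Pinned host memory only helps when copying to a CUDA device.
loader = DataLoader(dataset, batch_size=4, pin_memory=use_cuda)

items_seen = 0
for batch_images, batch_masks in loader:
    batch_images = batch_images.to(device, non_blocking=True)
    batch_masks = batch_masks.to(device, non_blocking=True)

    # Stand-in for the model's forward pass.
    pred = torch.sigmoid(batch_images)

    # One device-to-host transfer per batch...
    pred_cpu = pred.detach().cpu()
    masks_cpu = batch_masks.cpu()

    # ...followed by per-item work on tensors that are already on the CPU.
    for pred_item, mask_item in zip(pred_cpu, masks_cpu):
        items_seen += 1
```

Transferring whole batches avoids issuing a separate, synchronizing copy for every item when computing metrics.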

Remove torch and torchvision from Pipfile

20480e2

These need to be installed separately as the dependencies will vary by platform and GPU.

Fix typo in train_rec.py

cf9a0d2

robertknight force-pushed the training-guide branch 2 times, most recently from 5c67e64 to e478d39 Compare January 30, 2024 07:47

Add initial training guide for training detection and recognition models

c11284c

robertknight force-pushed the training-guide branch from e478d39 to 293c774 Compare January 30, 2024 07:53

Optimize box_match_metrics with bounding box intersection test

4688482

Replace precise intersection test with a cheap bounding box intersection test.
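For illustration, an axis-aligned bounding box intersection test can be written as four comparisons. This is a sketch of the general technique, not the code in `box_match_metrics`. The `Box` layout `(left, top, right, bottom)` and the function name `boxes_intersect` are assumptions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical layout, for illustration)."""

    left: float
    top: float
    right: float
    bottom: float


def boxes_intersect(a: Box, b: Box) -> bool:
    """Return True if the two boxes overlap with non-zero area."""
    return (
        a.left < b.right
        and b.left < a.right
        and a.top < b.bottom
        and b.top < a.bottom
    )
```

For rotated shapes, the bounding boxes can overlap even when the shapes themselves do not, so a test like this trades some precision for speed.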

robertknight force-pushed the training-guide branch from 293c774 to 4688482 Compare January 30, 2024 07:57

robertknight marked this pull request as ready for review January 30, 2024 08:03

Add link to pre-trained models on Hugging Face

64361c2

robertknight merged commit 9c6c72a into main Jan 30, 2024
1 check passed

robertknight deleted the training-guide branch January 30, 2024 17:58
