Add training guide and align text detection model training with recognition model #8

Merged: robertknight merged 11 commits into main from training-guide on Jan 30, 2024


@robertknight (Owner) commented Jan 28, 2024

Add a guide with steps to train the text detection and recognition models from scratch. Along the way, various improvements to the text detection training tool and other parts of the project were needed:

  • Remove `torch` and `torchvision` from the Pipfile and install them separately. This is needed because their dependencies vary by platform and GPU.
  • Fix a warning about the `antialias` setting in the `Resize` transform by setting it explicitly.
  • Move the option for exporting the text detection model to the training script.
  • Fix a sporadic failure during text detection training caused by target masks having values slightly outside the [0, 1] range after applying transforms.
  • Add Weights and Biases integration to the text detection training script.
  • Speed up the testing/validation phase of text detection training by optimizing box intersection tests when computing metrics.

Fixes #6

This is needed for ONNX export to work.
Setting `antialias=False` explicitly keeps the current default behavior, but PyTorch warns that this default is changing in the future. Setting `antialias=True` might produce better results, but it currently leads to out-of-range errors during loss computation, which need to be resolved first.
Align the text detection model training script with the recognition
model training script by:

 - Adding a `--export` option to export a checkpoint to ONNX after loading it
 - Adding a `--max-epochs` flag to trigger automatic termination of the
   training process after a fixed number of epochs
 - Adding wandb integration to allow tracking training progress
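The following is a minimal sketch of how the two new command-line options might be wired up. It assumes the script uses Python's `argparse`. Several names are hypothetical illustrations and are not the actual `train_detection` interface:

- the `--checkpoint` option
- the `ONNX_PATH` value taken by `--export`
- the `parse_args` and `epochs_to_run` helpers

Wandb logging is left out.

```python
import argparse
from typing import Iterator, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Text detection training (sketch)")
    # Hypothetical: path of a checkpoint to load before training or exporting.
    parser.add_argument("--checkpoint", help="Checkpoint to load")
    # Assumption: --export takes the path to write the ONNX model to.
    parser.add_argument(
        "--export",
        metavar="ONNX_PATH",
        help="Export the loaded checkpoint to ONNX instead of training",
    )
    # Stop training automatically after this many epochs (None = run until interrupted).
    parser.add_argument(
        "--max-epochs", type=int, default=None, help="Maximum number of epochs"
    )
    return parser.parse_args(argv)


def epochs_to_run(max_epochs: Optional[int]) -> Iterator[int]:
    """Yield epoch indices, stopping after `max_epochs` if it is set."""
    epoch = 0
    while max_epochs is None or epoch < max_epochs:
        yield epoch
        epoch += 1
```

For example, `parse_args(["--max-epochs", "3"])` gives `max_epochs == 3`, and `epochs_to_run(3)` produces epochs 0, 1 and 2. Using a generator keeps the termination condition in one place, so the training loop does not need to check the flag itself.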
Model export is now implemented in `train_detection` instead.
When using `antialias=True` with the `Resize` transform on target masks, the
resulting values could sometimes be slightly above 1.0. The same thing happened
when training with CUDA, even without antialiasing.
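One way to bring such values back into range is to clamp the transformed target masks to [0, 1] before computing the loss. The commit text does not say exactly how the fix was implemented; this sketch assumes clamping. To stay dependency-free, it operates on nested Python lists rather than tensors. On a PyTorch tensor, the analogous call would be `mask.clamp(0.0, 1.0)`. The `clamp_mask` and `in_unit_range` helpers are hypothetical names.

```python
from typing import List

Mask = List[List[float]]


def clamp_mask(mask: Mask, lo: float = 0.0, hi: float = 1.0) -> Mask:
    """Return a copy of a 2D mask with every value clamped into [lo, hi].

    Assumption: this mirrors `mask.clamp(0.0, 1.0)` on a PyTorch tensor, so a
    value such as 1.0000001 produced by resizing becomes exactly 1.0.
    """
    return [[min(max(value, lo), hi) for value in row] for row in mask]


def in_unit_range(mask: Mask) -> bool:
    """Check the invariant that a loss expecting targets in [0, 1] relies on."""
    return all(0.0 <= value <= 1.0 for row in mask for value in row)
```

For a resized mask such as `[[0.0, 1.0000001], [-1e-7, 0.5]]`, `in_unit_range` is false before clamping. After clamping, the mask becomes `[[0.0, 1.0], [0.0, 0.5]]`.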
 - Use `pin_memory` for data loaders. This was already used in the
   recognition training script.
 - Move prediction and target masks to the CPU once per batch, instead of
   separately for each item
`torch` and `torchvision` need to be installed separately, as their dependencies
vary by platform and GPU.
@robertknight force-pushed the training-guide branch 2 times, most recently from 5c67e64 to e478d39, on January 30, 2024 07:47
Replace precise intersection test with a cheap bounding box intersection test.
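Below is a sketch of the kind of cheap test this describes. It rests on two assumptions: the predicted and target text regions are polygons given as lists of (x, y) vertices, and the "bounding box" is the axis-aligned box around each polygon. The helper names are hypothetical. The test is conservative. Any point shared by two polygons also lies in both of their boxes, so real intersections are never missed. It can, however, report an intersection when the boxes overlap but the polygons themselves do not.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def bounding_box(polygon: Sequence[Point]) -> Box:
    """Axis-aligned bounding box enclosing all vertices of `polygon`."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if the closed boxes share at least one point (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def polygons_may_intersect(p: Sequence[Point], q: Sequence[Point]) -> bool:
    """Cheap, conservative stand-in for an exact polygon intersection test.

    Any point shared by the polygons also lies in both bounding boxes, so real
    intersections are never missed; false positives are possible.
    """
    return boxes_intersect(bounding_box(p), bounding_box(q))
```

Consider the triangles `[(0, 0), (10, 0), (0, 10)]` and `[(9, 9), (10, 9), (10, 10)]`. They do not overlap, but their bounding boxes do, so `polygons_may_intersect` returns `True`. The trade-off is that each check needs only a pass over the vertices plus four comparisons, rather than an exact geometric test.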
@robertknight marked this pull request as ready for review on January 30, 2024 08:03
@robertknight merged commit 9c6c72a into main on Jan 30, 2024
1 check passed
@robertknight deleted the training-guide branch on January 30, 2024 17:58
Development: successfully merging this pull request may close these issues:

  • Add documentation for training models from scratch
1 participant