Orient images before processing OCR #324

Merged
merged 12 commits into from
Jul 1, 2021

@Rob192 Rob192 commented Jun 22, 2021

Following our discussion on #225, I propose the following PR to orient the document before processing it with the ocr_predictor. In my experience, this helps both the detection_predictor and the recognition_predictor.

@fg-mindee fg-mindee self-assigned this Jun 22, 2021
@fg-mindee fg-mindee added type: enhancement Improvement module: models Related to doctr.models labels Jun 22, 2021

@fg-mindee fg-mindee left a comment

Thanks a lot for the PR 🙏

Two high-level comments:

  • might be better to split your main function & focus this PR on angle estimation (as a standalone feature with your method, which we'll integrate in the predictors in another PR)
  • as mentioned, the angle estimation part is not a reading feature so we'd better move this to doctr.documents.utils or doctr.models.utils 🤷‍♂️

Once this is done, we're gonna have to add typing annotations & a small unittest 👍

Let me know what you think !

Rob192 commented Jun 23, 2021

Hello @fg-mindee
Thank you for the thorough review. I implemented most of your comments. I am still working out how to do two things:

  • I need a way to rebuild the original image at the end of the OCRPredictor. My first idea would be to change the DocumentBuilder to accept a page_angle parameter describing the global page orientation. I would then modify visualize_page in doctr.utils.visualization to rotate everything back. Please tell me your thoughts on this; I am afraid I do not have the complete picture here.
  • I still need to add some tests. I will do that ASAP.
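
To make the first point concrete, here is a minimal sketch of what "rotating everything back" could look like for predicted box corners. It is only an illustration: `rotate_points` and `rotate_back` are hypothetical helpers, not doctr functions, and the sketch assumes absolute pixel coordinates, a rotation about a known center, no canvas expansion, and an arbitrary sign convention for `page_angle`.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_points(points: List[Point], angle_deg: float, center: Point) -> List[Point]:
    """Rotate points by angle_deg around center (2D rotation matrix)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return rotated


def rotate_back(points: List[Point], page_angle: float, center: Point) -> List[Point]:
    """Undo a page rotation of page_angle degrees (hypothetical helper).

    Assumption: the page was rotated by page_angle with rotate_points,
    so applying the opposite angle restores the original coordinates.
    """
    return rotate_points(points, -page_angle, center)
```

If a page_angle were stored alongside each page, the visualization step could call something like `rotate_back` on every box corner before drawing. Handling relative coordinates or an expanded canvas would need extra bookkeeping that this sketch leaves out.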


@fg-mindee fg-mindee left a comment

Thanks for the edits! First, could you merge "main" into your branch? There are some conflicts apparently. Sorry my previous review wasn't clear enough, but I believe it would be better to split this into several PRs: one to estimate the orientation (this one), another later on to integrate it into the predictor and make sure we can reproject the end results, and a last one to run a performance check and decide whether this should become the default.

I might be wrong, but I suspect that this method works quite well for specific documents with well-defined lines (French ID cards, for instance) but may not generalize to other use cases. So let's investigate this carefully :) (and write unit tests for each PR)
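
To illustrate the generalization concern, here is a minimal sketch of one way a page angle could be estimated from line-like segments: take the median of their angles. This is an assumption made for illustration, not necessarily the method implemented in this PR. `estimate_page_angle`, `segment_angle`, the `(x1, y1, x2, y2)` segment format, and the `min_length` threshold are all hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def segment_angle(x1: float, y1: float, x2: float, y2: float) -> float:
    """Angle of a segment in degrees, folded into [-90, 90).

    Folding makes the result independent of the segment's direction.
    """
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle >= 90.0:
        angle -= 180.0
    elif angle < -90.0:
        angle += 180.0
    return angle


def estimate_page_angle(segments: List[Segment], min_length: float = 10.0) -> float:
    """Median angle of the segments that are at least min_length long.

    Returns 0.0 when no segment qualifies (assumed fallback).
    """
    angles = [
        segment_angle(*seg)
        for seg in segments
        if math.hypot(seg[2] - seg[0], seg[3] - seg[1]) >= min_length
    ]
    return median(angles) if angles else 0.0
```

With many long, parallel text lines, the median is stable and tolerates a few outliers. On pages with few lines, mixed orientations, or angles close to ±90° (where folding wraps around), an estimate like this becomes unreliable, which is the kind of case worth checking.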

Rob192 commented Jul 1, 2021

Hello @fg-mindee
Thank you for your detailed review. I split the work into three separate PRs.
Could you please point me to a dataset for benchmarking the image rotation changes? Or, even better, to an evaluation script that you would use before releasing an update?
Thanks :)


codecov bot commented Jul 1, 2021

Codecov Report

Merging #324 (98c7efd) into main (5e16bf4) will increase coverage by 0.53%.
The diff coverage is 96.00%.

@@            Coverage Diff             @@
##             main     #324      +/-   ##
==========================================
+ Coverage   94.22%   94.76%   +0.53%     
==========================================
  Files          66       83      +17     
  Lines        2788     3359     +571     
==========================================
+ Hits         2627     3183     +556     
- Misses        161      176      +15     
Flag Coverage Δ
unittests 94.76% <96.00%> (+0.53%) ⬆️

Impacted Files Coverage Δ
doctr/models/_utils.py 93.93% <96.00%> (+0.60%) ⬆️
doctr/datasets/utils.py 92.68% <0.00%> (-1.77%) ⬇️
doctr/utils/geometry.py 100.00% <0.00%> (ø)
doctr/models/__init__.py 100.00% <0.00%> (ø)
doctr/models/recognition/zoo.py 100.00% <0.00%> (ø)
doctr/models/recognition/__init__.py 100.00% <0.00%> (ø)
doctr/transforms/modules/__init__.py 100.00% <0.00%> (ø)
doctr/transforms/functional/__init__.py 100.00% <0.00%> (ø)
doctr/transforms/functional/tensorflow.py 100.00% <0.00%> (ø)
doctr/models/recognition/sar.py
... and 34 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data


@fg-mindee fg-mindee left a comment

Almost good to merge! A few adjustments and the rest looks good to me!

Rob192 commented Jul 1, 2021

Great!

@fg-mindee fg-mindee added this to the 0.3.1 milestone Jul 1, 2021

@fg-mindee fg-mindee left a comment

Thanks a lot for the edits!

@fg-mindee
The repo already has an evaluation script in scripts/evaluate.py that supports several public datasets! The tricky part is that those aren't rotated, so we'll have to add some test-time augmentation to check this.
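
As a rough idea of what such a check could look like: rotate each sample by a known angle, run the estimator, and measure how far the estimate is from the applied angle. The sketch below is a generic harness and does not use the actual interface of scripts/evaluate.py. `mean_abs_angle_error`, `rotate_fn`, and `estimate_fn` are hypothetical names supplied by the caller.

```python
from typing import Callable, Iterable, Sequence, TypeVar

Sample = TypeVar("Sample")


def mean_abs_angle_error(
    samples: Iterable[Sample],
    angles: Sequence[float],
    rotate_fn: Callable[[Sample, float], Sample],
    estimate_fn: Callable[[Sample], float],
) -> float:
    """Mean absolute error (degrees) between applied and estimated angles.

    Assumptions: samples are upright to begin with, rotate_fn(sample, angle)
    returns the sample rotated by `angle` degrees, and estimate_fn uses the
    same sign convention as rotate_fn.
    """
    errors = []
    for sample in samples:
        for angle in angles:
            rotated = rotate_fn(sample, angle)
            errors.append(abs(estimate_fn(rotated) - angle))
    if not errors:
        raise ValueError("no samples or angles were provided")
    return sum(errors) / len(errors)
```

In practice `rotate_fn` would rotate an image and `estimate_fn` would be the orientation estimator under test. The angles to sweep and the acceptable error are choices left open here.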

@fg-mindee fg-mindee merged commit aaebbe9 into mindee:main Jul 1, 2021