Some questions #1

Open
light42 opened this issue Nov 15, 2022 · 2 comments

Comments

@light42

light42 commented Nov 15, 2022

Before I ask my questions, let me report what I found when testing the model you trained.

  1. It can't handle "extremes": if a cell is too large or too small, or if a cell contains multiple lines of text, it is hard to detect.
  2. It can't handle merged columns/rows.
  3. It has some ability to recognize empty cells (needs further testing).
  4. It can recognize tables with a large number of cells (needs further testing).
  5. Overall, if the table is well-behaved (no merged columns/rows, adequate cell size), it is quite accurate.

Questions:

  1. Does the structure recognition result depend heavily on the OCR results?
  2. How many PubTables-1M samples did you use?
  3. In your opinion, could this method be better than existing state-of-the-art tools (PaddleOCR)?

Overall, I'm actually impressed with your model's training results. Even if it was trained on only a small part of PubTables-1M, it's impressive that it isn't overfitted. I've trained PaddleOCR for table recognition and somehow it always overfitted.

@whn09
Owner

whn09 commented Nov 16, 2022

Hi,

Thank you for your feedback and questions. I will explain my method here:

  • I used all of the PubTables-1M train data to train the yolov5s model, and the test data to evaluate it
  • Convert the annotations from PASCAL VOC format to COCO format
  • Train the detection and structure models using yolov5s (14.4M vs DETR 110M) for 10 epochs (size=640)
    • Detection model: mAP@0.5=0.995 (vs DETR 0.995)
    • Structure model: mAP@0.5=0.962 (vs DETR 0.971)
  • Merge the OCR results with the table detection and structure results using postprocess.objects_to_cells (implemented in table-transformer)
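The format conversion step above mostly comes down to box arithmetic: PASCAL VOC stores each box as corner coordinates (xmin, ymin, xmax, ymax), while COCO stores it as [x, y, width, height]. The following is a minimal sketch of that conversion, not the script used in this repository. The function name `voc_to_coco_annotations` and the class names in `CLASS_TO_ID` are assumptions made for illustration, and a complete COCO file would also need the `images` and `categories` lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical label map; the real class names and ids depend on the dataset config.
CLASS_TO_ID = {"table row": 0, "table column": 1}


def voc_to_coco_annotations(xml_text, class_to_id=CLASS_TO_ID):
    """Convert the <object> entries of one PASCAL VOC XML file to COCO-style annotations."""
    root = ET.fromstring(xml_text)
    annotations = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        width = xmax - xmin
        height = ymax - ymin
        annotations.append({
            "category_id": class_to_id[name],
            # COCO boxes are [top-left x, top-left y, width, height].
            "bbox": [xmin, ymin, width, height],
            "area": width * height,
            "iscrowd": 0,
        })
    return annotations


if __name__ == "__main__":
    sample = """
    <annotation>
      <object>
        <name>table row</name>
        <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>50</ymax></bndbox>
      </object>
    </annotation>
    """
    print(voc_to_coco_annotations(sample))
```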

For your questions, I'll try to give some answers:

  1. Does the structure recognition result depend heavily on the OCR results?
    Yes. Since PubTables-1M only provides row and column labels, we need some post-processing to get the cell results, and in the original code (table-transformer) the post-processing needs the OCR results.

  2. How many PubTables-1M samples did you use?
    All of the train data to train the model, the val data to validate it, and the test data to evaluate it. You can see the details here: https://github.com/whn09/table_structure_recognition/blob/main/yolov5/data/custom-detection.yaml

  3. In your opinion, could this method be better than existing state-of-the-art tools (PaddleOCR)?
    I think the model is better than PaddleOCR, and even table-transformer. The approach is a commonly used one, and we can get good results with yolov5s; if you want better results, you can use yolov5m or larger models.
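To illustrate the first answer, here is a rough sketch of how cells can be derived when only row and column boxes are predicted: each cell is the intersection of one row with one column, and OCR words are then assigned to the cell that covers most of their area. This is not the `postprocess.objects_to_cells` implementation from table-transformer. The helper names (`build_cells`, `assign_words`), the `min_overlap` threshold, and the box layout (x0, y0, x1, y1) are assumptions made for illustration.

```python
def intersect(a, b):
    """Return the overlapping box of a and b as (x0, y0, x1, y1), or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def build_cells(rows, cols):
    """Build a cell grid by intersecting every row box with every column box."""
    cells = []
    for r_idx, row in enumerate(sorted(rows, key=lambda b: b[1])):      # top to bottom
        for c_idx, col in enumerate(sorted(cols, key=lambda b: b[0])):  # left to right
            box = intersect(row, col)
            if box is not None:
                cells.append({"row": r_idx, "col": c_idx, "bbox": box, "words": []})
    return cells


def assign_words(cells, words, min_overlap=0.5):
    """Attach each OCR word to the cell that covers the largest fraction of the word box."""
    for word in words:
        best_cell, best_frac = None, 0.0
        for cell in cells:
            inter = intersect(cell["bbox"], word["bbox"])
            if inter is None:
                continue
            frac = area(inter) / area(word["bbox"])
            if frac > best_frac:
                best_cell, best_frac = cell, frac
        if best_cell is not None and best_frac >= min_overlap:
            best_cell["words"].append(word["text"])
    return cells


if __name__ == "__main__":
    rows = [(0, 0, 100, 20), (0, 20, 100, 40)]
    cols = [(0, 0, 50, 40), (50, 0, 100, 40)]
    words = [{"text": "a", "bbox": (5, 5, 20, 15)}, {"text": "b", "bbox": (60, 25, 80, 35)}]
    for cell in assign_words(build_cells(rows, cols), words):
        print(cell)
```

A plain grid like this has no notion of merged cells, which is consistent with the merged column/row limitation reported at the top of the thread.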

@light42
Author

light42 commented Nov 24, 2022

I haven't tested it yet, but I think that if you train YOLO for text detection it will give great results; after that, even EasyOCR/Tesseract could be used for text recognition. My colleague uses yolov7 to detect text in official documents, and it works great. You could use it to finally complete the pipeline.

You could use the SynthTabNet dataset for training, since it contains a bbox for each text in the cells.
And maybe add a little noise using shabby-pages so that it can handle table images in imperfect conditions.
