Results Convert to HTML format #6

Open
yazheng0307 opened this issue Jul 3, 2024 · 4 comments

Comments

@yazheng0307

Can you convert the result to HTML?

@whn09
Owner

whn09 commented Jul 3, 2024

I added cells_to_html to the notebook. Can you test it?
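
For readers without the notebook at hand, the general idea of turning recognized cells into HTML can be sketched as follows. This is not the notebook's cells_to_html: the function name `render_cells_as_html` and the cell layout (dicts with `row`, `col`, `text`, and optional `rowspan` / `colspan`) are assumptions made only for illustration.

```python
from html import escape


def render_cells_as_html(cells):
    """Render a list of cell dicts as an HTML <table> string.

    Hypothetical cell layout (an assumption, not the notebook's format):
    {'row': int, 'col': int, 'text': str, 'rowspan': int, 'colspan': int}
    where 'rowspan' and 'colspan' are optional and default to 1.
    """
    # Group cells by their starting row.
    rows = {}
    for cell in cells:
        rows.setdefault(cell["row"], []).append(cell)

    parts = ["<table>"]
    for row_index in sorted(rows):
        parts.append("<tr>")
        # Emit cells left to right within the row.
        for cell in sorted(rows[row_index], key=lambda c: c["col"]):
            attrs = ""
            if cell.get("rowspan", 1) > 1:
                attrs += f' rowspan="{cell["rowspan"]}"'
            if cell.get("colspan", 1) > 1:
                attrs += f' colspan="{cell["colspan"]}"'
            # Escape the text so characters such as & and < stay valid HTML.
            parts.append(f"<td{attrs}>{escape(cell['text'])}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)
```

Merged cells are expressed through the `rowspan` / `colspan` attributes, so a cell that spans several columns appears once in its starting row.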

@yazheng0307
Author

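cells_to_html works. I also tried a few tables, and the recognition quality does not seem very good.
These are the crops the model cut from the original images:
test.zip

These are the recognition results rendered as images:
vis.zip

The last row is often missed, and tables with merged cells are not recognized well. Is this a problem with how I am testing?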

@whn09
Owner

whn09 commented Jul 4, 2024

I took a look. Changing the padding fixes the missing last row in ca_table_118.png:

x1 = max(0, int((min_x-w/2)*width)-20) # expand by 20px, clamped to the left edge
y1 = max(0, int((min_y-h/2)*height)-20) # expand by 20px, clamped to the top edge
x2 = min(width, int((min_x+w/2)*width)+20) # expand by 20px, clamped to the right edge
y2 = min(height, int((min_y+h/2)*height)+20) # expand by 20px, clamped to the bottom edge
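
The four lines above can be wrapped in a small helper to make the padding easy to tune. This is a minimal sketch, not code from the notebook: the name `padded_crop_box` and the `pad` parameter are hypothetical, and it assumes the box is given as a normalized center point plus normalized width and height (which is what the `-w/2` / `+w/2` arithmetic implies, even though the snippet calls the center `min_x`, `min_y`).

```python
def padded_crop_box(cx, cy, w, h, width, height, pad=20):
    """Convert a normalized center-format box to a padded pixel crop box.

    cx, cy, w, h are fractions of the image size (assumed center format);
    width and height are the image size in pixels; pad is the number of
    pixels added on every side. The result is clamped to the image bounds.
    """
    x1 = max(0, int((cx - w / 2) * width) - pad)
    y1 = max(0, int((cy - h / 2) * height) - pad)
    x2 = min(width, int((cx + w / 2) * width) + pad)
    y2 = min(height, int((cy + h / 2) * height) + pad)
    return x1, y1, x2, y2


if __name__ == "__main__":
    # A box covering the central half of a 1000x800 image, padded by 20 px.
    print(padded_crop_box(0.5, 0.5, 0.5, 0.5, 1000, 800, pad=20))
```

A larger `pad` keeps more context around the detected table, which helps when the detection box clips the last row; the clamping keeps the crop inside the image.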

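The other image is a bit unusual: the table border in the original is dashed, and some edges have no line at all, not even a dashed one. The training data contained relatively few examples like this, so the two cells 【账面金额】 (book value) and 【坏账准备】 (bad-debt provision) were not recognized. This is not really a merged-cell problem; it is mainly a table-edge problem.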

@yazheng0307
Author

Thank you for the patient explanation. I will look into it some more and get back to you if I have questions (^▽^)
