How to handle multi page invoices #279

Open
DNFSiF opened this issue Jan 6, 2024 · 4 comments

Comments

@DNFSiF
DNFSiF commented Jan 6, 2024

I am trying to use Donut with a custom invoice dataset to extract fields such as invoice numbers and totals.
The invoices can be single-page or multi-page, so the fields may be spread across different pages.

Does anyone have experience with multi-page invoices?
Should I merge the pages into a single image?
Or should I train different models for different page counts?

Thanks for any advice! 😄

@felixvor
felixvor commented Feb 2, 2024

You could increase the input dimensions and feed multiple pages in as one image, but that does not scale well: beyond a few pages, the compute quickly becomes impractical on realistic hardware. What we did instead was assign each value we wanted to label to a page by fuzzy-matching it against the OCR text of each page (for example with a library like rapidfuzz). If we find the value as a substring of a page's OCR text, we label that page with it for the Donut training. Maybe that helps you, good luck!
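
A minimal sketch of the page-assignment idea described above. It uses Python's standard-library `difflib` as a stand-in so that it runs without extra dependencies; rapidfuzz's `fuzz.partial_ratio` does the same kind of substring matching and would likely be faster. The function names, the 0.85 threshold, and the sample OCR strings are assumptions for illustration, not code from this thread.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so OCR spacing noise matters less."""
    return " ".join(text.lower().split())


def partial_score(needle, haystack):
    """Best similarity (0..1) between needle and any same-length window of haystack."""
    needle, haystack = normalize(needle), normalize(haystack)
    if not needle or not haystack:
        return 0.0
    if len(haystack) <= len(needle):
        return SequenceMatcher(None, needle, haystack).ratio()
    n = len(needle)
    return max(
        SequenceMatcher(None, needle, haystack[i:i + n]).ratio()
        for i in range(len(haystack) - n + 1)
    )


def assign_fields_to_pages(fields, page_texts, threshold=0.85):
    """Map each page index to the fields whose value fuzzily appears in its OCR text.

    Each field goes to its single best-scoring page, and only if the score
    reaches the (assumed) threshold; otherwise it is left unassigned.
    """
    labels = {i: {} for i in range(len(page_texts))}
    if not page_texts:
        return labels
    for name, value in fields.items():
        scores = [partial_score(value, text) for text in page_texts]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            labels[best][name] = value
    return labels


# Made-up OCR output for a two-page invoice; note the OCR error "2O24".
pages = [
    "ACME Corp Invoice No: INV-2O24-0017 Date: 2024-01-06",
    "Subtotal 1,049.50 Shipping 200.00 Total due: 1,249.50 EUR",
]
fields = {"invoice_number": "INV-2024-0017", "total": "1,249.50"}
labels = assign_fields_to_pages(fields, pages)
print(labels)
```

Each page's entry in `labels` could then serve as the ground truth for that page's training sample. Short values such as small totals can match in many places, so the threshold probably needs tuning per field.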

@balajiChundi
If you define your use case as "sending multiple pages in each request", the model's max_position_embeddings (which you may have to tune) might not be large enough to fit all the information into a single response, and text repetition becomes more likely. Instead, you can build a model that predicts one page at a time and combine the predictions afterwards.
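
One way the "handle the predictions later" step could look, as a rough sketch. It assumes each page's prediction is a flat dict of string fields; the merge policy (keep the first non-empty value in page order, record any disagreeing values for review) and the function name `merge_page_predictions` are assumptions, not something prescribed in this thread.

```python
def merge_page_predictions(page_predictions):
    """Combine per-page field dicts into one invoice-level result.

    Returns (merged, conflicts):
      merged    maps field -> {"value": ..., "page": ...} for the first
                non-empty value seen in page order.
      conflicts maps field -> list of later values that disagree with it.
    """
    merged, conflicts = {}, {}
    for page_index, prediction in enumerate(page_predictions):
        for field, value in prediction.items():
            value = (value or "").strip()
            if not value:
                continue  # the model found nothing for this field on this page
            if field not in merged:
                merged[field] = {"value": value, "page": page_index}
            elif merged[field]["value"] != value:
                conflicts.setdefault(field, []).append(
                    {"value": value, "page": page_index}
                )
    return merged, conflicts


# Made-up per-page predictions for a three-page invoice.
per_page = [
    {"invoice_number": "INV-1", "total": ""},
    {"invoice_number": "INV-1", "total": "99.00"},
    {"total": "100.00"},
]
merged, conflicts = merge_page_predictions(per_page)
print(merged)
print(conflicts)
```

For fields such as totals, preferring the last page rather than the first may be the better policy; the conflict list is there so that such cases can be inspected instead of silently dropped.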

@xdevfaheem

@balajiChundi can you elaborate a bit on what you mean?

@balajiChundi

First and preferred way: get predictions from the model once per page (so twice for a two-page invoice). You can parallelize the per-page predictions for faster output. PS: this worked for me.
Second way (didn't work for me): I concatenated the pages by stitching the images vertically and trained the model on the result. The problem is that data preparation is clumsy and time-consuming, and it is hard to decide on the max_token limit for the output, so I would not recommend this at all.
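
A small sketch of the first approach, running a single-page predictor over every page concurrently. `predict_page` is a hypothetical callable standing in for whatever wraps your trained single-page model; the fake predictor and the string "pages" below exist only to make the example runnable. Whether threads actually speed things up depends on the model call; on a single GPU, batching the pages into one forward pass may work better.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_pages(pages, predict_page, max_workers=4):
    """Run predict_page on every page concurrently; results keep page order."""
    if not pages:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of which
        # worker finishes first.
        return list(pool.map(predict_page, pages))


def fake_predict_page(page):
    """Hypothetical stand-in for a real single-page model call."""
    result = {}
    if "INV-" in page:
        result["invoice_number"] = page.split("INV-")[1].split()[0]
    if "Total:" in page:
        result["total"] = page.split("Total:")[1].split()[0]
    return result


# Strings stand in for page images here.
invoice_pages = ["Invoice INV-42 from ACME", "Item list continued Total: 99.00"]
page_results = predict_pages(invoice_pages, fake_predict_page)
print(page_results)
```

The per-page results can then be combined into one invoice-level record in a separate step.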
