Document.entities field is unusable when using data from Classifier output #332

Closed
evekhm opened this issue Jul 10, 2024 · 2 comments · Fixed by #333

evekhm commented Jul 10, 2024

Hello,

The wrapped document returned by document.from_batch_process_metadata (or any of the other loading methods) has no usable entities field when the data comes from the Classifier.

With Splitter output, everything works fine.
With Classifier output, you won't get any of the important information, such as type and confidence.

import os

from google.cloud.documentai_toolbox import document

# Both JSON files sit next to this script.
base_dir = os.path.dirname(__file__)

# Splitter output: entities work as expected.
doc = document.Document.from_document_path(os.path.join(base_dir, "output-document_split.json"))
print(doc.entities)

# Classifier output: entities are unusable.
doc = document.Document.from_document_path(os.path.join(base_dir, "output-document_classify.json"))
print(doc.entities)

output-document_split.json
output-document_classify.json

evekhm commented Jul 10, 2024

I do see that the information is there inside shards.entities, but entities itself is unusable.
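
To illustrate the difference between the two outputs, here is a minimal sketch that reads entities straight from the JSON using only the standard library. The helper name `summarize_entities` and both sample payloads are hypothetical and hand-written, not taken from the attached files. The sketch assumes the usual JSON field names (`entities`, `type`, `confidence`, `pageAnchor.pageRefs`) and that a page reference for the first page may omit `page`.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def summarize_entities(
    doc: Dict[str, Any],
) -> List[Tuple[str, float, Optional[List[int]]]]:
    """Return (type, confidence, pages) per entity; pages is None without a page anchor."""
    rows = []
    for entity in doc.get("entities", []):
        refs = entity.get("pageAnchor", {}).get("pageRefs")
        # Assumption: a missing "page" key means page 0 (default values omitted in JSON).
        pages = [int(ref.get("page", 0)) for ref in refs] if refs else None
        rows.append(
            (entity.get("type", ""), float(entity.get("confidence", 0.0)), pages)
        )
    return rows


# Hypothetical samples: splitter-like entities carry page refs, classifier-like ones do not.
split_like = json.loads(
    '{"entities": [{"type": "invoice", "confidence": 0.98,'
    ' "pageAnchor": {"pageRefs": [{}, {"page": "1"}]}}]}'
)
classify_like = json.loads('{"entities": [{"type": "invoice", "confidence": 0.98}]}')

print(summarize_entities(split_like))
print(summarize_entities(classify_like))
```

In this sketch the classifier-like entity still has its type and confidence; only the page information is absent.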

evekhm commented Jul 11, 2024

Looking further at the issue:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/dataclasses.py", line 405, in wrapper
    result = user_function(self)
  File "<string>", line 3, in __repr__
AttributeError: 'Entity' object has no attribute 'start_page'

Both start_page and end_page need to be made Optional, since the Classifier does not provide this information.
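
A minimal sketch of why the traceback occurs and how Optional fields avoid it. The two classes below are hypothetical stand-ins, not the toolbox's actual Entity class. They assume the page fields are declared with `field(init=False)` and only assigned when page references exist, which is consistent with the AttributeError raised from the generated `__repr__`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StrictEntity:
    """Hypothetical: page fields have no default, so they may never be set."""

    type_: str
    confidence: float
    page_refs: List[int]
    start_page: int = field(init=False)
    end_page: int = field(init=False)

    def __post_init__(self) -> None:
        # Only assigned when page references exist (absent in classifier-like data).
        if self.page_refs:
            self.start_page = min(self.page_refs)
            self.end_page = max(self.page_refs)


@dataclass
class LenientEntity:
    """Hypothetical fix: Optional page fields that default to None."""

    type_: str
    confidence: float
    page_refs: List[int]
    start_page: Optional[int] = field(init=False, default=None)
    end_page: Optional[int] = field(init=False, default=None)

    def __post_init__(self) -> None:
        if self.page_refs:
            self.start_page = min(self.page_refs)
            self.end_page = max(self.page_refs)


try:
    print(StrictEntity("invoice", 0.98, []))
except AttributeError as exc:
    print(f"repr failed: {exc}")

print(LenientEntity("invoice", 0.98, []))
```

With the defaults in place, the generated `__repr__` can always read both fields, and callers can check for `None` to detect entities that have no page information.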

holtskinner changed the title from "Document.enities field is unusable when using data from Classifier output" to "Document.entities field is unusable when using data from Classifier output" on Jul 11, 2024