Textract is a machine learning service, so its results may not be 100% accurate. However, we are always working to improve the accuracy of our models. We will forward this particular use case to our science teams so that, hopefully, a future model update will not miss this case.
Regarding the issue with the reading order, Textract currently does not support configuration for reading order. However, we will surface this feature request to our PMs to see if this can be added to a future release.
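Until such an option exists, one possible client-side workaround is to order the returned key blocks yourself, using the bounding-box coordinates that Textract attaches to each block. The following is only a minimal sketch under some assumptions: the function name `sort_keys_in_reading_order` and the `row_tolerance` parameter are hypothetical (not part of Textract), and the tolerance value is a guess that would need tuning for your layout. It cannot recover keys that the model did not detect at all.

```python
def sort_keys_in_reading_order(blocks, row_tolerance=0.01):
    """Sort Textract KEY blocks top-to-bottom, then left-to-right.

    `blocks` is the "Blocks" list of an AnalyzeDocument response.
    `row_tolerance` (hypothetical, tune per layout) is the maximum
    difference in "Top" for two keys to count as the same row.
    Coordinates are ratios of the page size, between 0 and 1.
    """
    # Keep only the KEY half of each key-value pair.
    keys = [
        b for b in blocks
        if b.get("BlockType") == "KEY_VALUE_SET"
        and "KEY" in (b.get("EntityTypes") or [])
    ]
    keys.sort(key=lambda b: b["Geometry"]["BoundingBox"]["Top"])

    # Group keys into rows: a key joins the current row if its top edge
    # is within row_tolerance of the first key in that row.
    rows = []
    for block in keys:
        top = block["Geometry"]["BoundingBox"]["Top"]
        if rows and top - rows[-1][0] <= row_tolerance:
            rows[-1][1].append(block)
        else:
            rows.append((top, [block]))

    # Within each row, order keys from left to right.
    ordered = []
    for _, row in rows:
        ordered.extend(
            sorted(row, key=lambda b: b["Geometry"]["BoundingBox"]["Left"])
        )
    return ordered
```

You would pass it `response["Blocks"]` from your AnalyzeDocument call and then build the key/value strings from the sorted blocks as you do today. A simple top/left sort like this assumes a single-column form; multi-column layouts would need extra logic.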
Hi, glad to hear that the FORMS feature of Textract's AnalyzeDocument API is generally working well for you.
Textract currently does not support configuration for reading order. For this particular case, it would help if you could share your document with us so we can investigate why you are not getting the expected order.
Hi, thanks for the reply! I can share a snippet of a FORMS example. In the link below you can find the picture. After converting the PNG to PDF and running the code above, I get the following result:
LINK: https://gyazo.com/b7b949cd06538bf7931cb9f6117ac581
[['Key: Straße, Hausnummer Value: None',
'Key: Ggf. weitere Angaben Value: None',
'Key: Geburtsname Value: None',
'Key: Postleitzahl, Wohnort, bei Soldaten Standort Value: None',
'Key: Familienname Value: None',
'Key: Geburtsdatum Value: Geburtsort']]
Naturally, the order should have been:
[['Key: Vorname Value: None',
'Key: Familienname Value: None',
'Key: Straße, Hausnummer Value: None',
'Key: Postleitzahl, Wohnort, bei Soldaten Standort Value: None',
'Key: Geburtsdatum Value: Geburtsort',
'Key: Geburtsname Value: None',
'Key: Ggf. weitere Angaben Value: None']]
So in this example, it even missed the "Vorname" field. Usually it doesn't miss any. If anyone can help explain this behavior, I would be very grateful!