Amazon Textract Queries: trouble when querying for an empty value


Hello everyone,

I am trying to extract data from financial documents with the Queries feature of Amazon Textract. The documents I need to analyse have the format shown in the attached image.

Some of the values I query for can be empty (e.g. "What is the value of AA?"). When the value is empty, the answer returned for my query is the value of a different, non-empty label (e.g. the value of AF, "12345", instead of ""). Does anybody know how I can get the correct value, or at least how I can tell that a value is empty?

Thank you in advance for your help.

Cherrygolo.

asked a year ago · 285 views
1 Answer

Textract Queries may not be the best fit here given the structure of the document; you are likely to end up with inconsistent results. Here are the results of my tests so far with Queries:

  • What is the value of AB? --> <empty>
  • What is the value of AF? --> 12345
  • What is the value of AA? --> Capital souscrit non appelé
  • What is the value of Capital souscrit non appelé? --> LLES
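For reference, an empty result like the one for AB can be detected in the API response: as far as I know, a QUERY block that got no answer has no ANSWER relationship, and each QUERY_RESULT block carries a Confidence score. Below is a minimal sketch, not a definitive implementation. The helper name `query_answers` and the `min_confidence` threshold of 90 are my own assumptions, and a confidence cut-off will not necessarily catch every wrong answer such as the AA case above.

```python
def query_answers(blocks, min_confidence=90.0):
    """Map each query text to its best answer, or None if nothing is trusted.

    `blocks` is the "Blocks" list of an AnalyzeDocument response obtained
    with FeatureTypes=["QUERIES"]. `min_confidence` is an assumed cut-off,
    not a value recommended by AWS.
    """
    by_id = {block["Id"]: block for block in blocks}
    answers = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        # Collect every QUERY_RESULT linked through an ANSWER relationship.
        candidates = [
            by_id[result_id]
            for rel in block.get("Relationships", [])
            if rel["Type"] == "ANSWER"
            for result_id in rel["Ids"]
            if result_id in by_id
        ]
        # Keep only answers at or above the confidence cut-off.
        trusted = [
            c for c in candidates
            if c.get("Confidence", 0.0) >= min_confidence
        ]
        best = max(trusted, key=lambda c: c["Confidence"], default=None)
        answers[block["Query"]["Text"]] = best["Text"] if best else None
    return answers
```

A `None` in the returned dictionary then means "no answer, or no answer confident enough", which can be treated as an empty value and double-checked by other means.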

One approach could be to use the Tables feature together with its merged cell detection, which identifies cells that are merged horizontally or vertically. The screenshot below shows what I was able to get while testing the sample in the demo console using Tables.

Demo results with Tables feature

Please check out the blog post below for an example of how to use the merged cell construct in the AnalyzeDocument API response: https://aws.amazon.com/blogs/machine-learning/merge-cells-and-column-headers-in-amazon-textract-tables/
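To illustrate the Tables approach, here is a minimal sketch of how the cells could be read from the response so that an empty value shows up as an empty string. It relies on the TABLE, CELL and WORD blocks and their CHILD relationships. The helper names `table_grids` and `value_right_of` are hypothetical, and the layout assumption (the value sits in the cell immediately to the right of its code, e.g. AA) is a guess that needs adapting to the real document. Merged cells (MERGED_CELL blocks) are not handled here.

```python
def table_grids(blocks):
    """Return one {(row, column): text} dict per TABLE block.

    `blocks` is the "Blocks" list of an AnalyzeDocument response obtained
    with FeatureTypes=["TABLES"]. Empty cells map to "".
    """
    by_id = {block["Id"]: block for block in blocks}

    def children(block, block_type):
        # Follow CHILD relationships and keep blocks of the wanted type.
        return [
            by_id[child_id]
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for child_id in rel["Ids"]
            if child_id in by_id and by_id[child_id]["BlockType"] == block_type
        ]

    grids = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        grid = {}
        for cell in children(table, "CELL"):
            words = children(cell, "WORD")
            key = (cell["RowIndex"], cell["ColumnIndex"])
            grid[key] = " ".join(word["Text"] for word in words)
        grids.append(grid)
    return grids


def value_right_of(grid, code):
    """Text of the cell right of the cell equal to `code`.

    Returns "" for an empty or missing neighbour and None if `code`
    does not appear in the table. The "value is one column to the
    right" layout is an assumption.
    """
    for (row, column), text in grid.items():
        if text == code:
            return grid.get((row, column + 1), "")
    return None


# Possible usage (requires AWS credentials and the boto3 package):
# import boto3
# response = boto3.client("textract").analyze_document(
#     Document={"Bytes": image_bytes}, FeatureTypes=["TABLES"])
# for grid in table_grids(response["Blocks"]):
#     print(value_right_of(grid, "AA"))
```

With this shape, an empty AA cell comes back as "" instead of borrowing the value of a neighbouring label, which is the distinction the question asks for.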

AWS
NZ
answered a year ago
