Error in table detection with AWS Textract


For the past couple of weeks, AWS Textract has been merging rows in tables unexpectedly, which makes the output incorrect. I am not sure what changed, but I can send an example invoice.

nzer94
asked 2 years ago, 242 views
3 Answers

Hi Nzer94, a new model for table extraction was released a few months ago. This model adds the ability to detect merged cells. If you ignore the MERGED_CELL block types, you should get output similar to what you had before. The new model should also improve detection accuracy. If you see any issue with the new model, please reach out to your customer representative, who will be able to share more detailed information about your issue with us, since we prefer that your confidential data not be shared in this forum. For more details on the new model release, you can refer to this forum post:

AWS
answered 2 years ago

Hello, ignoring the MERGED_CELL blocks does not help; the new model actually detects table rows in the middle of text. How can I contact my representative?

nzer94
answered 2 years ago

Before this update, the model did not detect merged cells at all, so it would detect cell boundaries even in the middle of a text when that text spanned a merged cell. If your documents contain this case, I would strongly encourage you to use the merged cells functionality.

In a case like this one (sorry about the formatting, but Markdown doesn't support merged cells):

| A | B |

| C D E |

You will get 4 cells and 1 merged cell:

  • Cells (1,1), (1,2), (2,1) and (2,2)
  • Merged cell: (2, 1, colSpan = 2) with 2 child cells: (2,1) and (2,2)

If you use only the cells, "CDE" will indeed be split across the 2 cells.

But if you use the merged cells, you can ignore the 2 child cells (2,1) and (2,2), and the text is attached in full to the merged cell.
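
As a rough illustration of the approach described above, here is a minimal Python sketch that collapses the child cells of each MERGED_CELL block into a single entry. It assumes the block layout of a Textract table response (blocks with `Id`, `BlockType`, `RowIndex`, `ColumnIndex` and `CHILD` relationships). The `sample_blocks` list is hand-built and heavily simplified to mirror the A/B/CDE example, not real service output, and `block_text` and `table_cells` are hypothetical helper names. It also assumes the merged cell's text can be rebuilt by joining the words of its child cells in reading order, and that all blocks belong to one table.

```python
def block_text(block, by_id):
    """Join the text of the WORD blocks referenced by a block's CHILD relationships."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def table_cells(blocks):
    """Map (row, column) to text, using merged cells in place of their children."""
    by_id = {b["Id"]: b for b in blocks}
    covered = set()  # Ids of CELL blocks that belong to some merged cell
    result = {}

    for merged in (b for b in blocks if b["BlockType"] == "MERGED_CELL"):
        child_ids = [
            cid
            for rel in merged.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for cid in rel["Ids"]
        ]
        covered.update(child_ids)
        # Read the child cells in row-major order and join their text.
        children = sorted(
            (by_id[cid] for cid in child_ids),
            key=lambda c: (c["RowIndex"], c["ColumnIndex"]),
        )
        parts = [block_text(c, by_id) for c in children]
        result[(merged["RowIndex"], merged["ColumnIndex"])] = " ".join(p for p in parts if p)

    # Keep every ordinary cell that is not part of a merged cell.
    for cell in (b for b in blocks if b["BlockType"] == "CELL"):
        if cell["Id"] not in covered:
            result[(cell["RowIndex"], cell["ColumnIndex"])] = block_text(cell, by_id)
    return result


def _word(block_id, text):
    return {"Id": block_id, "BlockType": "WORD", "Text": text}


def _cell(block_id, row, col, word_ids):
    return {
        "Id": block_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
        "Relationships": [{"Type": "CHILD", "Ids": word_ids}],
    }


# Hand-built stand-in for the example table: | A | B | over a merged | C D E |.
sample_blocks = [
    _word("w1", "A"), _word("w2", "B"), _word("w3", "C"), _word("w4", "D"), _word("w5", "E"),
    _cell("c11", 1, 1, ["w1"]),
    _cell("c12", 1, 2, ["w2"]),
    _cell("c21", 2, 1, ["w3", "w4"]),
    _cell("c22", 2, 2, ["w5"]),
    {
        "Id": "m1", "BlockType": "MERGED_CELL", "RowIndex": 2, "ColumnIndex": 1,
        "RowSpan": 1, "ColumnSpan": 2,
        "Relationships": [{"Type": "CHILD", "Ids": ["c21", "c22"]}],
    },
]

print(table_cells(sample_blocks))
```

With a real response, you would pass the `Blocks` list returned by the table analysis call instead of `sample_blocks`, and group the blocks per table first if the page contains more than one.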

I hope this helps. If you have more specific questions, you can file a case with example documents in the Support Center of your AWS account, located under the "?" icon (this is how you contact customer support).

AWS
answered 2 years ago
