Error in table detection with AWS Textract

For the past couple of weeks, AWS Textract has been oddly merging rows in tables, which makes the output incorrect. I am not sure what changed, but I can send an example invoice.

nzer94
Asked 2 years ago, 251 views
3 Answers
Hi nzer94, a new model for table extraction was released a few months ago. This model adds the ability to detect merged cells. If you ignore the MERGED_CELL block types, you should get output similar to what you had before. The new model should also improve detection accuracy. If you see any issues with the new model, please reach out to your customer representative, who can share more detailed information about your issue with us; we prefer not to share your confidential data in this forum. For more details on the new model release, you can refer to this forum post:
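A minimal Python sketch of what "ignoring the MERGED_CELL block types" could look like. It assumes `response` is the dict returned by boto3's Textract `analyze_document` call with `FeatureTypes=["TABLES"]`, where a TABLE block points to its MERGED_CELL blocks through a relationship of type `MERGED_CELL`. The helper `drop_merged_cells` and the tiny sample response are hypothetical and only for illustration.

```python
"""Sketch: drop MERGED_CELL blocks from a Textract AnalyzeDocument response
so downstream code only sees plain CELL blocks (hypothetical helper)."""
import copy
from typing import Any


def drop_merged_cells(response: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `response` without MERGED_CELL blocks or links to them."""
    filtered = copy.deepcopy(response)  # leave the caller's response untouched
    blocks = filtered.get("Blocks", [])
    merged_ids = {b["Id"] for b in blocks if b["BlockType"] == "MERGED_CELL"}
    kept = []
    for block in blocks:
        if block["Id"] in merged_ids:
            continue  # skip the MERGED_CELL block itself
        rels = block.get("Relationships")
        if rels:
            new_rels = []
            for rel in rels:
                if rel["Type"] == "MERGED_CELL":
                    continue  # drop the TABLE -> MERGED_CELL relationship
                ids = [i for i in rel["Ids"] if i not in merged_ids]
                if ids:
                    new_rels.append({**rel, "Ids": ids})
            block["Relationships"] = new_rels
        kept.append(block)
    filtered["Blocks"] = kept
    return filtered


# Hypothetical, hand-built response: one row of two cells covered by a merged cell.
sample = {
    "Blocks": [
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]},
                           {"Type": "MERGED_CELL", "Ids": ["m1"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2},
        {"Id": "m1", "BlockType": "MERGED_CELL", "RowIndex": 1, "ColumnIndex": 1,
         "ColumnSpan": 2, "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    ]
}
result = drop_merged_cells(sample)
print([b["BlockType"] for b in result["Blocks"]])  # ['TABLE', 'CELL', 'CELL']
```

Working on a deep copy keeps the original response available in case you later want to switch to the merged-cell view.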

AWS
Answered 2 years ago

Hello, ignoring the merged cells does not help; the new model actually detects table rows in the middle of text. Where can I contact my representative?

nzer94
Answered 2 years ago

Before this model update, the model did not detect merged cells at all, so it would detect cells even in the middle of text when that text spanned a merged cell. If your documents contain this case, I highly encourage you to use the merged cells functionality.

Take a case like this one (sorry about the formatting, but Markdown doesn't support merged cells):

| A | B |

| C D E |

You will get 4 cells and 1 merged cell:

  • Cells (1,1), (1,2), (2,1), and (2,2)
  • Merged cell: (2,1, colSpan = 2) with 2 child cells: (2,1) and (2,2)

If you use only the cells, "CDE" will indeed be split across the 2 cells.

But if you use the merged cells, you can ignore the 2 child cells (2,1) and (2,2), and the text will be attached in full to the merged cell.
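The two approaches can be compared with a small Python sketch. It assumes the Textract response layout in which a TABLE block lists its CELL blocks under a `CHILD` relationship and its MERGED_CELL blocks under a `MERGED_CELL` relationship, and each MERGED_CELL block lists its child CELL blocks under `CHILD`. The helper names and the hand-built `blocks` list (including the assumption that "C D" lands in cell (2,1) and "E" in cell (2,2)) are hypothetical and only mirror the 2x2 example in this answer.

```python
"""Sketch: rebuild table text from Textract blocks, either from CELL blocks
only or by letting each MERGED_CELL absorb its child cells (hypothetical)."""
from typing import Any

Block = dict[str, Any]


def child_ids(block: Block, rel_type: str = "CHILD") -> list[str]:
    """Ids listed under relationships of the given type (empty if none)."""
    return [i for rel in block.get("Relationships", [])
            if rel["Type"] == rel_type for i in rel["Ids"]]


def cell_text(cell: Block, by_id: dict[str, Block]) -> str:
    """Join the WORD children of a cell-like block with spaces."""
    return " ".join(by_id[i]["Text"] for i in child_ids(cell)
                    if by_id[i]["BlockType"] == "WORD")


def table_rows(table: Block, by_id: dict[str, Block], use_merged: bool) -> list[list[str]]:
    """Return row-ordered cell texts; merged cells replace their children if use_merged."""
    cells = [by_id[i] for i in child_ids(table) if by_id[i]["BlockType"] == "CELL"]
    entries = {(c["RowIndex"], c["ColumnIndex"]): cell_text(c, by_id) for c in cells}
    if use_merged:
        for mid in child_ids(table, "MERGED_CELL"):
            merged = by_id[mid]
            children = sorted((by_id[i] for i in child_ids(merged)),
                              key=lambda c: (c["RowIndex"], c["ColumnIndex"]))
            for child in children:  # ignore the child cells themselves
                entries.pop((child["RowIndex"], child["ColumnIndex"]), None)
            # Attach the children's full text to the merged cell's position.
            text = " ".join(t for t in (cell_text(c, by_id) for c in children) if t)
            entries[(merged["RowIndex"], merged["ColumnIndex"])] = text
    rows: dict[int, list[tuple[int, str]]] = {}
    for (r, c), text in entries.items():
        rows.setdefault(r, []).append((c, text))
    return [[t for _, t in sorted(rows[r])] for r in sorted(rows)]


def word(block_id: str, text: str) -> Block:
    return {"Id": block_id, "BlockType": "WORD", "Text": text}


def cell(block_id: str, row: int, col: int, words: list[str]) -> Block:
    return {"Id": block_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": words}]}


# Hand-built blocks for: | A | B | / | C D E | (row 2 merged across 2 columns).
blocks = [
    {"Id": "t", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]},
                       {"Type": "MERGED_CELL", "Ids": ["m21"]}]},
    cell("c11", 1, 1, ["wA"]), cell("c12", 1, 2, ["wB"]),
    cell("c21", 2, 1, ["wC", "wD"]), cell("c22", 2, 2, ["wE"]),
    {"Id": "m21", "BlockType": "MERGED_CELL", "RowIndex": 2, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["c21", "c22"]}]},
    word("wA", "A"), word("wB", "B"), word("wC", "C"), word("wD", "D"), word("wE", "E"),
]
by_id = {b["Id"]: b for b in blocks}
table = by_id["t"]
print(table_rows(table, by_id, use_merged=False))  # [['A', 'B'], ['C D', 'E']]
print(table_rows(table, by_id, use_merged=True))   # [['A', 'B'], ['C D E']]
```

With `use_merged=False` the text of row 2 is split across the two child cells; with `use_merged=True` the child cells are dropped and their words are joined onto the merged cell's position.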

I hope this helps. If you have more specific questions, you can file a case with example documents in the Support Center of your AWS account, located under the "?" icon (this is how you contact customer support).

AWS
Answered 2 years ago