Error in table detection with AWS Textract

For the past couple of weeks, AWS Textract has been incorrectly merging rows in tables, which makes the output wrong. I am not sure what changed, but I can send an example invoice.

nzer94
asked 2 years ago, 251 views
3 Answers
Hi Nzer94, a new model for table extraction was released a few months ago. In this model we added the ability to detect merged cells. If you ignore blocks of type MERGED_CELL, you should get output similar to what you had before. The new model should also improve detection accuracy. If you see any issue with the new model, please reach out to your customer representative, who will be able to share more detailed information about your issue with us, as we prefer that your confidential data not be shared in this forum. For more details on the new model release, you can refer to this forum post:

AWS
answered 2 years ago
Hello, ignoring MERGED_CELL blocks does not help: the new model actually detects table rows in the middle of text. How can I contact my representative?

nzer94
answered 2 years ago
Before this update, the model did not detect merged cells at all, so it detected separate cells even in the middle of a text when that text spanned a merged cell. If this case occurs in your documents, I would highly encourage you to use the merged cells functionality.

In a case like this one (sorry about the formatting, but Markdown doesn't support merged cells):

| A | B |

| C D E |

You will get 4 cells and 1 merged cell:

  • Cells: (1,1), (1,2), (2,1) and (2,2)
  • Merged cell: (2,1, colSpan = 2) with 2 child cells: (2,1) and (2,2)

If you use only the cells, "C D E" will indeed be split across the two cells.

But if you use the merged cells, you can ignore the 2 child cells (2,1) and (2,2), and the text is attached in full to the merged cell.
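
As a rough illustration of the two readings above, here is a minimal Python sketch that works on the `Blocks` list of a Textract response. The helper names (`child_ids`, `cell_text`, `table_cells`) and the hand-written sample blocks are hypothetical and only mirror the A/B/C D E example; the sketch assumes that a MERGED_CELL block lists its child CELL blocks under a `CHILD` relationship and that the merged text can be rebuilt by joining the words of those children in reading order.

```python
from typing import Any, Dict, List, Tuple

Block = Dict[str, Any]


def child_ids(block: Block, rel_type: str = "CHILD") -> List[str]:
    """Return the Ids listed under one relationship type of a block."""
    ids: List[str] = []
    for rel in block.get("Relationships") or []:
        if rel["Type"] == rel_type:
            ids.extend(rel["Ids"])
    return ids


def cell_text(cell: Block, by_id: Dict[str, Block]) -> str:
    """Join the WORD children of a CELL block into one string."""
    kids = [by_id[i] for i in child_ids(cell) if i in by_id]
    return " ".join(k["Text"] for k in kids if k["BlockType"] == "WORD")


def table_cells(table: Block, blocks: List[Block],
                use_merged: bool = True) -> Dict[Tuple[int, int], str]:
    """Map (row, column) to text; optionally collapse children of merged cells."""
    by_id = {b["Id"]: b for b in blocks}
    cells = {i: by_id[i] for i in child_ids(table)
             if i in by_id and by_id[i]["BlockType"] == "CELL"}
    result: Dict[Tuple[int, int], str] = {}
    absorbed = set()  # CELL ids already covered by a merged cell
    if use_merged:
        for b in blocks:
            if b["BlockType"] != "MERGED_CELL":
                continue
            kids = [i for i in child_ids(b) if i in cells]
            if not kids:  # merged cell belongs to another table
                continue
            kids.sort(key=lambda i: (cells[i]["RowIndex"], cells[i]["ColumnIndex"]))
            parts = [cell_text(cells[i], by_id) for i in kids]
            result[(b["RowIndex"], b["ColumnIndex"])] = " ".join(p for p in parts if p)
            absorbed.update(kids)
    for cid, cell in cells.items():
        if cid not in absorbed:
            result[(cell["RowIndex"], cell["ColumnIndex"])] = cell_text(cell, by_id)
    return result


def _cell(cid: str, row: int, col: int, words: List[str]) -> Block:
    """Build a hypothetical CELL block for the sample below."""
    return {"Id": cid, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": words}]}


# Hand-written sample mirroring the example table (not real Textract output).
blocks: List[Block] = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    _cell("c11", 1, 1, ["wA"]),
    _cell("c12", 1, 2, ["wB"]),
    _cell("c21", 2, 1, ["wC", "wD"]),
    _cell("c22", 2, 2, ["wE"]),
    {"Id": "m1", "BlockType": "MERGED_CELL", "RowIndex": 2, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["c21", "c22"]}]},
] + [{"Id": "w" + t, "BlockType": "WORD", "Text": t} for t in "ABCDE"]

cells_only = table_cells(blocks[0], blocks, use_merged=False)
with_merged = table_cells(blocks[0], blocks, use_merged=True)
```

With `use_merged=False` the second row stays split across (2,1) and (2,2); with `use_merged=True` the two child cells are dropped and their text is reported once at the merged cell's position (2,1).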

I hope this helps. If you have more specific questions, you can file a case with example documents in the Support Center of your AWS account, located under the "?" icon (this is how you contact customer support).

AWS
answered 2 years ago
