Textract AnalyzeDocument API doesn't fetch accurate table information


I am trying to extract and parse an invoice PDF that contains tabular data, using Python. A few of the table's columns have header cells with a rowspan of 2, and Textract is unable to fetch the text for such rows correctly: in the response, it shows RowSpan as 1. So if a column's header is "Description of items", it fetches "Description of" as one row and "items" as another row.

Sample PDF input

The output is below:

Table[0][2] = Designer
Table[1][2] = Code

Table[0][1] = of
Table[1][1] = Description Goods
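To see exactly what Textract returned for each cell, the table can be rebuilt from the AnalyzeDocument response (for example, the dict returned by boto3's `analyze_document(..., FeatureTypes=["TABLES"])`). The sketch below is a minimal reader, assuming the standard response layout: TABLE blocks link to CELL blocks, and CELL blocks link to WORD blocks, through `CHILD` relationships, with 1-based `RowIndex`/`ColumnIndex` and a `RowSpan` field on each cell. The helper names and `sample_response` are hypothetical; the sample is a hand-written stand-in that mimics the split header, not real Textract output.

```python
def build_table_grids(response):
    """Map each TABLE in a Textract AnalyzeDocument response to a dict of
    {(RowIndex, ColumnIndex): (cell_text, RowSpan)}. Indices are 1-based."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}

    def child_ids(block):
        # Parents point to their cells/words through CHILD relationships;
        # empty cells simply have no Relationships key.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    def cell_text(cell):
        # Join only WORD children (skips e.g. SELECTION_ELEMENT blocks).
        words = [blocks_by_id[i]["Text"] for i in child_ids(cell)
                 if blocks_by_id[i]["BlockType"] == "WORD"]
        return " ".join(words)

    grids = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        grid = {}
        for cell_id in child_ids(block):
            cell = blocks_by_id[cell_id]
            if cell["BlockType"] == "CELL":
                grid[(cell["RowIndex"], cell["ColumnIndex"])] = (
                    cell_text(cell), cell.get("RowSpan", 1))
        grids.append(grid)
    return grids


# Hypothetical, hand-written response mimicking the split header above
# (real responses also carry Geometry, Confidence, Page, etc.).
def _word(block_id, text):
    return {"Id": block_id, "BlockType": "WORD", "Text": text}


def _cell(block_id, row, col, word_ids):
    return {"Id": block_id, "BlockType": "CELL", "RowIndex": row,
            "ColumnIndex": col, "RowSpan": 1, "ColumnSpan": 1,
            "Relationships": [{"Type": "CHILD", "Ids": word_ids}]}


sample_response = {"Blocks": [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, ["w1"]), _word("w1", "of"),
    _cell("c2", 1, 2, ["w2"]), _word("w2", "Designer"),
    _cell("c3", 2, 1, ["w3", "w4"]),
    _word("w3", "Description"), _word("w4", "Goods"),
    _cell("c4", 2, 2, ["w5"]), _word("w5", "Code"),
]}

for (row, col), (text, row_span) in sorted(build_table_grids(sample_response)[0].items()):
    print(f"row {row}, col {col}, RowSpan {row_span}: {text}")
```

Printing `RowSpan` next to each cell's text makes it easy to confirm whether the header really came back as separate one-row cells.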

If anyone has faced or solved this issue, please suggest a solution.
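One possible workaround, if the header always occupies a known number of rows, is to post-process the extracted grid and join each column's header fragments. This is a minimal sketch, not a Textract feature: `merge_header_rows` and the sample grid are hypothetical, and `header_rows=2` is an assumption about this invoice layout.

```python
def merge_header_rows(grid, header_rows=2):
    """Collapse the first `header_rows` rows of a {(row, col): text} grid into
    one header, joining each column's fragments top to bottom.

    Returns (header, body): a list of header strings and a list of body rows.
    Assumes 1-based row/column indices, as in Textract's RowIndex/ColumnIndex.
    """
    n_rows = max(row for row, _ in grid)
    n_cols = max(col for _, col in grid)
    header = []
    for col in range(1, n_cols + 1):
        # Skip empty fragments so missing cells don't add stray spaces.
        parts = [grid.get((row, col), "") for row in range(1, header_rows + 1)]
        header.append(" ".join(part for part in parts if part))
    body = [[grid.get((row, col), "") for col in range(1, n_cols + 1)]
            for row in range(header_rows + 1, n_rows + 1)]
    return header, body


# Hypothetical table whose two-line headers came back as two separate rows.
grid = {
    (1, 1): "Description of", (1, 2): "Designer",
    (2, 1): "items",          (2, 2): "Code",
    (3, 1): "Item A",         (3, 2): "A-01",
}
header, body = merge_header_rows(grid)
print(header)  # ['Description of items', 'Designer Code']
print(body)    # [['Item A', 'A-01']]
```

Concatenation only repairs the split. If the word order inside the fragments is also wrong (as with "of" and "Description Goods" in the output above), sorting each header cell's WORD blocks by their `Geometry.BoundingBox` `Top` and `Left` values may help. Responses may also tag header cells with the `COLUMN_HEADER` entity type, which could be used instead of a hard-coded row count.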

Regards

Asked a year ago, 288 views