Textract AnalyzeDocument API doesn't return accurate table information


I am trying to extract and parse an invoice PDF that contains tabular data, using Python. A few of the table's columns have header cells with a rowspan of 2, and Textract does not return the text of those cells correctly: the response reports the rowspan as 1. For example, if a column's header is "Description of items", Textract returns "Description of" as one row and "items" as another.

Sample PDF input

The output is below:

Table[0][2] = Designer

Table[1][2] = Code

Table[0][1] = of

Table[1][1] = Description Goods

If anyone has faced or solved this issue, please suggest a solution.

Regards
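
One possible workaround is to post-process the response and stitch the split header back together. The sketch below is not an official Textract feature: it assumes the header occupies the first two rows of a single table, that each CELL block lists its WORD blocks through a CHILD relationship, and that word bounding boxes are enough to recover reading order. The names `merged_headers`, `reading_order`, `cell_words`, and the `line_tol` threshold are made up for illustration.

```python
from collections import defaultdict


def _box(word):
    """Shortcut to a WORD block's bounding box (page-relative coordinates)."""
    return word["Geometry"]["BoundingBox"]


def cell_words(cell, by_id):
    """Return the WORD blocks listed in a CELL block's CHILD relationships."""
    words = []
    for rel in cell.get("Relationships", []):  # empty cells have no relationships
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id.get(child_id)
            if child is not None and child["BlockType"] == "WORD":
                words.append(child)
    return words


def reading_order(words, line_tol=0.01):
    """Sort words top-to-bottom, then left-to-right within each text line.

    Words whose Top differs from the first word of the current line by at
    most line_tol are treated as the same line (an assumed threshold).
    """
    ordered, line = [], []
    for word in sorted(words, key=lambda w: _box(w)["Top"]):
        if line and _box(word)["Top"] - _box(line[0])["Top"] > line_tol:
            ordered.extend(sorted(line, key=lambda w: _box(w)["Left"]))
            line = []
        line.append(word)
    ordered.extend(sorted(line, key=lambda w: _box(w)["Left"]))
    return ordered


def merged_headers(response, header_rows=2):
    """Join the text of the first header_rows rows of each column.

    Assumes the response holds one table; RowIndex and ColumnIndex are
    1-based in Textract responses.
    """
    blocks = response["Blocks"]
    by_id = {block["Id"]: block for block in blocks}
    columns = defaultdict(list)
    for block in blocks:
        if block["BlockType"] == "CELL" and block["RowIndex"] <= header_rows:
            columns[block["ColumnIndex"]].extend(cell_words(block, by_id))
    return {
        col: " ".join(word["Text"] for word in reading_order(words))
        for col, words in sorted(columns.items())
    }
```

Here `response` would be the dict returned by `analyze_document` with `FeatureTypes=["TABLES"]`. Sorting by position rather than by row is deliberate: in the output above, the words of one header landed in the wrong rows, so simply concatenating row 0 and row 1 would give them in the wrong order. For a document with several tables, the cells would first need to be grouped per TABLE block.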

