Textract AnalyzeDocument API doesn't return accurate table information

I am trying to extract and parse an invoice PDF that contains tabular data, using Python. The table has a few columns whose cells have a rowspan of 2. Textract does not return the text of such cells correctly: in the response, the rowspan is reported as 1 and the cell is split across two rows. For example, if the column header is "Description of items", it returns "Description of" as one row and "items" as another.
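To make the symptom easier to check, here is a minimal sketch of how the cell spans can be read from the response. It assumes the usual AnalyzeDocument block layout (CELL blocks with RowIndex, ColumnIndex, RowSpan and ColumnSpan, linked to WORD blocks through CHILD relationships) and a single table per response. The `table_cells` helper and the `sample_blocks` data are hypothetical and only illustrate the shape of the output, they are not my real response.

```python
def table_cells(blocks):
    """Return (row, col, row_span, col_span, text) for every CELL block.

    Assumes `blocks` is the "Blocks" list of an AnalyzeDocument response
    that contains a single table.
    """
    by_id = {block["Id"]: block for block in blocks}
    cells = []
    for block in blocks:
        if block["BlockType"] != "CELL":
            continue
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = by_id.get(child_id, {})
                if child.get("BlockType") == "WORD":
                    words.append(child["Text"])
        cells.append((
            block["RowIndex"],
            block["ColumnIndex"],
            block.get("RowSpan", 1),
            block.get("ColumnSpan", 1),
            " ".join(words),
        ))
    return cells


# Hypothetical, hand-written blocks that mimic the split header cell.
sample_blocks = [
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Description"},
    {"Id": "w2", "BlockType": "WORD", "Text": "of"},
    {"Id": "w3", "BlockType": "WORD", "Text": "items"},
]

cells = table_cells(sample_blocks)
for cell in cells:
    print(cell)
```

With a real response, `blocks` would be `response["Blocks"]`. If RowSpan comes back as 1 for both halves of the header, the split is already present in the response and is not introduced by the parsing code.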

Sample PDF input

Output is below.

Table[0][2] = Designer
Table[1][2] = Code

Table[0][1] = of
Table[1][1] = Description Goods
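One workaround that might be acceptable is to merge the header rows after parsing. The sketch below assumes the table has already been turned into a list of rows (each a list of cell strings) and that the header always occupies the first two rows. The `merge_header_rows` name, the first column, and the data row are made up for illustration.

```python
def merge_header_rows(table, header_rows=2):
    """Join the first `header_rows` rows column by column into one header row.

    `table` is a list of rows, each row a list of cell strings.
    Empty cells are skipped so they do not add stray spaces.
    """
    if not table or len(table) < header_rows:
        return [list(row) for row in table]
    width = max(len(row) for row in table)
    merged = []
    for col in range(width):
        parts = []
        for row in table[:header_rows]:
            if col < len(row) and row[col].strip():
                parts.append(row[col].strip())
        merged.append(" ".join(parts))
    return [merged] + [list(row) for row in table[header_rows:]]


# Header cells as in the output above; first column and data row are invented.
table = [
    ["", "of", "Designer"],
    ["", "Description Goods", "Code"],
    ["1", "Chair", "D-01"],
]

result = merge_header_rows(table)
print(result[0])
```

This gives "Designer Code" for the third column, but a plain join only concatenates top to bottom, so the second column becomes "of Description Goods" rather than the original header text. Recovering the reading order there would probably need the word positions from the response, which this sketch does not use.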

Has anyone faced this issue or solved it? Please suggest a solution.

Regards

asked a year ago · 288 views