I am trying to extract and parse an invoice PDF that contains tabular data, using Python. The table has a few columns whose header cells have a rowspan of 2. Textract does not return the text of such cells correctly: in the response it reports the rowspan as 1 and splits the header text across two rows. So if a column has the header "Description of items", it returns "Description of" as one row and "items" as another.
The output is below:
Table[0][2] = Designer
Table[1][2] = Code
Table[0][1] = of
Table[1][1] = Description Goods
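
A minimal sketch of one possible workaround, not a confirmed fix: gather the WORD blocks from the first two table rows per column and re-order them by their bounding-box position, so a header that was split across rows is stitched back together. It assumes the usual Textract response shape (CELL blocks with 1-based RowIndex/ColumnIndex and CHILD relationships to WORD blocks that carry Geometry.BoundingBox), that the blocks passed in belong to a single table, and that the header occupies exactly two rows. The function name merge_header_words, the helpers word and cell, and the sample coordinates are hypothetical and only for illustration.

```python
def merge_header_words(blocks, header_rows=2):
    """Return {column_index: header_text} built from the first `header_rows` rows.

    `blocks` is the "Blocks" list of a Textract response, assumed here to
    contain the cells of one table only (hypothetical helper, not a Textract API).
    """
    by_id = {b["Id"]: b for b in blocks}
    words_by_col = {}
    for b in blocks:
        if b["BlockType"] != "CELL" or b["RowIndex"] > header_rows:
            continue
        for rel in b.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words_by_col.setdefault(b["ColumnIndex"], []).append(child)

    merged = {}
    for col, words in words_by_col.items():
        # Crude reading order: top-to-bottom, then left-to-right.
        # Rounding Top to 2 decimals is an assumed tolerance for "same line".
        words.sort(key=lambda w: (round(w["Geometry"]["BoundingBox"]["Top"], 2),
                                  w["Geometry"]["BoundingBox"]["Left"]))
        merged[col] = " ".join(w["Text"] for w in words)
    return merged


# Hand-built sample mimicking the output above (coordinates are made up).
def word(id_, text, top, left):
    return {"Id": id_, "BlockType": "WORD", "Text": text,
            "Geometry": {"BoundingBox": {"Top": top, "Left": left}}}

def cell(id_, row, col, child_ids):
    return {"Id": id_, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "RowSpan": 1, "ColumnSpan": 1,
            "Relationships": [{"Type": "CHILD", "Ids": child_ids}]}

sample_blocks = [
    word("w1", "Description", 0.10, 0.20),
    word("w2", "of", 0.12, 0.24),
    word("w3", "Goods", 0.14, 0.22),
    word("w4", "Designer", 0.11, 0.50),
    word("w5", "Code", 0.13, 0.51),
    cell("c1", 1, 1, ["w2"]),        # row 1 only received "of"
    cell("c2", 1, 2, ["w4"]),
    cell("c3", 2, 1, ["w1", "w3"]),  # row 2 received "Description Goods"
    cell("c4", 2, 2, ["w5"]),
]

merged = merge_header_words(sample_blocks)
print(merged)  # {1: 'Description of Goods', 2: 'Designer Code'}
```

With a real response, the list would come from response["Blocks"], filtered to the cells of the table in question. Sorting by position only helps if the word coordinates themselves are reliable, which I have not verified for this invoice.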
If anyone has faced or solved this issue, please suggest a solution.
Regards