Textract AnalyzeDocument API doesn't return accurate table information

I am trying to extract and parse an invoice PDF that contains tabular data, using Python. A few of the table's columns have header cells with a rowspan of 2. Textract does not return the text of such cells correctly: in the response, the rowspan is reported as 1. So if a column has the header "Description of items", Textract returns "Description of" as one row and "items" as another row.
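For reference, here is a minimal sketch of how the cell grid can be read from an AnalyzeDocument response. It assumes the usual Textract block layout (a `TABLE` block whose `CHILD` relationships point to `CELL` blocks, which in turn point to `WORD` blocks, with 1-based `RowIndex` and `ColumnIndex`). The helper name `table_to_grid` and the `SAMPLE_BLOCKS` data are made up for illustration and are not taken from my real invoice.

```python
def table_to_grid(blocks, table_block):
    """Return {(row, col): text} for one TABLE block, using 0-based indices."""
    by_id = {b["Id"]: b for b in blocks}
    grid = {}
    for rel in table_block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for cell_id in rel["Ids"]:
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            # Collect the words that belong to this cell, in listed order.
            words = []
            for cell_rel in cell.get("Relationships", []):
                if cell_rel["Type"] != "CHILD":
                    continue
                for word_id in cell_rel["Ids"]:
                    word = by_id[word_id]
                    if word["BlockType"] == "WORD":
                        words.append(word["Text"])
            # Textract indices start at 1; shift to 0-based like Table[r][c].
            key = (cell["RowIndex"] - 1, cell["ColumnIndex"] - 1)
            grid[key] = " ".join(words)
    return grid


# Hypothetical, hand-written blocks that mimic the split header described above.
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Description"},
    {"Id": "w2", "BlockType": "WORD", "Text": "of"},
    {"Id": "w3", "BlockType": "WORD", "Text": "items"},
]

grid = table_to_grid(SAMPLE_BLOCKS, SAMPLE_BLOCKS[0])
print(grid)
```

With a real response, `blocks` would be `response["Blocks"]` and `table_block` any block whose `BlockType` is `"TABLE"`.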

Sample pdf input

Output is below.

Table[0][2] = Designer
Table[1][2] = Code

Table[0][1] = of
Table[1][1] = Description Goods
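One possible workaround, sketched under the assumption that the header always spans the first two rows: join those rows column by column after extraction. The function name `merge_header_rows` and the sample table are hypothetical. Only the "of", "Description Goods", "Designer" and "Code" values come from the output above; the rest is filler.

```python
def merge_header_rows(table, header_rows=2):
    """Join the first `header_rows` rows of a list-of-lists table per column."""
    head = table[:header_rows]
    n_cols = max(len(row) for row in head)
    merged = []
    for col in range(n_cols):
        # Skip rows that are too short and cells that are empty.
        parts = [row[col].strip() for row in head
                 if col < len(row) and row[col].strip()]
        merged.append(" ".join(parts))
    return [merged] + table[header_rows:]


# Hypothetical table; columns 1 and 2 reuse the header fragments shown above.
table = [
    ["Sr", "of", "Designer"],
    ["No", "Description Goods", "Code"],
    ["1", "Shirt", "D-01"],
]

merged = merge_header_rows(table)
print(merged[0])
```

This gives "Designer Code" for column 2, but column 1 becomes "of Description Goods", so a plain join does not restore the word order when Textract splits the header text unevenly.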

Has anyone faced this issue or solved it? Please suggest a solution.

Regards

Asked 1 year ago, 288 views