PDFs with columns and PNGs with text oriented at different angles


Hi, for PDFs with columns, such as this: [screenshot of German-language PDF with columns] Textract returns the text left to right across the full width of the page instead of column by column. Are there any workarounds for this?

For PNGs with text oriented at different angles, such as this: [screenshot of German spatial plan with text oriented at different angles] the Textract results pick up some of the information, but not all of it: [screenshot of Textract results from German spatial plan with text oriented at different angles] Are there any workarounds for this?

Asked 1 year ago · 221 views
1 Answer

For multi-column documents, I'd suggest trying getLineClustersInReadingOrder() from the JavaScript/TypeScript version of the Amazon Textract Response Parser (the reading-order feature docs and code are on GitHub, and the package is on NPM). It's only an approximate, rule-based heuristic though, so it might struggle a bit with the warping in your example, or need some tuning of its parameters.
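To illustrate the general idea behind a rule-based reading-order heuristic, here is a minimal, self-contained sketch. It is not the response parser's implementation: `linesInColumnOrder` and its `minOverlap` parameter are hypothetical names made up for this example. It only assumes the standard Textract response fields (`BlockType`, `Text`, `Geometry.BoundingBox`), groups LINE blocks into columns by horizontal overlap, and then reads each column top to bottom. A full-width heading would merge the columns it spans, so real documents would need extra handling.

```typescript
// Subset of the Textract block fields used by this sketch.
interface BoundingBox { Left: number; Top: number; Width: number; Height: number; }
interface LineBlock { BlockType: string; Text?: string; Geometry: { BoundingBox: BoundingBox }; }
interface Column { left: number; right: number; lines: LineBlock[]; }

// Hypothetical helper (not part of any AWS library): returns line texts
// column by column instead of left to right across the page.
function linesInColumnOrder(blocks: LineBlock[], minOverlap = 0.5): string[] {
  const columns: Column[] = [];
  for (const block of blocks) {
    if (block.BlockType !== "LINE" || block.Text === undefined) continue;
    const box = block.Geometry.BoundingBox;
    const left = box.Left;
    const right = box.Left + box.Width;

    // Find an existing column that this line overlaps horizontally.
    let match: Column | undefined;
    for (const col of columns) {
      const overlap = Math.min(col.right, right) - Math.max(col.left, left);
      const narrower = Math.min(col.right - col.left, right - left);
      if (narrower > 0 && overlap / narrower >= minOverlap) { match = col; break; }
    }

    if (match) {
      // Grow the column extent to cover the new line.
      match.left = Math.min(match.left, left);
      match.right = Math.max(match.right, right);
      match.lines.push(block);
    } else {
      columns.push({ left, right, lines: [block] });
    }
  }

  // Columns left to right, lines within each column top to bottom.
  columns.sort((a, b) => a.left - b.left);
  const texts: string[] = [];
  for (const col of columns) {
    col.lines.sort((a, b) => a.Geometry.BoundingBox.Top - b.Geometry.BoundingBox.Top);
    for (const line of col.lines) texts.push(line.Text as string);
  }
  return texts;
}
```

The `minOverlap` threshold is the kind of parameter that would need tuning per document: warped or skewed scans shift line positions, so a looser threshold may be needed to keep a column together.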

For images like your second sample, which are visually busier and lighter on text, it might be worth checking whether Rekognition DetectText performs any better.
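When comparing results on angled text, it can help to see which lines were detected and roughly how each is rotated. The sketch below works on an already-fetched DetectText-style response and uses only the documented `TextDetections` fields (`DetectedText`, `Type`, `Confidence`, `Geometry.Polygon`). The helper name `summarizeLines`, the default confidence threshold, and the image-size parameters are hypothetical. It also assumes that the first two polygon points trace the top edge of the text, which should be verified against real output.

```typescript
// Subset of the Rekognition DetectText response fields used by this sketch.
interface Point { X: number; Y: number; }
interface TextDetection {
  DetectedText: string;
  Type: string;          // "LINE" or "WORD"
  Confidence: number;    // 0 to 100
  Geometry: { Polygon: Point[] };
}
interface AngledLine { text: string; confidence: number; angleDegrees: number; }

// Hypothetical helper (not an AWS API): keeps confident LINE detections and
// estimates each line's rotation from its polygon.
function summarizeLines(
  detections: TextDetection[],
  minConfidence = 80,
  imageWidth = 1,
  imageHeight = 1,
): AngledLine[] {
  const result: AngledLine[] = [];
  for (const d of detections) {
    if (d.Type !== "LINE" || d.Confidence < minConfidence) continue;
    const p0 = d.Geometry.Polygon[0];
    const p1 = d.Geometry.Polygon[1];
    if (!p0 || !p1) continue;
    // ASSUMPTION: points 0 and 1 are the two ends of the text's top edge.
    // Coordinates are ratios of the image size, so scale back to pixels
    // before taking the angle (image y axis points down).
    const dx = (p1.X - p0.X) * imageWidth;
    const dy = (p1.Y - p0.Y) * imageHeight;
    const angleDegrees = (Math.atan2(dy, dx) * 180) / Math.PI;
    result.push({ text: d.DetectedText, confidence: d.Confidence, angleDegrees });
  }
  return result;
}
```

Running the same summary over the output for several images makes it easier to judge whether rotated labels are being missed entirely or detected with low confidence.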

AWS
Expert
Alex_T
Answered 1 year ago
