PDFs with columns and PNGs with text oriented at different angles


Hi, for PDFs with columns, such as this one [screenshot of a German-language PDF with columns], the Textract result reads the text left to right across the whole page rather than column by column. Are there any workarounds for this?

For PNGs with text oriented at different angles, such as this one [screenshot of a German spatial plan with text oriented at different angles], the Textract result picks up some of the information, but not all of it [screenshot of the Textract results for the German spatial plan]. Are there any workarounds for this?

Asked 1 year ago, 221 views
1 Answer

For multi-column documents, I'd suggest trying getLineClustersInReadingOrder() from the JavaScript/TypeScript version of the Amazon Textract Response Parser (reading-order feature doc & code on GitHub, package on NPM). It's only an approximate, rule-based heuristic though, so it might struggle a bit with the warping in your example, or need a little tuning of its parameters.
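To give a feel for what a rule-based reading-order heuristic does, here is a minimal Python sketch that works directly on the `Blocks` list of a Textract response. It is not the parser's algorithm: the function name `lines_in_column_order` and the `min_overlap` parameter are made up for illustration, and it assumes a simple layout of side-by-side columns with no full-width headings and no significant page warping.

```python
from typing import Dict, List


def lines_in_column_order(blocks: List[Dict], min_overlap: float = 0.5) -> List[str]:
    """Return LINE texts column by column (left to right), top to bottom in each.

    Hypothetical helper: groups lines into columns by horizontal overlap of
    their bounding boxes. Textract reports boxes as ratios of the page size.
    """
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    columns: List[Dict] = []  # each entry: {"left", "right", "lines"}

    for line in sorted(lines, key=lambda b: b["Geometry"]["BoundingBox"]["Left"]):
        box = line["Geometry"]["BoundingBox"]
        left, right = box["Left"], box["Left"] + box["Width"]
        for col in columns:
            overlap = min(right, col["right"]) - max(left, col["left"])
            narrower = min(box["Width"], col["right"] - col["left"])
            if overlap >= min_overlap * narrower:
                # Line sits mostly inside this column: extend it and attach.
                col["left"] = min(col["left"], left)
                col["right"] = max(col["right"], right)
                col["lines"].append(line)
                break
        else:
            # No existing column overlaps enough, so start a new one.
            columns.append({"left": left, "right": right, "lines": [line]})

    ordered: List[str] = []
    for col in sorted(columns, key=lambda c: c["left"]):
        col["lines"].sort(key=lambda b: b["Geometry"]["BoundingBox"]["Top"])
        ordered.extend(b["Text"] for b in col["lines"])
    return ordered
```

You would call it as `lines_in_column_order(response["Blocks"])` on the response for a single page. For anything beyond clean two- or three-column pages, the parser's own implementation is likely the better starting point.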

For images like your second sample, which are visually busier and lighter on text, it might be worth checking whether Rekognition DetectText performs any better.
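If you want to try that quickly, a rough sketch with boto3 could look like the following. The helper names, the region, and the 80% confidence cut-off are my own placeholders rather than recommended values, and it's worth checking the current Rekognition documentation for image size limits and caps on how much text is returned per image.

```python
from typing import Dict, List


def call_detect_text(image_path: str, region_name: str = "eu-central-1") -> Dict:
    """Send a local image to Rekognition DetectText and return the raw response."""
    import boto3  # imported here so the parsing helper below works without AWS access

    client = boto3.client("rekognition", region_name=region_name)
    with open(image_path, "rb") as f:
        return client.detect_text(Image={"Bytes": f.read()})


def confident_lines(response: Dict, min_confidence: float = 80.0) -> List[str]:
    """Keep LINE-level detections at or above a confidence threshold (placeholder value)."""
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d.get("Type") == "LINE" and d.get("Confidence", 0.0) >= min_confidence
    ]
```

For example, `confident_lines(call_detect_text("plan.png"))`, where `plan.png` is a stand-in for your own file. Comparing that list against the Textract LINE blocks for the same image should show fairly quickly whether the rotated labels are picked up any better.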

AWS
Expert
Alex_T
Answered 1 year ago
