PDFs with columns and PNGs with text oriented at different angles

Hi. For PDFs with columns, such as this one: [screenshot of a German-language PDF with columns], the Textract result reads the text left to right across the whole page instead of column by column. Are there any workarounds for this?

For PNGs with text oriented at different angles, such as this one: [screenshot of a German spatial plan with text oriented at different angles], the Textract result picks up some of the information, but not all of it: [screenshot of the Textract results for the German spatial plan]. Are there any workarounds for this?

asked a year ago · 210 views
1 Answer

For multi-column documents, I'd suggest trying getLineClustersInReadingOrder() from the JavaScript/TypeScript version of the Amazon Textract Response Parser (the docs and code for the reading-order feature are on GitHub, and the package is on NPM). It's only an approximate, rule-based heuristic, though, so it might struggle a bit with the warping in your example, or you might need to tune its parameters a little.
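
To give a feel for what a rule-based reading-order heuristic does, here is a minimal, self-contained sketch. It is not the parser's actual algorithm: `linesInColumnOrder` and its `minOverlap` parameter are hypothetical names made up for illustration. It only assumes the `Text` and `Geometry.BoundingBox` fields (`Left`, `Top`, `Width`, `Height`) that Textract returns on LINE blocks, and it groups lines into columns by horizontal overlap before reading each column top to bottom.

```typescript
// Subset of the fields Textract returns for each LINE block.
interface BoundingBox { Left: number; Top: number; Width: number; Height: number; }
interface LineBlock { Text: string; Geometry: { BoundingBox: BoundingBox }; }

interface Column { left: number; right: number; lines: LineBlock[]; }

// Hypothetical helper (not part of any AWS library): assigns each line to a
// column by horizontal overlap, then reads columns left to right.
function linesInColumnOrder(lines: LineBlock[], minOverlap = 0.5): string[] {
  const columns: Column[] = [];

  // Visit lines top to bottom so each column is filled in reading order.
  const sorted = [...lines].sort(
    (a, b) => a.Geometry.BoundingBox.Top - b.Geometry.BoundingBox.Top
  );

  for (const line of sorted) {
    const box = line.Geometry.BoundingBox;
    const left = box.Left;
    const right = box.Left + box.Width;

    // A line joins a column when it overlaps it by at least `minOverlap`
    // of the narrower of the two widths, so short last lines still match.
    const match = columns.find((col) => {
      const overlap = Math.min(col.right, right) - Math.max(col.left, left);
      const narrower = Math.min(col.right - col.left, right - left);
      return overlap >= minOverlap * narrower;
    });

    if (match) {
      match.lines.push(line);
      match.left = Math.min(match.left, left);
      match.right = Math.max(match.right, right);
    } else {
      columns.push({ left, right, lines: [line] });
    }
  }

  columns.sort((a, b) => a.left - b.left);
  return columns.flatMap((col) => col.lines.map((l) => l.Text));
}

// Two columns whose lines interleave when read straight across the page.
const sampleLines: LineBlock[] = [
  { Text: "A1", Geometry: { BoundingBox: { Left: 0.1, Top: 0.1, Width: 0.35, Height: 0.03 } } },
  { Text: "B1", Geometry: { BoundingBox: { Left: 0.55, Top: 0.1, Width: 0.35, Height: 0.03 } } },
  { Text: "A2", Geometry: { BoundingBox: { Left: 0.1, Top: 0.15, Width: 0.3, Height: 0.03 } } },
  { Text: "B2", Geometry: { BoundingBox: { Left: 0.55, Top: 0.15, Width: 0.2, Height: 0.03 } } },
];
const ordered = linesInColumnOrder(sampleLines);
```

A sketch this simple has obvious limits: a full-width heading overlaps every column and would merge them, and warped scans shift the column edges. Those are the kinds of cases where the library's heuristic and its tunable parameters are the better starting point.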

For images like your second sample, which are visually busier and lighter on text, it might be worth checking whether Rekognition DetectText performs any better.
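
If you do try DetectText, a small post-processing step can make the two services easier to compare. The sketch below is only an illustration: `summarizeLines`, `minConfidence`, `imageWidth` and `imageHeight` are hypothetical names, and it takes the `TextDetections` array from a DetectText response as plain data rather than calling the service. It keeps confident LINE detections and estimates how far each one is rotated. The angle relies on an assumption I have not verified against the docs, namely that the first two `Polygon` points run along the top edge of the text.

```typescript
// Subset of the fields in each entry of a DetectText response's TextDetections.
interface Point { X: number; Y: number; }
interface TextDetection {
  DetectedText: string;
  Type: "LINE" | "WORD";
  Confidence: number; // 0 to 100
  Geometry: { Polygon: Point[] };
}

interface AngledLine { text: string; confidence: number; angleDegrees: number; }

// Hypothetical helper (not part of the AWS SDK): keeps confident LINE
// detections and estimates the rotation of each one.
function summarizeLines(
  detections: TextDetection[],
  minConfidence = 80,
  imageWidth = 1,
  imageHeight = 1
): AngledLine[] {
  return detections
    .filter(
      (d) =>
        d.Type === "LINE" &&
        d.Confidence >= minConfidence &&
        d.Geometry.Polygon.length >= 2
    )
    .map((d) => {
      // ASSUMPTION: Polygon[0] -> Polygon[1] follows the top edge of the text.
      const [p0, p1] = d.Geometry.Polygon;
      // Coordinates are ratios of the image size, so scale them to pixels
      // before measuring the angle; otherwise non-square images distort it.
      const dx = (p1.X - p0.X) * imageWidth;
      const dy = (p1.Y - p0.Y) * imageHeight;
      // Y grows downward, so a positive angle means clockwise on screen.
      const angleDegrees = Math.round((Math.atan2(dy, dx) * 180) / Math.PI);
      return { text: d.DetectedText, confidence: d.Confidence, angleDegrees };
    });
}
```

Lines that come back with a large angle are the ones most likely to be missed or split by an OCR pass that expects horizontal text, so rotating those crops and re-running detection is one thing to experiment with.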

AWS
EXPERT
Alex_T
answered a year ago
