PDFs with columns and PNGs with text oriented at different angles


Hi. For PDFs with columns, such as this one (screenshot of a German-language PDF with columns), the Textract result reads the text left to right across the whole page instead of column by column. Are there any workarounds for this?

For PNGs with text oriented at different angles, such as this one (screenshot of a German spatial plan with text oriented at different angles), the Textract result picks up some of the information, but not all of it (screenshot of the Textract results for the same spatial plan). Are there any workarounds for this?

asked a year ago · 221 views
1 Answer

For multi-column documents, I'd suggest trying getLineClustersInReadingOrder() from the JavaScript/TypeScript version of the Amazon Textract Response Parser (the reading-order feature doc and code are on GitHub, and the package is on NPM). It's only an approximate, rule-based heuristic, though, so it might struggle a bit with the warping in your example, or you might need to tune its parameters a little.
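To give a feel for what a rule-based reading-order heuristic does, here is a minimal self-contained sketch. It is not the response parser's implementation: `linesInColumnOrder`, the `Column` helper type, and the `minOverlapRatio` parameter are hypothetical names made up for illustration. It assumes only the Textract block fields `BlockType`, `Text`, and `Geometry.BoundingBox`, groups LINE blocks into columns by horizontal overlap, and then reads each column top to bottom.

```typescript
// Minimal subset of the Textract block shape that this sketch relies on.
interface BoundingBox { Left: number; Top: number; Width: number; Height: number; }
interface Block { BlockType: string; Text?: string; Geometry: { BoundingBox: BoundingBox }; }

// Hypothetical helper type: horizontal extent of one detected column.
interface Column { left: number; right: number; lines: Block[]; }

// Assumption: a line belongs to a column when it overlaps it horizontally by
// at least minOverlapRatio of the narrower of the two widths.
function linesInColumnOrder(blocks: Block[], minOverlapRatio: number = 0.5): string[] {
  // Keep only LINE blocks and visit them from the top of the page down.
  const lines = blocks
    .filter((b) => b.BlockType === "LINE" && b.Text !== undefined)
    .sort((a, b) => a.Geometry.BoundingBox.Top - b.Geometry.BoundingBox.Top);

  const columns: Column[] = [];
  for (const line of lines) {
    const box = line.Geometry.BoundingBox;
    const left = box.Left;
    const right = box.Left + box.Width;

    let target: Column | undefined;
    for (const col of columns) {
      const overlap = Math.min(right, col.right) - Math.max(left, col.left);
      const narrower = Math.min(right - left, col.right - col.left);
      if (overlap >= minOverlapRatio * narrower) { target = col; break; }
    }

    if (target) {
      // Grow the column so indented or longer lines still match later on.
      target.left = Math.min(target.left, left);
      target.right = Math.max(target.right, right);
      target.lines.push(line);
    } else {
      columns.push({ left, right, lines: [line] });
    }
  }

  // Read columns left to right; lines inside a column are already top to bottom.
  columns.sort((a, b) => a.left - b.left);
  const result: string[] = [];
  for (const col of columns) {
    for (const line of col.lines) { result.push(line.Text as string); }
  }
  return result;
}

// Usage with made-up coordinates for a two-column page.
const sampleBlocks: Block[] = [
  { BlockType: "LINE", Text: "left 1", Geometry: { BoundingBox: { Left: 0.1, Top: 0.1, Width: 0.35, Height: 0.02 } } },
  { BlockType: "LINE", Text: "right 1", Geometry: { BoundingBox: { Left: 0.55, Top: 0.1, Width: 0.35, Height: 0.02 } } },
  { BlockType: "LINE", Text: "left 2", Geometry: { BoundingBox: { Left: 0.1, Top: 0.2, Width: 0.35, Height: 0.02 } } },
  { BlockType: "LINE", Text: "right 2", Geometry: { BoundingBox: { Left: 0.55, Top: 0.2, Width: 0.35, Height: 0.02 } } },
];
console.log(linesInColumnOrder(sampleBlocks));
```

A sketch this simple has obvious limits: a full-width heading overlaps every column and would merge them, and warped or skewed scans blur the column boundaries, which is the kind of case where parameters need tuning.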

For images like your second sample, which are more visually busy and light on text, it might be worth checking whether Rekognition DetectText performs any better.
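If you do try DetectText, a small post-processing step can show which rotated labels were actually detected. The sketch below is self-contained and does not call AWS. It assumes the response's `TextDetections` entries carry `Type`, `DetectedText`, `Confidence`, and `Geometry.Polygon` with `X`/`Y` as ratios of the image size, and it further assumes that the first two polygon points run along the top edge of the text in reading direction, which you should verify against your own results. `lineAngles` and `minConfidence` are hypothetical names for illustration.

```typescript
// Minimal subset of a DetectText detection that this sketch relies on.
interface Point { X: number; Y: number; }
interface TextDetection {
  DetectedText?: string;
  Type?: string;
  Confidence?: number;
  Geometry?: { Polygon?: Point[] };
}

interface AngledLine { text: string; angleDeg: number; }

// Estimate the on-image angle of each detected LINE. Polygon coordinates are
// scaled by the image size so the angle is measured in pixel space.
// Image y grows downward, so a positive angle means clockwise rotation.
function lineAngles(
  detections: TextDetection[],
  imageWidth: number,
  imageHeight: number,
  minConfidence: number = 80,
): AngledLine[] {
  const out: AngledLine[] = [];
  for (const d of detections) {
    const polygon = d.Geometry ? d.Geometry.Polygon : undefined;
    if (d.Type !== "LINE" || d.DetectedText === undefined) { continue; }
    if (!polygon || polygon.length < 2) { continue; }
    const confidence = d.Confidence === undefined ? 0 : d.Confidence;
    if (confidence < minConfidence) { continue; }

    // Assumption: polygon[0] -> polygon[1] follows the text's top edge.
    const dx = (polygon[1].X - polygon[0].X) * imageWidth;
    const dy = (polygon[1].Y - polygon[0].Y) * imageHeight;
    out.push({ text: d.DetectedText, angleDeg: (Math.atan2(dy, dx) * 180) / Math.PI });
  }
  return out;
}

// Usage with made-up detections for a 1000 x 1000 pixel image.
const sampleDetections: TextDetection[] = [
  { Type: "LINE", DetectedText: "horizontal label", Confidence: 99,
    Geometry: { Polygon: [{ X: 0.1, Y: 0.1 }, { X: 0.5, Y: 0.1 }, { X: 0.5, Y: 0.15 }, { X: 0.1, Y: 0.15 }] } },
  { Type: "LINE", DetectedText: "vertical label", Confidence: 95,
    Geometry: { Polygon: [{ X: 0.8, Y: 0.2 }, { X: 0.8, Y: 0.6 }, { X: 0.75, Y: 0.6 }, { X: 0.75, Y: 0.2 }] } },
];
console.log(lineAngles(sampleDetections, 1000, 1000));
```

Comparing the angles that come back against what you can see in the plan is a quick way to tell whether strongly rotated text is being missed entirely or just detected with low confidence.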

AWS
EXPERT
Alex_T
answered a year ago
