PDFs with columns and PNGs with text oriented at different angles


Hi, for PDFs with columns, such as this one [screenshot of a German-language PDF with columns], the Textract result reads the text left to right across the whole page rather than column by column. Are there any workarounds for this?

For PNGs with text oriented at different angles, such as this one [screenshot of a German spatial plan with text oriented at different angles], the Textract result picks up some of the information, but not all of it [screenshot of the Textract results for the same spatial plan]. Are there any workarounds for this?

asked a year ago, 221 views
1 Answer

For multi-column documents, I'd suggest trying getLineClustersInReadingOrder() from the JavaScript/TypeScript version of the Amazon Textract Response Parser (the reading-order feature docs and code are on GitHub, and the package is on NPM). It's only an approximate, rule-based heuristic, though, so it might struggle a bit with the warping in your example, or its parameters might need a little tuning.

For images like your second sample, which are visually busier and light on text, it might be worth checking whether Rekognition DetectText performs any better.
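A quick way to compare might look like the sketch below, which calls DetectText through boto3 and keeps only the line-level detections above a confidence threshold. The helper names, the default threshold of 80, and the file name in the usage comment are assumptions for illustration; it also presumes AWS credentials and a default region are already configured.

```python
from typing import Any


def detect_text(image_path: str) -> dict[str, Any]:
    """Send a local PNG/JPEG to Rekognition DetectText and return the raw response."""
    import boto3  # imported here so confident_lines() can be used without AWS access

    client = boto3.client("rekognition")
    with open(image_path, "rb") as image_file:
        return client.detect_text(Image={"Bytes": image_file.read()})


def confident_lines(response: dict[str, Any], min_confidence: float = 80.0) -> list[str]:
    """Keep LINE-level detections at or above min_confidence (a hypothetical cutoff)."""
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
        and detection.get("Confidence", 0.0) >= min_confidence
    ]


# Usage (requires AWS access; "spatial_plan.png" is a placeholder file name):
# print(confident_lines(detect_text("spatial_plan.png")))
```

Lowering min_confidence shows what Rekognition found but was unsure about, which makes it easier to compare against the Textract output for the same image.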

AWS
EXPERT
Alex_T
answered a year ago
