Textract Text Detection Orientation and Results Ordering

I'm using the async workflow and paging through the results of GetDocumentTextDetection and have a couple questions.

For context, I'm using these results to determine the "primary" orientation of the text on PDF pages so I can rotate the page to be "upright", and the orientation of the text so I can match it to the image. I'd be open to other ideas for achieving this, as well.

  1. Does Textract detect and return text with multiple orientations on the same page?

  2. Are GetDocumentTextDetection results in page order? They have been in the examples I've done, but I'd like to know if this is reliable.

  3. Are word block geometry polygon points always in the same order (clockwise from the top-left of the detected text orientation)?

Update: it looks like this library essentially does what I was going to do, but I still need to know whether I can safely assume that, when processing results, once I see a new page number, I've seen all the results for the previous page.

Thanks!

asked 2 years ago · 1.2K views
1 Answer
Accepted Answer

Hi, I helped build the JavaScript/TypeScript version of the TRP library you mentioned. In general, I can say we think they're pretty helpful for simplifying post-processing logic on Textract results, and would encourage GitHub issues/feature requests if you see any shortcomings!

1/ Detecting multiple text orientations on the same page

Yes, but in my experience the accuracy is usually a bit worse than if all text on a page is similarly aligned... I believe this is a pretty common challenge for all OCR solutions.

2/ Relying on page ordering of results

Ideally you might follow the various trees of relationship (e.g. here) from a PAGE block to the types of content it contains... But yes, you should be able to rely on the sequencing that all of a page's content comes before the next PAGE block in the list. The old version of Python TRP depended on this. The JavaScript/TypeScript version of the library still depends on it.
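To make the sequencing assumption concrete, here is a minimal Python sketch of streaming the paged results one document page at a time, without buffering the whole job in memory. It is not how TRP does it; `fetch_responses` and `iter_pages` are hypothetical helper names, and `client` is assumed to be a boto3 Textract client. The only thing it relies on is the ordering described above: everything belonging to a page arrives before the next PAGE block, even when a document page spans several API responses.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Block = Dict[str, Any]


def fetch_responses(client: Any, job_id: str) -> Iterator[Dict[str, Any]]:
    """Yield each paged GetDocumentTextDetection response for a finished job.

    `client` is assumed to be a boto3 Textract client.
    """
    kwargs: Dict[str, Any] = {"JobId": job_id}
    while True:
        response = client.get_document_text_detection(**kwargs)
        yield response
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def iter_pages(
    responses: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[Optional[int], List[Block]]]:
    """Yield (page_number, blocks) for one document page at a time.

    Assumes all of a page's blocks come before the next PAGE block.
    """
    current_page: Optional[int] = None
    buffer: List[Block] = []
    for response in responses:
        for block in response.get("Blocks", []):
            if block.get("BlockType") == "PAGE":
                # A new PAGE block closes the previous page.
                if buffer:
                    yield current_page, buffer
                    buffer = []
                current_page = block.get("Page", 1)
            buffer.append(block)
    # Flush the last page once the results are exhausted.
    if buffer:
        yield current_page, buffer
```

Usage would look something like `for page_number, blocks in iter_pages(fetch_responses(client, job_id)): ...`, so only one page's blocks are held at any time.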

For the sake of clarity, it's worth pointing out that within a page the order of Textract WORD blocks might not naturally align with human reading order, especially for e.g. multi-column documents. TRP provides some basic client-side heuristics to estimate reading order if you want, but the best performance should come from enabling the Layout analysis feature in Amazon Textract, and taking the returned order of layout blocks as your best-guess guide for reading order.

3/ Ordering of word polygon points

Yes, the word polygon point array should go from the top-left in a clockwise direction around the word - and this is what both the Python and JS/TS response parser libraries use to judge the "orientation" of a word.
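As a rough illustration of how that point ordering can be turned into a page orientation, here is a minimal Python sketch. It is not the TRP implementation; `word_angle_degrees`, `snap_to_quadrant`, and `primary_orientation` are hypothetical names, and the optional `page_width`/`page_height` parameters are an assumption added to correct for aspect ratio, since the polygon coordinates are ratios of the page dimensions. It takes the angle of each word's top edge (first point to second point), snaps it to the nearest 90 degrees, and lets the WORD blocks vote.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Point = Dict[str, float]


def word_angle_degrees(
    polygon: List[Point], page_width: float = 1.0, page_height: float = 1.0
) -> float:
    """Angle of the word's top edge in image coordinates (y grows downward).

    Assumes polygon[0] is the word's top-left and polygon[1] its top-right.
    """
    top_left, top_right = polygon[0], polygon[1]
    dx = (top_right["X"] - top_left["X"]) * page_width
    dy = (top_right["Y"] - top_left["Y"]) * page_height
    return math.degrees(math.atan2(dy, dx))


def snap_to_quadrant(angle: float) -> int:
    """Snap an angle in degrees to one of 0, 90, 180, 270."""
    return (int(round(angle / 90.0)) % 4) * 90


def primary_orientation(blocks: Iterable[Dict[str, Any]]) -> Tuple[int, float]:
    """Return (most common snapped angle, share of words that voted for it)."""
    votes = Counter(
        snap_to_quadrant(word_angle_degrees(b["Geometry"]["Polygon"]))
        for b in blocks
        if b.get("BlockType") == "WORD"
    )
    if not votes:
        return 0, 0.0
    angle, count = votes.most_common(1)[0]
    return angle, count / sum(votes.values())
```

Because y grows downward, a result of 90 should mean the text reads top to bottom in the image, so the page would need a 90 degree counterclockwise rotation to be upright. A low vote share is a hint that the page mixes orientations, which is the case from 1/ where accuracy tends to be worse.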

AWS
EXPERT
answered 2 years ago
  • we think they're pretty helpful for simplifying post-processing logic on Textract results, and would encourage GitHub issues/feature requests if you see any shortcomings!

    I only gave the Python version a quick look, but my concern so far is that it looks like it's based on pages of Textract results, rather than document pages, though the examples seem to work on entire Textract result sets, which doesn't seem ideal for my use-case. I suppose I could pipe the Textract results through this, store them on disk, then process each file or something?

    Anyway, if I end up attempting to use this library I'll be sure to post if I run into anything!

    the accuracy is usually a bit worse than if all text on a page is similarly aligned

    Good to know. So far for my use-case it seems like multiple alignments on one page are rare, but I'll keep this in mind if we decide we need to address these better.

    Ideally you might follow the various trees of relationship... But yes, you should be able to rely on the sequencing that all of a page's content comes before the next PAGE block in the list

    Yeah, I agree that would be ideal/safest. I was hoping to avoid needing to buffer all Textract results in memory if I could avoid it, and it sounds like I can, which is great!
