AWS Textract human review flow: "Failed to load image"


Hi,

I'm using AWS Textract to extract key-value pairs from a PDF. Because the accuracy is sometimes low, I use Amazon Augmented AI (human review workflows) to involve a human worker. That works fine with PNG files, but when I use PDF files (which Textract supports), I get a "Failed to load image" error. How do I get around this? I tried using a custom template, but can't find a way to specify the file type.

Best regards,

Paul

asked 2 years ago, 551 views
2 Answers
Accepted Answer

The underlying challenge here is that, while modern browsers can natively render PDFs, they require different embedding methods for PDFs vs images. To my knowledge there's no built-in SageMaker Crowd HTML Element that's capable of handling both types interchangeably - and your experience with the pre-built UI seems to confirm this.

Displaying PDFs in A2I/SMGT

This simple sample suggests using an <iframe type="application/pdf"> to display PDFs via the browser's native renderer. You could try this approach, but as of ~March 2022 I found support was patchy because some browsers' default security policies didn't like loading a cross-origin iframe with interactive content.

If relying on the browser's native renderer won't work for your users, you can use the open-source PDF.js renderer instead. Here is a more complex sample template that does that. PDF.js is powerful, but in my experience it can be pretty tricky to get started with. The basic process in this sample is:

  • Tag the <script>s and stylesheets for PDF.js in from a CDN
  • Include a PDF viewer structure in your HTML
  • Pass your A2I object URL in through JavaScript and set up your viewer there - including any interactivity you need
  • (You can probably ignore the second inline script tag there: it's specific to the data that template collects)

Scaling template complexity

Although the situation has improved a lot in recent years, writing direct-to-browser inline JavaScript in HTML can be tricky due to browser diversity and developer tooling limitations. If you want to build more advanced, interactive task templates, you might want to explore using front-end frameworks like React/Angular/Vue within A2I/Ground Truth.

The above-mentioned PDF.js template is actually a legacy version that has since been replaced by this VueJS app in the sample that uses it. In that case, the switch was made because we wanted to customize the PDF viewer (rendering detection boxes over the document), and the complexity of the app justified setting up a proper toolchain. You can find discussion there about using frameworks in general and VueJS in particular with A2I, and you could use the app as a starting point for building your own complex template. Note that if I were rebuilding that from scratch, I'd probably use much less Liquid templating and implement more within the JS framework itself, as discussed here.

You can see the complex template being built and deployed from a (SageMaker) Python notebook here, and a screenshot of it in action here. This end-to-end sample is discussed further in an AWS ML blog post.

Handling mixed PDF/Image content

If you need your template to handle both PDFs and images, this will add extra complexity. Could your JavaScript infer from the object URL (filename) which category the input object falls into, and dynamically set up either an <img> tag or a PDF viewer? Could you fetch the object from JS and check the Content-Type response header? Might it be simpler to add the file type as an input to your A2I loop and pass it in that way (e.g. using conditional Liquid templating to decide whether or not to render an <img>)?

Depending on the point in the flow at which you know the file type, there are several ways you might tackle this. Ultimately though, you'll probably be switching between generating either an <img> or a PDF viewer, whether those HTML elements are created by static Liquid templating or by dynamic JS.
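
As a rough illustration of the "pass the file type in as an input" option, here is a minimal Python sketch. It assumes you start the human loop yourself (rather than through Textract's built-in integration) and that inferring the type from the object key's extension is good enough for your data. The key names "taskObject" and "fileType", the extension sets, and both function names are hypothetical choices for this example, not anything A2I requires.

```python
import json
from pathlib import PurePosixPath

# Hypothetical extension sets; adjust them to the formats you actually send for review.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def infer_file_type(s3_uri: str) -> str:
    """Return "pdf", "image" or "unknown" based on the object key's extension."""
    # Only the last path segment matters here, so the URI is treated as a plain path.
    suffix = PurePosixPath(s3_uri).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return "unknown"


def build_input_content(s3_uri: str) -> str:
    """Build a JSON string suitable for HumanLoopInput["InputContent"]."""
    # "taskObject" and "fileType" are illustrative key names (assumptions).
    return json.dumps({"taskObject": s3_uri, "fileType": infer_file_type(s3_uri)})
```

The resulting string could then be passed as HumanLoopInput={"InputContent": ...} to start_human_loop on the boto3 "sagemaker-a2i-runtime" client, and the template could branch on task.input.fileType to emit either an <img> or a PDF viewer. Extension-based detection is only a heuristic: objects with missing or misleading extensions would need the Content-Type check mentioned above instead.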

AWS
EXPERT
Alex_T
answered 2 years ago
EXPERT
reviewed a month ago
  • Hi Alex_T,

    Thank you for your long and comprehensive answer! I also found the <iframe type="application/pdf"></iframe> approach and tried to integrate it, but I get the error '"grant_read_access" input is not a valid S3 URI: ""'. I also think there might be a problem even if it worked, because I have to render the bounding boxes for the keys. I will try converting all my PDFs to images and then work with those; maybe this will help.

    Thank you very much!


Hi Paul, have you tried this PDF with the normal AnalyzeDocument API? Also, could you check whether the IAM role for your human review workflow has sufficient permissions to access your S3 bucket?

AWS
Wenzhu
answered 2 years ago
  • Hi Wenzhu, yes, I'm using the AnalyzeDocument API, and the IAM role has sufficient permissions.
