A2I for Named Entity Recognition in PDFs

Hi! I developed my own custom Named Entity Recognizer in Comprehend by following these two blog posts:

https://aws.amazon.com/pt/blogs/machine-learning/custom-document-annotation-for-extracting-named-entities-in-documents-using-amazon-comprehend/

https://aws.amazon.com/pt/blogs/machine-learning/extract-custom-entities-from-documents-in-their-native-format-with-amazon-comprehend/

Now I would like to make A2I work with this recognizer. My idea is simple: check the confidence scores of the recognized entities, and if a score falls below a specified threshold, send the PDF document to A2I. In the review interface, I would then highlight the parts of the PDF that should be labeled as the desired entity. It's a similar approach to this other post: https://aws.amazon.com/pt/blogs/machine-learning/setting-up-human-review-of-your-nlp-based-entity-recognition-models-with-amazon-sagemaker-ground-truth-amazon-comprehend-and-amazon-a2i/. The difference is that in my workflow the entire document should appear on the screen to be highlighted and corrected, instead of just the text as in that post. Is this possible?

1 Answer

Hi

If you start a custom task, you can build your own HTML template.

The key is how you build the template. You can also use an iframe together with grant_read_access, which generates a temporary signed URL, to show the PDF from S3. For a template example, see: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-ui-template-crowd-classifier.html
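A minimal sketch of the iframe idea, written in Python so the template can be registered with boto3. The input key `pdfS3Uri`, the field name `corrections`, and the helper names are assumptions made for illustration; whether a PDF renders inside the iframe depends on the worker's browser, and an iframe on its own only displays the document, it does not provide highlighting.

```python
# Hypothetical helpers: build a custom worker template that shows a PDF
# from S3 in an iframe, then register it as a human task UI.

CROWD_ELEMENTS_JS = "https://assets.crowd.aws/crowd-html-elements.js"


def build_pdf_review_template(input_key: str = "pdfS3Uri") -> str:
    """Return the HTML/Liquid source of the worker template.

    `input_key` is an assumed key in the human loop input JSON that
    holds the S3 URI of the PDF.
    """
    # grant_read_access is the Liquid filter that turns an S3 URI into a
    # temporary signed URL when the task is rendered for a worker.
    signed_url = "{{ task.input." + input_key + " | grant_read_access }}"
    return f"""<script src="{CROWD_ELEMENTS_JS}"></script>
<crowd-form>
  <iframe src="{signed_url}" style="width: 100%; height: 600px;"></iframe>
  <crowd-text-area name="corrections"
                   label="Corrected entities (one per line)"></crowd-text-area>
</crowd-form>"""


def register_template(name: str, template: str) -> str:
    """Create the human task UI and return its ARN (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    sagemaker = boto3.client("sagemaker")
    response = sagemaker.create_human_task_ui(
        HumanTaskUiName=name,
        UiTemplate={"Content": template},
    )
    return response["HumanTaskUiArn"]
```

The returned ARN would then be referenced from the flow definition of the custom task.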

But I prefer to use a JavaScript library like pdf.js to render the PDF.

Please also read this documentation https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-custom-templates.html
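For the confidence check described in the question, something along these lines could decide when to start a human loop. This is only a sketch: it assumes each entity is a dict with a `Score` field as in Comprehend's entity output, and the `pdfS3Uri` and `entities` input keys, the default threshold of 0.8, and the function names are made up for illustration.

```python
import json
import uuid


def low_confidence_entities(entities, threshold=0.8):
    """Return the entities whose Score is below the threshold."""
    return [e for e in entities if e.get("Score", 0.0) < threshold]


def build_human_loop_input(pdf_s3_uri, entities):
    """Build the HumanLoopInput; its keys become task.input.* in the template."""
    return {
        "InputContent": json.dumps(
            {"pdfS3Uri": pdf_s3_uri, "entities": entities}
        )
    }


def maybe_start_review(pdf_s3_uri, entities, flow_definition_arn, threshold=0.8):
    """Start an A2I human loop only if some entity is below the threshold.

    Returns the human loop name, or None when no review is needed.
    """
    flagged = low_confidence_entities(entities, threshold)
    if not flagged:
        return None

    import boto3  # imported lazily; only needed when a review is started

    a2i = boto3.client("sagemaker-a2i-runtime")
    # Loop names must be lowercase alphanumerics and hyphens, max 63 chars.
    loop_name = "ner-review-" + uuid.uuid4().hex
    a2i.start_human_loop(
        HumanLoopName=loop_name,
        FlowDefinitionArn=flow_definition_arn,
        HumanLoopInput=build_human_loop_input(pdf_s3_uri, flagged),
    )
    return loop_name
```

Passing only the flagged entities keeps the input small; the full entity list could be sent instead if reviewers need the context.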

answered 2 years ago
