Annotation of PDF document using Bounding Box


Hi! Is it possible to develop some kind of template for SageMaker that uses bounding boxes to annotate a PDF document? I am looking for something similar to the crowd-bounding-box HTML tag, but instead of only capturing regions of the PDF's page image, I also want to extract the text located inside each region, along with its coordinates and related metadata, so that I can give context to my annotation.

  • I believe it could be possible, but have you tried Amazon Textract? It is purpose-built for text extraction and returns bounding box coordinates for the text it detects. You can then process the response any way you like to extract text, paragraphs, forms, etc. You can get started here: https://aws.amazon.com/textract/.
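
To make the "process the response any way you like" step concrete, here is a minimal sketch of how the words inside an annotated region could be collected from a Textract response. It assumes the response's `Blocks` list has already been fetched, that each `WORD` block carries `Geometry.BoundingBox` with `Left`, `Top`, `Width`, and `Height` as fractions of the page, and that the annotator's box has been converted to the same normalized form (for example by dividing pixel values by the image width and height). The function name `words_in_region` and the `min_overlap` threshold are hypothetical, chosen for illustration only.

```python
def words_in_region(blocks, region, min_overlap=0.5):
    """Return text and box of WORD blocks lying mostly inside `region`.

    `blocks` is the "Blocks" list of a Textract response; `region` is a dict
    with normalized Left/Top/Width/Height (an assumption of this sketch).
    """
    hits = []
    for block in blocks:
        if block.get("BlockType") != "WORD":
            continue
        box = block["Geometry"]["BoundingBox"]
        # Intersection rectangle between the word box and the region.
        left = max(box["Left"], region["Left"])
        top = max(box["Top"], region["Top"])
        right = min(box["Left"] + box["Width"], region["Left"] + region["Width"])
        bottom = min(box["Top"] + box["Height"], region["Top"] + region["Height"])
        inter = max(0.0, right - left) * max(0.0, bottom - top)
        area = box["Width"] * box["Height"]
        # Keep the word if enough of its own area falls inside the region.
        if area > 0 and inter / area >= min_overlap:
            hits.append({"text": block["Text"], "box": box})
    return hits
```

Each returned entry pairs the word's text with its coordinates, which could then be stored next to the annotation label to give it context.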

Asked 2 years ago, 714 views
1 answer

Your use case sounds very close to the core Textract workflow. Check out this AWS ML blog post, which provides a solution that generates searchable PDFs from scanned documents, including bounding boxes for the text:

https://aws.amazon.com/blogs/machine-learning/generating-searchable-pdfs-from-scanned-documents-automatically-with-amazon-textract/
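
If the extracted boxes need to be drawn back onto the PDF page, the coordinates have to be converted first. The sketch below is not taken from the blog post; it assumes Textract's normalized boxes are measured from the top-left corner of the page, that the target PDF page is unrotated with its origin at the bottom-left, and that the page size is known in points. The helper name `to_pdf_points` is hypothetical.

```python
def to_pdf_points(box, page_width, page_height):
    """Convert a normalized top-left-origin box to PDF points (x0, y0, x1, y1).

    Assumes an unrotated page whose lower-left corner is at (0, 0).
    """
    x0 = box["Left"] * page_width
    x1 = (box["Left"] + box["Width"]) * page_width
    # Flip the vertical axis: PDF y grows upward, the normalized Top grows downward.
    y1 = (1.0 - box["Top"]) * page_height
    y0 = (1.0 - box["Top"] - box["Height"]) * page_height
    return (x0, y0, x1, y1)
```

For a US Letter page this would be called as `to_pdf_points(box, 612, 792)`.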

AWS
Answered 2 years ago
