Issue with AWS Textract


Is it possible to split a word from its attached punctuation mark in a block="word" response, and also split its geometry values accordingly?

asked a year ago, 324 views
1 Answer

Amazon Textract can detect printed text and handwriting from the Standard English alphabet and ASCII symbols. Amazon Textract can extract printed text, forms and tables in English, German, French, Spanish, Italian and Portuguese. Amazon Textract also extracts explicitly labeled data, implied data, and line items from an itemized list of goods or services from almost any invoice or receipt in English without any templates or configuration. Amazon Textract can also extract specific or implied data such as names and addresses from identity documents in English such as U.S. passports and driver’s licenses without the need for templates or configuration. Finally, Amazon Textract can extract any specific data from documents without worrying about the structure or variations of the data in the document using Queries in English.

Because punctuation marks are ASCII symbols, Textract detects them. Each block in the response has a Geometry field with a BoundingBox (and a Polygon), which you can use to locate the detected text on the page.
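If punctuation comes back attached to a word (for example "Hello," as a single WORD block), one option is to split the block yourself after the call. The sketch below is only an approximation, not something Textract provides: it assumes the text is roughly horizontal and that every character takes an equal share of the word's BoundingBox width. The function name `split_word_block` and the shape of the returned dictionaries are made up for illustration; only the `Text`, `Geometry` and `BoundingBox` keys (`Left`, `Top`, `Width`, `Height`) come from the Textract response.

```python
import re


def split_word_block(block):
    """Split one WORD block into word and punctuation pieces.

    Hypothetical helper: the boxes it returns are estimates, obtained by
    giving each character an equal share of the original box width.
    """
    text = block["Text"]
    box = block["Geometry"]["BoundingBox"]

    # Runs of word characters, or single punctuation characters.
    matches = list(re.finditer(r"\w+|[^\w\s]", text))
    if len(matches) <= 1:
        # Nothing to split: return the block's own text and box unchanged.
        return [{"Text": text, "BoundingBox": dict(box)}]

    # Assumption: uniform character width across the whole word.
    char_width = box["Width"] / len(text)

    pieces = []
    for m in matches:
        pieces.append({
            "Text": m.group(),
            "BoundingBox": {
                "Left": box["Left"] + m.start() * char_width,
                "Top": box["Top"],
                "Width": (m.end() - m.start()) * char_width,
                "Height": box["Height"],
            },
        })
    return pieces


# Example with a hand-written block in the shape Textract returns.
sample = {
    "BlockType": "WORD",
    "Text": "Hello,",
    "Geometry": {
        "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.6, "Height": 0.05}
    },
}
parts = split_word_block(sample)
print([p["Text"] for p in parts])
```

To apply this to a real response, you could loop over the entries in the response's Blocks list whose BlockType is "WORD" and pass each one to the helper. For proportional fonts or rotated text the estimated boxes will be off, so treat them as a starting point rather than exact positions.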

AWS
answered a year ago
