Script to extract queries from JSON
We have a pipeline that extracts queries from PDF documents asynchronously. Here's a sample from the JSON: " block_type='QUERY_RESULT', relationships=None, confidence=77.0, text='1 NON-SENSITIVE', " This confirms that our AWS pipeline is working. However, no matter which combination of the AWS sample scripts we use, we get errors. Does anybody have an idea how to write the extraction script so that it works the way AWS intends? This is the traceback we get:
Input In [29], in <cell line: 19>()
     16 print('d =', d)
     18 #get_query_answers
---> 19 query_answers = d.get_query_answers(page=page)
     21 #for x in query_answers:
     22 #    print(f"{image_filename},{x[1]},{x[2]}")
     24 print(tabulate(query_answers, tablefmt="github"))

File ~/anaconda3/envs/aws-local/lib/python3.9/site-packages/trp/trp2.py:569, in TDocument.get_query_answers(self, page)
    567 if answers:
    568     for answer in answers:
--> 569         result_list.append([query.query.text, query.query.alias, answer.text])
    570 else:
    571     result_list.append([query.query.text, query.query.alias, ""])

AttributeError: 'NoneType' object has no attribute 'text'
Could you please post the entire JSON response you get from the Textract Queries API before it reaches the post-processing (extraction) script? It would help our debugging efforts. If you cannot post it publicly on the forum, please open a customer support ticket with us and attach the image/PDF in question, your entire code file, and the entire stack trace. Thanks.
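For an async job, capturing the "entire" response means reading every result page: GetDocumentAnalysis returns Blocks in batches, with a NextToken on every batch except the last. If a script reads only the first batch, ANSWER relationships can point at block IDs that were never loaded. That is one possible (unconfirmed) source of None values like the one in the traceback above. The sketch below assumes a boto3 Textract client (`boto3.client("textract")`) and an already finished job; `get_full_analysis` is a hypothetical helper name, not part of boto3 or trp.

```python
from typing import Any, Dict, List


def get_full_analysis(client: Any, job_id: str) -> Dict[str, Any]:
    """Hypothetical helper: follow NextToken through GetDocumentAnalysis
    and merge every page's Blocks into one response-shaped dict."""
    blocks: List[Dict[str, Any]] = []
    merged: Dict[str, Any] = {}
    kwargs: Dict[str, str] = {"JobId": job_id}
    while True:
        page = client.get_document_analysis(**kwargs)
        status = page.get("JobStatus")
        # Assumes the job has already finished; IN_PROGRESS/FAILED are treated as errors.
        if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            raise RuntimeError(f"Job {job_id} has status {status}")
        # Keep top-level metadata (DocumentMetadata, JobStatus, ...) from the first page seen.
        for key, value in page.items():
            if key not in ("Blocks", "NextToken"):
                merged.setdefault(key, value)
        blocks.extend(page.get("Blocks", []))
        token = page.get("NextToken")
        if not token:
            break  # last page reached
        kwargs["NextToken"] = token
    merged["Blocks"] = blocks
    return merged
```

With an open file `f`, `json.dump(get_full_analysis(boto3.client("textract"), job_id), f)` writes the merged response, which can then be inspected or attached to a support ticket.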
Does this work for you? https://aws.amazon.com/blogs/machine-learning/specify-and-extract-information-from-documents-using-the-new-queries-feature-in-amazon-textract/
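If the trp2 helper keeps failing, another option is to read the raw response dict directly. In the Textract response format, a QUERY block carries its question in "Query" ("Text", optional "Alias") and links to QUERY_RESULT blocks through a relationship of Type "ANSWER". The minimal sketch below assumes the response has already been loaded as a Python dict (for example with `json.load`). Unlike `TDocument.get_query_answers(page=...)`, it does not filter by page, and it silently skips answer IDs that are not present in the loaded Blocks instead of raising. `extract_query_answers` is a hypothetical name, not part of trp.

```python
from typing import Any, Dict, List


def extract_query_answers(response: Dict[str, Any]) -> List[List[Any]]:
    """Hypothetical helper: return [query_text, alias, answer_text, confidence]
    rows from a raw Textract response dict, for all pages."""
    blocks = response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks if "Id" in b}
    rows: List[List[Any]] = []
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query") or {}
        text = query.get("Text", "")
        alias = query.get("Alias", "")
        # Collect the IDs of QUERY_RESULT blocks linked via ANSWER relationships.
        answer_ids = [
            ident
            for rel in block.get("Relationships") or []
            if rel.get("Type") == "ANSWER"
            for ident in rel.get("Ids", [])
        ]
        # Skip IDs that are not in the loaded Blocks (e.g. a result page never fetched).
        answers = [by_id[i] for i in answer_ids if i in by_id]
        if not answers:
            rows.append([text, alias, "", None])
        for ans in answers:
            rows.append([text, alias, ans.get("Text", ""), ans.get("Confidence")])
    return rows
```

The rows can be passed straight to `tabulate(rows, tablefmt="github")`, as in the original script.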