textract analyze_document API call with Queries in Python


Hi, I'm trying to analyze a traffic ticket using textract in Python. Because of the complex nature of the document, I was planning to use the Queries feature as part of the call to textract. However, the response never seems to contain anything to do with the queries.

If I upload the same document to the Textract console and define the same queries (for example VIOLATION OF TRAFFIC CONTROL DEVICE, CRASH, COMMERCIAL VEHICLE, STREET, SECTION), it works perfectly. So I know Textract can do what I need, but I can't get it working through the API.

Here is my code. It runs without errors, but the response comes back with Blocks whose BlockTypes are all LINE or PAGE. There are no QUERY blocks.

import boto3

textract = boto3.client('textract')

# file_name is the path to the local document image
with open(file_name, 'rb') as file:
    bytes_test = file.read()

# Call Textract AnalyzeDocument by passing a document from local disk
response = textract.analyze_document(
    Document={'Bytes': bytes_test},
    FeatureTypes=['QUERIES'],
    QueriesConfig={
        "Queries": [
            {"Text": "VIOLATION OF TRAFFIC CONTROL DEVICE", "Alias": "VIOLATION OF TRAFFIC CONTROL DEVICE"},
            {"Text": "CRASH", "Alias": "CRASH"},
            {"Text": "COMMERCIAL VEHICLE", "Alias": "COMMERCIAL VEHICLE"},
            {"Text": "STREET", "Alias": "STREET"},
            {"Text": "SECTION", "Alias": "SECTION"}
        ]
    }
)
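A quick way to confirm which block types actually came back is to tally them. The following is a minimal sketch, not part of the original post: `count_block_types` is a hypothetical helper, and `sample_response` is a hand-written fragment standing in for a real Textract response.

```python
from collections import Counter


def count_block_types(response):
    """Tally the BlockType values found in a Textract response dict."""
    return Counter(block["BlockType"] for block in response.get("Blocks", []))


# Hand-written fragment for illustration only; a real response has many more fields.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE"},
        {"BlockType": "LINE"},
    ]
}

counts = count_block_types(sample_response)
print(dict(counts))

if "QUERY" not in counts:
    # Matches the symptom described above: only PAGE and LINE blocks present.
    print("No QUERY blocks in this response")
```

With a real call, passing `response` from `analyze_document` in place of `sample_response` should show at a glance whether any QUERY or QUERY_RESULT blocks were returned.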
  • Could you please give some details about your boto3 and Python versions? I've tried to reproduce your issue with Python 3.11.3 and boto3 1.26.117.

    In my case the response always contains a corresponding "BlockType": "QUERY" block, whether or not Textract was able to answer the query.

    Not answered:

        { "BlockType": "QUERY", "Id": "...", "Query": { "Text": "What is the speed limit?", "Alias": "SPEED LIMIT" } }

    Answered:

        { "BlockType": "QUERY_RESULT", "Confidence": 63.0, "Text": "35", ... }

  • Hi Norman, I'm using boto3 version 1.26.106, Python 3.11.3, and PyCharm 2022.3.3 (Professional Edition), Build #PY-223.8836.43, built on March 10, 2023.

    That's very odd. My response contains no blocks of type QUERY. Can you share the exact code you used to generate your response? I'll try it and see what output I get.

  • Hi Gregory, I've downgraded to boto3 1.26.106 - same results on my side. I used your code, with a slight modification to the query as shown below:

    import json

    response = textract.analyze_document(
        Document={"Bytes": bytes_test},
        FeatureTypes=["QUERIES"],
        QueriesConfig={
            "Queries": [
                {"Text": "What is the speed limit?", "Alias": "SPEED LIMIT"},
            ]
        },
    )

    print(json.dumps(response["Blocks"], indent=4))

    Sample image: https://openverse.org/image/4cddc497-0104-4725-8837-36ca8e1112ae?q=roadsign%20speedlimit
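
The QUERY and QUERY_RESULT blocks described in the comments above can be paired up by following each QUERY block's relationships. This is a minimal sketch under stated assumptions: `extract_query_answers` is a hypothetical helper, the sample data is hand-written, and it assumes an answered QUERY block carries a `Relationships` entry of type `ANSWER` whose `Ids` point at the QUERY_RESULT blocks.

```python
def extract_query_answers(response):
    """Map each query alias (or text) to (answer_text, confidence), or None if unanswered."""
    blocks = response.get("Blocks", [])
    blocks_by_id = {block["Id"]: block for block in blocks}
    answers = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        key = query.get("Alias") or query.get("Text")
        answers[key] = None  # stays None when no answer is linked
        for relationship in block.get("Relationships", []):
            if relationship.get("Type") != "ANSWER":
                continue
            for result_id in relationship.get("Ids", []):
                result = blocks_by_id.get(result_id)
                if result and result.get("BlockType") == "QUERY_RESULT":
                    answers[key] = (result.get("Text"), result.get("Confidence"))
    return answers


# Hand-written sample for illustration only, modelled on the blocks quoted above.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {
            "BlockType": "QUERY",
            "Id": "q1",
            "Query": {"Text": "What is the speed limit?", "Alias": "SPEED LIMIT"},
            "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}],
        },
        {"BlockType": "QUERY_RESULT", "Id": "r1", "Confidence": 63.0, "Text": "35"},
        {
            "BlockType": "QUERY",
            "Id": "q2",
            "Query": {"Text": "CRASH", "Alias": "CRASH"},
        },
    ]
}

answers = extract_query_answers(sample_response)
print(answers)
```

An empty dictionary from this helper means the response held no QUERY blocks at all, which is the symptom reported in the question; a `None` value means the query was returned but not answered.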

asked a year ago, 900 views
2 Answers

Hi, I have no idea what I changed, but it seems to be working now. Thank you so much for your help.

answered a year ago
  • You are welcome! Great to hear that it is working now!


Hi, thanks for using the Textract service. I'm sorry that you can't get it to work with the API. Technically, the console and the API should return the same results. Your code looks correct to me, and as @Norman said, the problem may be related to the version. Just to follow up, are you still facing the issue? If so, would you mind sharing an example image you are using so I can try to reproduce it? Thanks!

AWS
answered a year ago
