Textract + HITL request: HumanInTheLoop Mechanical Turk job fails with error: Validation error

I was testing the latency of Textract + HITL via Mechanical Turk. I created a flow definition and a human loop config, then submitted a request on a document using:

import boto3

textract = boto3.client('textract')

# flowDefinitionArn and human_loop_unique_id are defined earlier (not shown here).
humanLoopConfig = {
    'FlowDefinitionArn': flowDefinitionArn,
    'HumanLoopName': human_loop_unique_id,
    'DataAttributes': {'ContentClassifiers': ['FreeOfPersonallyIdentifiableInformation']}
}

def analyze_document_with_a2i(document_name, bucket):
    # Analyze the S3 document and attach the human loop config to the request.
    response = textract.analyze_document(
        Document={'S3Object': {'Bucket': bucket, 'Name': document_name}},
        FeatureTypes=["TABLES", "FORMS"],
        HumanLoopConfig=humanLoopConfig
    )
    return response

While the response was fine, when I list all the human loops related to a flow definition, I get this response:

{'HumanLoopName': 'human-loop-id-here', 'HumanLoopStatus': 'Failed', 'CreationTime': datetime.datetime(2023, 6, 20, 8, 29, 35, 606000, tzinfo=tzlocal()), 'FailureReason': 'ValidationError', 'FlowDefinitionArn': 'flow-definition-arn-here'}

How do I figure out what the exact issue is?

[EDITED]

FLOW DEFINITION FUNCTION:

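import json
import boto3

sagemaker = boto3.client('sagemaker')

# ROLE, WORKTEAM_ARN, humanTaskUiArn and OUTPUT_PATH are defined earlier (not shown here).
def create_flow_definition(flow_definition_name):
    '''
    Creates a Flow Definition resource

    Returns:
    str: FlowDefinitionArn
    '''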
    # Visit https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-human-fallback-conditions-json-schema.html for more information on this schema.
    humanLoopActivationConditions = json.dumps(
        {
            "Conditions": [
                {
                    "Or": [
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "Total",
                                "ImportantFormKeyAliases": ["Total Amount", "Amount Payable", "Balance Due"],
                                "KeyValueBlockConfidenceLessThan": 100,
                                "WordBlockConfidenceLessThan": 100
                            }
                        },
                        {
                            "ConditionType": "MissingImportantFormKey",
                            "ConditionParameters": {
                                "ImportantFormKey": "Vendor Name",
                                "ImportantFormKeyAliases": ["Vendor Name", "Remit to:", "Remit payment to"]
                            }
                        },
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "Phone Number",
                                "ImportantFormKeyAliases": ["Phone number:", "Phone No.:", "Number:"],
                                "KeyValueBlockConfidenceLessThan": 100,
                                "WordBlockConfidenceLessThan": 100
                            }
                        },
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "*",
                                "KeyValueBlockConfidenceLessThan": 100,
                                "WordBlockConfidenceLessThan": 100
                            }
                        },
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "*",
                                "KeyValueBlockConfidenceGreaterThan": 0,
                                "WordBlockConfidenceGreaterThan": 0
                            }
                        }
                    ]
                }
            ]
        }
    )

    response = sagemaker.create_flow_definition(
        FlowDefinitionName=flow_definition_name,
        RoleArn=ROLE,
        HumanLoopConfig={
            "WorkteamArn": WORKTEAM_ARN,
            "PublicWorkforceTaskPrice": {
                "AmountInUsd": {
                    "Cents": 1,
                    "Dollars": 0,
                    "TenthFractionsOfACent": 2
                }
            },
            "HumanTaskUiArn": humanTaskUiArn,
            "TaskCount": 1,
            "TaskDescription": "Document analysis sample task description",
            "TaskTitle": "Document analysis sample task"
        },
        HumanLoopRequestSource={
            "AwsManagedHumanLoopRequestSource": "AWS/Textract/AnalyzeDocument/Forms/V1"
        },
        HumanLoopActivationConfig={
            "HumanLoopActivationConditionsConfig": {
                "HumanLoopActivationConditions": humanLoopActivationConditions
            }
        },
        OutputConfig={
            "S3OutputPath": OUTPUT_PATH
        }
    )

    return response['FlowDefinitionArn']
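
For reference, this is roughly how the flow definition's status can be confirmed before submitting documents. It is a minimal sketch: `wait_for_flow_definition`, `poll_seconds` and `max_polls` are hypothetical names, and it assumes `describe_flow_definition` reports `FlowDefinitionStatus` and, on failure, `FailureReason`. An `Active` status here only says the resource was created; it does not by itself show that every human loop started against it will pass validation.

```python
import time


def wait_for_flow_definition(sagemaker_client, flow_definition_name,
                             poll_seconds=5, max_polls=12):
    """Poll until the flow definition leaves the 'Initializing' state.

    sagemaker_client is expected to behave like boto3.client('sagemaker').
    Returns a (status, failure_reason) tuple; failure_reason may be None.
    """
    for _ in range(max_polls):
        desc = sagemaker_client.describe_flow_definition(
            FlowDefinitionName=flow_definition_name
        )
        status = desc["FlowDefinitionStatus"]
        if status != "Initializing":
            # 'Active', 'Failed' or 'Deleting': stop polling and report it.
            return status, desc.get("FailureReason")
        time.sleep(poll_seconds)
    # Still initializing after max_polls attempts.
    return "Initializing", None
```

Typical use would be `wait_for_flow_definition(sagemaker, flow_definition_name)` with the `sagemaker` client used above.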

Nirali
asked 10 months ago, 242 views

1 Answer

Hi, this may be caused by the flow definition configuration. Could you provide more details here so we can reproduce the issue?

AWS
answered 10 months ago
  • Hi, sure! Since comments have a 600 character limit, I edited my question and added the function I used to create a flow definition. Hope this helps!

    But if there was an issue with the flow definition configuration, why was it created successfully and shown as active? Any idea?

  • Any updates here? Can anyone point me in the right direction?
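
One way to get more detail on a failed loop, as a hedged sketch rather than a confirmed fix: the `sagemaker-a2i-runtime` client has `list_human_loops` and `describe_human_loop`, and the describe response carries a `FailureCode` in addition to the `FailureReason` shown in the listing. The helper name `describe_failed_human_loops` is hypothetical, and whether the extra fields pinpoint the cause of this particular `ValidationError` is an assumption.

```python
def describe_failed_human_loops(a2i_client, flow_definition_arn):
    """Collect failure details for every failed human loop of a flow definition.

    a2i_client is expected to behave like boto3.client('sagemaker-a2i-runtime').
    Returns a list of dicts with the loop name, FailureCode and FailureReason.
    """
    failed = []
    kwargs = {"FlowDefinitionArn": flow_definition_arn}
    while True:
        page = a2i_client.list_human_loops(**kwargs)
        for summary in page.get("HumanLoopSummaries", []):
            if summary.get("HumanLoopStatus") != "Failed":
                continue
            # The describe call returns more fields than the list summary.
            detail = a2i_client.describe_human_loop(
                HumanLoopName=summary["HumanLoopName"]
            )
            failed.append({
                "HumanLoopName": summary["HumanLoopName"],
                "FailureCode": detail.get("FailureCode"),
                "FailureReason": detail.get("FailureReason"),
            })
        token = page.get("NextToken")
        if not token:
            break
        # Follow pagination until all loops have been inspected.
        kwargs["NextToken"] = token
    return failed
```

It could be called as `describe_failed_human_loops(boto3.client('sagemaker-a2i-runtime'), flowDefinitionArn)`. If the codes are still too generic, CloudTrail events for the account and the permissions of the role passed as `RoleArn` are other places worth checking.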
