I was testing the latency of Amazon Textract with human-in-the-loop (HITL) review through Amazon Augmented AI (A2I), using Mechanical Turk as the workforce.
I created a flow definition and a human loop config, then submitted a document for analysis with:
```python
humanLoopConfig = {
    'FlowDefinitionArn': flowDefinitionArn,
    'HumanLoopName': human_loop_unique_id,
    'DataAttributes': {'ContentClassifiers': ['FreeOfPersonallyIdentifiableInformation']}
}
```
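Because the loop name comes from `human_loop_unique_id`, a quick local check can rule out a malformed name before calling Textract. This is a minimal sketch assuming the constraint documented for `HumanLoopName` (1 to 63 characters, lowercase letters, digits and hyphens, starting and ending with a letter or digit); `is_valid_human_loop_name` and `make_human_loop_name` are hypothetical helpers, not boto3 functions.

```python
import re
import uuid

# Assumed HumanLoopName constraint: 1-63 chars, lowercase letters, digits and
# hyphens, starting and ending with a letter or digit.
_HUMAN_LOOP_NAME = re.compile(r"^[a-z0-9](-*[a-z0-9])*$")


def is_valid_human_loop_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed HumanLoopName rules."""
    return 1 <= len(name) <= 63 and _HUMAN_LOOP_NAME.fullmatch(name) is not None


def make_human_loop_name(prefix: str = "textract-loop") -> str:
    """Build a unique lowercase name from a prefix and a random UUID."""
    # uuid4().hex is 32 lowercase hex characters, so the suffix never
    # introduces uppercase letters or underscores.
    name = f"{prefix.lower()}-{uuid.uuid4().hex}"
    # Cut to 63 characters, then drop any leading/trailing hyphen
    # (e.g. from an empty prefix or from the cut itself).
    return name[:63].strip("-")
```

For example, `humanLoopConfig['HumanLoopName'] = make_human_loop_name()` would give each request a fresh name.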
```python
def analyze_document_with_a2i(document_name, bucket):
    response = textract.analyze_document(
        Document={'S3Object': {'Bucket': bucket, 'Name': document_name}},
        FeatureTypes=["TABLES", "FORMS"],
        HumanLoopConfig=humanLoopConfig
    )
    return response
```
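To confirm from the synchronous response whether a loop was actually started, the `AnalyzeDocument` response can include a `HumanLoopActivationOutput` object with `HumanLoopArn`, `HumanLoopActivationReasons`, and a JSON-encoded string `HumanLoopActivationConditionsEvaluationResults`. The sketch below assumes those field names; `summarize_human_loop_activation` is a hypothetical helper that only reads the dict returned by `analyze_document_with_a2i` and makes no AWS calls itself.

```python
import json
from typing import Any, Dict, Optional


def summarize_human_loop_activation(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pull the A2I-related fields out of an AnalyzeDocument response.

    Returns None when the response has no HumanLoopActivationOutput,
    which typically means no human loop was started for this document.
    """
    output = response.get("HumanLoopActivationOutput")
    if not output:
        return None
    raw = output.get("HumanLoopActivationConditionsEvaluationResults")
    try:
        # The evaluation results arrive as a JSON-encoded string.
        evaluation = json.loads(raw) if raw else None
    except json.JSONDecodeError:
        evaluation = raw  # keep the raw text if it is not valid JSON
    return {
        "HumanLoopArn": output.get("HumanLoopArn"),
        "ActivationReasons": output.get("HumanLoopActivationReasons", []),
        "ConditionResults": evaluation,
    }
```

Something like `print(summarize_human_loop_activation(analyze_document_with_a2i(document_name, bucket)))` would show the loop ARN and which conditions fired.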
The Textract response itself looked fine, but when I list all the human loops for the flow definition, I get this:
```
{'HumanLoopName': 'human-loop-id-here',
 'HumanLoopStatus': 'Failed',
 'CreationTime': datetime.datetime(2023, 6, 20, 8, 29, 35, 606000, tzinfo=tzlocal()),
 'FailureReason': 'ValidationError',
 'FlowDefinitionArn': 'flow-definition-arn-here'}
```
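For reference, a listing like the one above comes from the `sagemaker-a2i-runtime` client's `list_human_loops` call, which is paginated with `NextToken`. This sketch assumes a boto3 client for that service is passed in (e.g. `boto3.client('sagemaker-a2i-runtime')`); `iter_human_loops` and `failed_human_loops` are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, List


def iter_human_loops(a2i_client: Any, flow_definition_arn: str) -> Iterator[Dict[str, Any]]:
    """Yield every human loop summary for a flow definition, page by page."""
    kwargs: Dict[str, Any] = {"FlowDefinitionArn": flow_definition_arn}
    while True:
        page = a2i_client.list_human_loops(**kwargs)
        yield from page.get("HumanLoopSummaries", [])
        next_token = page.get("NextToken")
        if not next_token:
            return
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = next_token


def failed_human_loops(a2i_client: Any, flow_definition_arn: str) -> List[Dict[str, Any]]:
    """Return only the summaries whose HumanLoopStatus is 'Failed'."""
    return [
        summary
        for summary in iter_human_loops(a2i_client, flow_definition_arn)
        if summary.get("HumanLoopStatus") == "Failed"
    ]
```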
How do I figure out what the exact issue is?
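One thing worth trying: `describe_human_loop` on the same `sagemaker-a2i-runtime` client returns a `FailureCode` alongside `FailureReason` for a single loop, which may carry more detail than the list summary (not guaranteed). The hypothetical helper below assumes such a client is passed in and simply collects those fields.

```python
from typing import Any, Dict


def explain_human_loop_failure(a2i_client: Any, human_loop_name: str) -> Dict[str, Any]:
    """Fetch one human loop and keep the fields useful for debugging."""
    detail = a2i_client.describe_human_loop(HumanLoopName=human_loop_name)
    return {
        "HumanLoopName": human_loop_name,
        "HumanLoopStatus": detail.get("HumanLoopStatus"),
        # FailureCode identifies the type of failure; FailureReason describes it.
        "FailureCode": detail.get("FailureCode"),
        "FailureReason": detail.get("FailureReason"),
        "HumanLoopArn": detail.get("HumanLoopArn"),
    }
```

For example, `explain_human_loop_failure(boto3.client('sagemaker-a2i-runtime'), 'human-loop-id-here')`.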
[EDITED]
FLOW DEFINITION FUNCTION:
```python
import json  # used by json.dumps below


def create_flow_definition(flow_definition_name):
    '''
    Creates a Flow Definition resource
    Returns:
        str: FlowDefinitionArn
    '''
    # Visit https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-human-fallback-conditions-json-schema.html for more information on this schema.
    humanLoopActivationConditions = json.dumps(
        {
            "Conditions": [
                {
                    "Or": [
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "Total",
                                "ImportantFormKeyAliases": ["Total Amount", "Amount Payable", "Balance Due"],
                                "KeyValueBlockConfidenceLessThan": 100,
                                "WordBlockConfidenceLessThan": 100
                            }
                        },
                        {
                            "ConditionType": "MissingImportantFormKey",
                            "ConditionParameters": {
                                "ImportantFormKey": "Vendor Name",
                                "ImportantFormKeyAliases": ["Vendor Name", "Remit to:", "Remit payment to"]
                            }
                        },
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "Phone Number",
                                "ImportantFormKeyAliases": ["Phone number:", "Phone No.:", "Number:"],
                                "KeyValueBlockConfidenceLessThan": 100,
                                "WordBlockConfidenceLessThan": 100
                            }
                        },
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "*",
                                "KeyValueBlockConfidenceLessThan": 100,
                                "WordBlockConfidenceLessThan": 100
                            }
                        },
                        {
                            "ConditionType": "ImportantFormKeyConfidenceCheck",
                            "ConditionParameters": {
                                "ImportantFormKey": "*",
                                "KeyValueBlockConfidenceGreaterThan": 0,
                                "WordBlockConfidenceGreaterThan": 0
                            }
                        }
                    ]
                }
            ]
        }
    )
    response = sagemaker.create_flow_definition(
        FlowDefinitionName=flow_definition_name,
        RoleArn=ROLE,
        HumanLoopConfig={
            "WorkteamArn": WORKTEAM_ARN,
            "PublicWorkforceTaskPrice": {
                "AmountInUsd": {
                    "Cents": 1,
                    "Dollars": 0,
                    "TenthFractionsOfACent": 2
                }
            },
            "HumanTaskUiArn": humanTaskUiArn,
            "TaskCount": 1,
            "TaskDescription": "Document analysis sample task description",
            "TaskTitle": "Document analysis sample task"
        },
        HumanLoopRequestSource={
            "AwsManagedHumanLoopRequestSource": "AWS/Textract/AnalyzeDocument/Forms/V1"
        },
        HumanLoopActivationConfig={
            "HumanLoopActivationConditionsConfig": {
                "HumanLoopActivationConditions": humanLoopActivationConditions
            }
        },
        OutputConfig={
            "S3OutputPath": OUTPUT_PATH
        }
    )
    return response['FlowDefinitionArn']
```
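Since the activation conditions are passed as a JSON string, a local lint pass can catch typos in condition types or parameter names before they reach the service. The allowed names below (`ImportantFormKeyConfidenceCheck`, `MissingImportantFormKey`, `Sampling` and their parameters) and the 0 to 100 range are assumptions taken from the schema page linked in the code comment, so check them against the current docs; `lint_activation_conditions` is a hypothetical helper.

```python
import json
from typing import Any, List

# ASSUMPTION: allowed condition types and parameters, based on the A2I
# Textract activation-condition schema page; verify against current docs.
ALLOWED_PARAMS = {
    "ImportantFormKeyConfidenceCheck": {
        "ImportantFormKey", "ImportantFormKeyAliases",
        "KeyValueBlockConfidenceLessThan", "KeyValueBlockConfidenceGreaterThan",
        "WordBlockConfidenceLessThan", "WordBlockConfidenceGreaterThan",
    },
    "MissingImportantFormKey": {"ImportantFormKey", "ImportantFormKeyAliases"},
    "Sampling": {"RandomSamplingPercentage"},
}
# Parameters ending in these suffixes are treated as numbers in 0-100.
NUMERIC_SUFFIXES = ("LessThan", "GreaterThan", "Percentage")


def lint_activation_conditions(conditions_json: str) -> List[str]:
    """Return human-readable problems found in an activation-conditions JSON string."""
    problems: List[str] = []

    def walk(node: Any, path: str) -> None:
        if not isinstance(node, dict):
            problems.append(f"{path}: expected an object")
            return
        # "And"/"Or" nodes only group child conditions; recurse into them.
        for op in ("And", "Or"):
            if op in node:
                for i, child in enumerate(node[op]):
                    walk(child, f"{path}.{op}[{i}]")
                return
        ctype = node.get("ConditionType")
        if ctype not in ALLOWED_PARAMS:
            problems.append(f"{path}: unknown ConditionType {ctype!r}")
            return
        params = node.get("ConditionParameters", {})
        for key, value in params.items():
            if key not in ALLOWED_PARAMS[ctype]:
                problems.append(f"{path}: unexpected parameter {key!r} for {ctype}")
            elif key.endswith(NUMERIC_SUFFIXES) and not (
                isinstance(value, (int, float)) and 0 <= value <= 100
            ):
                problems.append(f"{path}: {key}={value!r} is outside 0-100")
        if "ImportantFormKey" in ALLOWED_PARAMS[ctype] and "ImportantFormKey" not in params:
            problems.append(f"{path}: missing ImportantFormKey")

    try:
        doc = json.loads(conditions_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    for i, cond in enumerate(doc.get("Conditions", [])):
        walk(cond, f"Conditions[{i}]")
    return problems
```

Run against the conditions dict shown above (after `json.dumps`), it reports no problems under these assumptions.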
Sure! Since comments are limited to 600 characters, I edited the question to add the function I used to create the flow definition. Hope this helps!
But if there were a problem with the flow definition configuration, why was the flow definition created successfully and marked Active? Any ideas?
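One way to see what the service actually stored for the flow definition is `describe_flow_definition` on the SageMaker client, which returns `FlowDefinitionStatus`, an optional `FailureReason`, and the saved `HumanLoopConfig` and `HumanLoopActivationConfig`. The hypothetical helper `flow_definition_snapshot` below assumes a boto3 SageMaker client is passed in and just collects those fields so they can be compared with what was sent.

```python
from typing import Any, Dict


def flow_definition_snapshot(sagemaker_client: Any, flow_definition_name: str) -> Dict[str, Any]:
    """Collect the stored status and key settings of a flow definition."""
    desc = sagemaker_client.describe_flow_definition(FlowDefinitionName=flow_definition_name)
    # Activation conditions are stored as a JSON string, as they were sent.
    activation = desc.get("HumanLoopActivationConfig", {}).get(
        "HumanLoopActivationConditionsConfig", {}
    )
    loop_cfg = desc.get("HumanLoopConfig", {})
    return {
        "Status": desc.get("FlowDefinitionStatus"),
        "FailureReason": desc.get("FailureReason"),
        "WorkteamArn": loop_cfg.get("WorkteamArn"),
        "HumanTaskUiArn": loop_cfg.get("HumanTaskUiArn"),
        "ActivationConditions": activation.get("HumanLoopActivationConditions"),
    }
```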
Any updates? Can anyone point me in the right direction?