Comprehend/Groundtruth PDF labeling job creation fails due to folder structure for manifest file (?)

I'm following the official tutorial for creating a PDF labeling/annotation job (https://docs.aws.amazon.com/comprehend/latest/dg/cer-annotation-pdf.html). I checked the permissions on all the created roles, buckets, Lambdas, etc., and everything seems OK. But after I run the script in the "Creating an annotation job" section, the labeling job is created and then shows these errors:

"401 UnknownError: The specified key output/apsis-labeling-hope-labeling-job-20241016T022122/apsis-johnny-ra/manifests/output/output.manifest isn't present in the S3 bucket comprehend-semi-structured-docs-us-east-1-575108929836."

"Status Failed Reason for failure ClientError: Exception invoking the Lambda function arn:aws:lambda:us-east-1:575108929836:function:sam-app-GTPreHumanTaskLambdaFunction-ZkgqFDG1mYph. LambdaErrorCode: AccessDeniedException. Ensure the Lambda function exists, that the role arn:aws:iam::[...]:role/service-role/AmazonSageMaker-ExecutionRole-[...] has permissions to invoke it and try your request again."

I noticed that the folder structure the script creates isn't the one expected for the manifest file, which may be why the error says the manifest isn't present. Inside the "comprehend-semi-structured-docs-us-east-1-[number] > output > folder > manifests/" directory there was apparently supposed to be an "output" folder containing the manifest file. Instead, the job creates an "intermediate" folder containing another folder called "1/", and the manifest is created there. I'm not sure why this structure is being created, since I just followed the tutorial from the link.

1 Answer

The output.manifest is usually the final consolidated output file from a SageMaker Ground Truth (SMGT) labelling job, while the "PreHumanTaskLambda" function is usually called to transform an individual task datum from the input manifest before it gets mapped into the Liquid HTML template for the task UI.

Therefore (although I'm not super deep on this Comprehend error in particular), I expect your second error, the AccessDeniedException, is the actual root cause, and the missing output.manifest just reflects that the labelling job failed.

I'd suggest double-checking two things. First, that your AmazonSageMaker-ExecutionRole-... has permission for lambda:InvokeFunction on your given function (I see AmazonSageMakerFullAccess only grants it on a few name patterns, including e.g. *SageMaker* and *LabelingFunction*). Second, that your Lambda's resource-based policy doesn't have any restrictions preventing it from being called by SageMaker/Comprehend (I'm not immediately sure which).
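
As a rough way to check the role permissions, here's a minimal Python sketch (standard library only). It tests whether a function name matches the two name patterns mentioned above, and builds an inline policy document that would allow lambda:InvokeFunction on one specific function. The pattern list is only the subset quoted here, not the full managed policy; the helper names are hypothetical; and fnmatchcase only approximates how IAM evaluates wildcards.

```python
import json
from fnmatch import fnmatchcase

# Name patterns quoted in the answer; the managed policy may list others,
# so treat this as an illustrative subset (an assumption, not the full policy).
ASSUMED_PATTERNS = ["*SageMaker*", "*LabelingFunction*"]


def name_matches_patterns(function_name, patterns=ASSUMED_PATTERNS):
    """Return True if the name matches any wildcard pattern (case-sensitive)."""
    return any(fnmatchcase(function_name, p) for p in patterns)


def invoke_policy_for(function_arn):
    """Build an inline policy document allowing invocation of one function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            }
        ],
    }


if __name__ == "__main__":
    # ARN copied from the error message in the question.
    arn = (
        "arn:aws:lambda:us-east-1:575108929836:function:"
        "sam-app-GTPreHumanTaskLambdaFunction-ZkgqFDG1mYph"
    )
    name = arn.rsplit(":", 1)[-1]
    print(name_matches_patterns(name))
    print(json.dumps(invoke_policy_for(arn), indent=2))
```

With the function name from your error, the check returns False for both patterns (the name contains "LambdaFunction", not "LabelingFunction"). That would be consistent with the AccessDeniedException if the role relies only on AmazonSageMakerFullAccess. In that case, one option is to attach the generated document to the execution role as an inline policy, for example through the IAM console.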

Note that intermediate outputs are typically created by SM Ground Truth during the job, before later consolidation into the output manifest.

AWS
EXPERT
answered 3 months ago
