How do I troubleshoot SageMaker Ground Truth labeling errors?

I receive labeling errors when I use Amazon SageMaker Ground Truth, or my Amazon SageMaker workers are idle or not showing tasks. I want to troubleshoot these issues.

Resolution

Troubleshoot labeling errors

To troubleshoot labeling errors, check your permissions, output manifest file, and input manifest file.

Permissions

Be sure that you have permission to perform the following actions:

  • Create a labeling job.
  • Access input data.
  • Access the Amazon Simple Storage Service (Amazon S3) bucket that stores the output data.

Confirm that the Amazon S3 bucket is in the same AWS Region as the Ground Truth labeling job. Check that the bucket has a cross-origin resource sharing (CORS) policy attached. For more information, see CORS permission requirement.

For more information about permissions, see Step 1: Before you begin.
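
The Region and CORS checks above can be scripted. The following is a minimal sketch, not an official procedure. The helper names (normalize_region, cors_allows_get, check_bucket) are hypothetical, and the sketch assumes that a CORS rule that allows GET is what the labeling UI needs. It also assumes that boto3 is installed and that your credentials can call GetBucketLocation and GetBucketCors.

```python
def normalize_region(location_constraint):
    """Map the LocationConstraint from GetBucketLocation to a Region name."""
    # S3 reports None for us-east-1 buckets and the legacy value "EU" for eu-west-1.
    if location_constraint is None:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def cors_allows_get(cors_rules):
    """Return True if at least one CORS rule permits GET requests."""
    return any("GET" in rule.get("AllowedMethods", []) for rule in cors_rules)


def check_bucket(bucket, job_region):
    """Hypothetical helper: return a list of problems found for the bucket."""
    # Imported here so that the pure helpers above work without boto3.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    problems = []

    location = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    region = normalize_region(location)
    if region != job_region:
        problems.append(f"bucket is in {region}, labeling job is in {job_region}")

    try:
        rules = s3.get_bucket_cors(Bucket=bucket)["CORSRules"]
    except ClientError as err:
        # S3 returns this error code when the bucket has no CORS configuration.
        if err.response["Error"]["Code"] != "NoSuchCORSConfiguration":
            raise
        rules = []
    if not cors_allows_get(rules):
        problems.append("no CORS rule allows GET")

    return problems
```

An empty list from check_bucket("your-bucket-name", "us-east-1") means that neither of the two checks found a problem. It doesn't verify IAM permissions.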

Output manifest file

In the output manifest file in the S3 output location that you specified, check each object's metadata for a failure-reason entry that explains why the annotation failed. For example:

{"source-ref":"s3://sagemaker-output-labeling-bucket-example/example.jpeg","example-metadata":{"retry-count":1,"failure-reason":"ClientError: Annotation tasks expired. Probable Reasons are 1) TaskAvailabilityLifetimeInSeconds parameter is too small. 2) Reward is too low for workers to work on the task. 3) If you use a custom html template, your template may be broken. 4) Data (image/video/text) sent for annotation is broken or too big, preventing completion. 5) All workers declined the tasks.","human-annotated":"true"}}

Workers can decline tasks because of unclear instructions, input data that's incorrectly displayed, or other task issues. If all workers decline a task, then the object is marked as expired and isn't sent to other workers. To monitor whether workers decline, submit, or return a task, set up an Amazon EventBridge (formerly Amazon CloudWatch Events) rule.
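
To scan a large output manifest for failures like the one above, you can parse it line by line. The following is a minimal sketch that assumes you downloaded the manifest locally and that each line is one JSON object shaped like the example. The function name find_failures is hypothetical. Because the name of the metadata key depends on your job, the sketch checks every nested object for a failure-reason field.

```python
import json


def find_failures(manifest_lines):
    """Return (source, failure_reason) pairs for objects that failed annotation."""
    failures = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        source = record.get("source-ref") or record.get("source")
        # The metadata key name varies by job, so inspect every nested object.
        for value in record.values():
            if isinstance(value, dict) and "failure-reason" in value:
                failures.append((source, value["failure-reason"]))
    return failures
```

For example, with a local copy named output.manifest (a placeholder file name), open the file and pass the file object to find_failures to list each failed object with its reason.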

Input manifest file

Be sure that the input manifest file meets all the listed JSON object requirements. For more information, see Use an input manifest file.
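
A quick local check can catch common manifest problems before you create the job. The following sketch covers only two requirements: each line must be a valid JSON object, and each object must contain a source-ref or source key. It isn't a full validator, and the function name validate_manifest is hypothetical.

```python
import json


def validate_manifest(lines):
    """Return a list of (line_number, problem) tuples for an input manifest."""
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # this sketch skips empty lines, such as a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            errors.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((number, "not a JSON object"))
        elif "source-ref" not in record and "source" not in record:
            errors.append((number, "missing source-ref or source key"))
    return errors
```

An empty result means that the manifest passed these two checks only. Review the remaining requirements in the documentation.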

Troubleshoot task latency and idle workers

Set MaxConcurrentTaskCount to a size that allows workers to complete the entire batch within the specified TaskAvailabilityLifetimeInSeconds. The maximum value for this parameter is 1,000.

Set NumberOfHumanWorkersPerDataObject to a value that's appropriate for your use case.

For example, if you set the number to three workers for each object, then three workers must label each object. If two of the workers finish the current batch, then the third worker must also finish that batch before the next batch is assigned. If a job disappears from the worker portal, then a worker might be idle while waiting for a new batch to become available.

Set TaskAvailabilityLifetimeInSeconds to a value that's appropriate for your use case. This value represents the total time that the tasks are available to the workers. The maximum value for this parameter is 864,000 seconds (10 days).
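
To see how these three parameters interact, a rough estimate can help. The following sketch is back-of-the-envelope arithmetic, not how Ground Truth schedules work. The function name and the team_size and seconds_per_task inputs are assumptions that you supply, and the estimate assumes that every worker labels continuously.

```python
import math


def estimated_batch_seconds(batch_size, workers_per_object, team_size, seconds_per_task):
    """Rough lower bound on the time needed to finish one batch.

    batch_size corresponds to MaxConcurrentTaskCount, and workers_per_object
    corresponds to NumberOfHumanWorkersPerDataObject.
    """
    if team_size < workers_per_object:
        raise ValueError("team is too small to label each object the required number of times")
    total_tasks = batch_size * workers_per_object
    # With every worker busy, at most team_size tasks are in progress at once.
    rounds = math.ceil(total_tasks / team_size)
    return rounds * seconds_per_task


def fits_lifetime(batch_seconds, task_availability_lifetime_in_seconds):
    """Return True if the estimate fits within TaskAvailabilityLifetimeInSeconds."""
    return batch_seconds <= task_availability_lifetime_in_seconds
```

For example, a batch of 1,000 objects with three workers for each object, a team of 10 workers, and 60 seconds for each task gives 3,000 tasks in 300 rounds, or 18,000 seconds. Real completion times are longer because workers aren't available continuously, so leave generous headroom.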

It's a best practice to split your input dataset into multiple jobs and direct them to the same work team under the following conditions:

  • The number of objects in the labeling job is high.
  • Your job failed because the wait time exceeded the TaskAvailabilityLifetimeInSeconds value.

Set TaskTimeLimitInSeconds to control the time that workers take to complete a task, so that tasks are annotated and the next batch is assigned.
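
One way to split a dataset into multiple jobs is to divide the input manifest into smaller manifests and create one labeling job for each. The following is a minimal sketch. The function name split_manifest and the objects_per_job value are hypothetical choices, and uploading each chunk to Amazon S3 and creating the jobs is left out.

```python
def split_manifest(lines, objects_per_job):
    """Split manifest lines into chunks of at most objects_per_job lines each."""
    if objects_per_job < 1:
        raise ValueError("objects_per_job must be at least 1")
    # Drop empty lines so that every chunk contains only data objects.
    records = [line for line in lines if line.strip()]
    return [
        records[start:start + objects_per_job]
        for start in range(0, len(records), objects_per_job)
    ]
```

Each chunk can then be written to its own manifest file and used as the input for a separate labeling job that's assigned to the same work team.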

Related information

Create a labeling job (API)

Control the flow of data objects sent to workers

AWS OFFICIAL, updated a month ago