How to avoid multiple “Completed” events from SageMaker batch transform job

I’m launching a SageMaker batch transform job from AWS Batch on a set of images that were extracted from a video file. I have a CloudWatch Events rule that listens for SageMaker “TransformJobStatus” events in the “Completed” state and triggers a Lambda function, which in turn submits another AWS Batch job.
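For context, the Lambda side looks roughly like the sketch below. This is not the original code; names such as the `JOB_QUEUE` and `JOB_DEFINITION` environment variables are placeholders. It pulls the transform job name out of the EventBridge event detail and submits the follow-up Batch job:

```python
import os


def extract_job_detail(event):
    """Pull the transform job name and status out of a
    'SageMaker Transform Job State Change' event."""
    detail = event.get("detail", {})
    return detail.get("TransformJobName"), detail.get("TransformJobStatus")


def handler(event, context):
    job_name, status = extract_job_detail(event)
    if status != "Completed":
        return  # the rule should only deliver Completed events, but guard anyway

    import boto3  # imported here so the pure helper above has no AWS dependency

    batch = boto3.client("batch")
    # JOB_QUEUE / JOB_DEFINITION are hypothetical environment variables,
    # standing in for however the real post-processing job is configured.
    batch.submit_job(
        jobName=f"postprocess-{job_name}",
        jobQueue=os.environ["JOB_QUEUE"],
        jobDefinition=os.environ["JOB_DEFINITION"],
        parameters={"transform_job_name": job_name},
    )
```

With a handler shaped like this, every delivered “Completed” event submits one Batch job, which is why a duplicate event results in a duplicate post-processing run.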

The problem is that I’m getting two “Completed” events from the same SageMaker transform job. To investigate, I created a Lambda function that listens to all “SageMaker Transform Job State Change” events, and I’m seeing a series of “InProgress” events followed by two “Completed” events.

My questions are:

  1. Are two “Completed” events expected behavior for SageMaker Batch Transform jobs?
  2. Is there a possibility that I may have inadvertently registered two TransformJobStatus “listeners” on the same job?
  3. Is there a way to consume the first event (mark it as handled), thereby suppressing the second “Completed” transform job event?
  4. Is there a better way to post process the S3 results of the transform job than listening to a CloudWatch event and invoking a lambda?

This is my code in the script called from the AWS Batch pre-processing job:

response = sagemaker_client.create_transform_job(
            TransformJobName=transform_name,
            ModelName=model_name,
            TransformInput={
                "CompressionType": "None",
                "ContentType": "image/jpeg",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_frame_directory_path
                    }
                },
                "SplitType": "None"
            },
            TransformOutput={
                "AssembleWith": "None",
                "KmsKeyId": "",
                "S3OutputPath": s3_predictions_directory_path
            },
            TransformResources={
                "InstanceCount": 1,
                "InstanceType": "ml.p2.xlarge"
            },
            Tags=[
                {
                    "Key": "tnc_transform",
                    "Value": "yes"
                },
                {
                    "Key": "extra_commands",
                    "Value": extra_commands
                },
                {
                    "Key": "training_job_name",
                    "Value": self.training_job_name
                },
                {
                    "Key": "s3_video_path",
                    "Value": self.s3_video_path
                },
                {
                    "Key": "s3_output_directory",
                    "Value": self.s3_output_directory
                },
                {
                    "Key": "label_base64_json",
                    "Value": self.label_base64_json
                }
            ]
        )

This is the CloudWatch event pattern for the rule that triggers the Lambda function:

{
  "detail-type": [
    "SageMaker Transform Job State Change"
  ],
  "source": [
    "aws.sagemaker"
  ],
  "detail": {
    "TransformJobStatus": [
      "Completed"
    ]
  }
}

I’m a senior developer but new to AWS and SageMaker. I’ve inherited the code from another team and from what I understand it was written correctly and it (mostly) works. I just want to prevent the second post processing job from occurring. I’ve been investigating this issue for a couple of days and I’m not sure how to proceed at this point. Any help would be appreciated.

Thanks!

asked 4 years ago · 805 views
1 Answer

I just want to follow up since this issue has been resolved and others could benefit. I filed a support case with AWS on our Developer account and had a productive dialogue with an AWS support engineer, which led to a resolution.

First of all, AWS support confirmed that two “Completed” events are delivered when a batch transform job finishes. Most of the fields in the two payloads are identical, but the engineer identified that one of the events has an empty (“”) value for the “ModelName” field, while the other carries the correct name of the model used for inference. With this information, he suggested changing the event pattern to filter out one of the events.

I used the "anything-but" rule matching statement as described in the EventBridge user guide to exclude the ModelName field with an empty value: https://docs.aws.amazon.com/eventbridge/latest/userguide/content-filtering-with-event-patterns.html

The following event pattern will match only the "Completed" SageMaker batch transform event that has a non-empty value for “ModelName”. In other words, it would exclude the "Completed" event that has an empty value for the ModelName field so only one event would trigger the lambda function:

{
  "detail-type": [
    "SageMaker Transform Job State Change"
  ],
  "source": [
    "aws.sagemaker"
  ],
  "detail": {
    "TransformJobStatus": [
      "Completed"
    ],
    "ModelName": [
      {
        "anything-but": ""
      }
    ]
  }
}

As a result of this change, only one post-processing job is invoked, which was the intention.
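If you’d rather not rely solely on the event pattern, the same filter can also be applied defensively inside the Lambda handler. This is a sketch, assuming the standard EventBridge event shape shown above; the post-processing call itself is elided:

```python
def should_process(detail):
    """Return True only for the Completed event that carries a
    non-empty ModelName, mirroring the EventBridge 'anything-but' filter."""
    return (
        detail.get("TransformJobStatus") == "Completed"
        and bool(detail.get("ModelName"))
    )


def handler(event, context):
    if not should_process(event.get("detail", {})):
        return  # silently drop the duplicate event with the empty ModelName
    # ... kick off the post-processing Batch job here ...
```

Filtering at the rule level is still preferable, since the unwanted event then never invokes (or bills) the Lambda at all, but the in-code guard protects you if the rule is ever edited.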

The support engineer acknowledged that there was no rational justification for sending two “Completed” events for a SageMaker batch transform job, and he is engaging the SageMaker team for further investigation.

answered 4 years ago
