How to automate SageMaker Batch Transform?


Does CloudFormation support SageMaker Batch Transform? If yes, can the jobs be triggered/run automatically once the stack is created?

1 Answer
Accepted Answer

While CloudFormation doesn't currently offer a resource type for SageMaker Batch Transform jobs (see the resource list here in the docs), there are plenty of other integration points to automate running these jobs.

CloudFormation

I'd actually argue that CloudFormation is probably not a great fit for this anyway, because CloudFormation defines resources that can be created, updated, and deleted. You could maybe map "Create" = "run a job", "Delete" = "delete job outputs", and "Update" = "re-run the job", but these are opinionated choices that might not make sense in every case.

If you really wanted, you could create a Custom CloudFormation resource backed by an AWS Lambda function using the CreateTransformJob API (via whatever language you prefer, e.g. boto3 in Python) - see the sketch after the notes below.

Note that:

  • If you wanted to use the SageMaker Python SDK (import sagemaker, Transformer, etc.) instead of the low-level boto3 interface in Python, you'd need to install this extra library in your Lambda function. Tools like AWS SAM and CDK can help with this.
  • The maximum Lambda timeout is 15 minutes, and you probably don't want to keep your Lambda function running (and billable) just waiting for the transform to complete anyway. Even the overall Custom Resource has a maximum timeout within which it must stabilize after a create/update/delete request, so additional orchestration may be required beyond a single synchronous Lambda function call.
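
For illustration, here's a minimal sketch (not production code) of what the Lambda handler behind such a custom resource could look like, using boto3's create_transform_job. The resource property names below are hypothetical placeholders you'd pass in from your template, cfnresponse is only auto-bundled for inline (ZipFile) function code, and the handler just fires the job and returns - it does not wait for the transform to finish:

```python
# Minimal sketch of a Lambda handler backing a CloudFormation Custom Resource
# that starts a Batch Transform job on stack "Create". Property names are
# placeholders; the job is started but NOT awaited here.
import boto3
import cfnresponse

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    try:
        props = event["ResourceProperties"]
        if event["RequestType"] == "Create":
            job_name = f"cfn-transform-{event['RequestId'][:8]}"
            sagemaker.create_transform_job(
                TransformJobName=job_name,
                ModelName=props["ModelName"],
                TransformInput={
                    "DataSource": {
                        "S3DataSource": {
                            "S3DataType": "S3Prefix",
                            "S3Uri": props["InputS3Uri"],
                        }
                    }
                },
                TransformOutput={"S3OutputPath": props["OutputS3Uri"]},
                TransformResources={
                    "InstanceType": props.get("InstanceType", "ml.m5.xlarge"),
                    "InstanceCount": 1,
                },
            )
            physical_id = job_name
        else:
            # What "Update" and "Delete" should mean is one of the opinionated
            # choices discussed above; here they are acknowledged as no-ops.
            physical_id = event.get("PhysicalResourceId", "no-op")
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, physical_id)
    except Exception as exc:
        cfnresponse.send(event, context, cfnresponse.FAILED, {"Error": str(exc)})
```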

Other (better?) options

As mentioned above, you can create, describe and stop SageMaker Batch Transform jobs from any environment where you can call AWS APIs / use the AWS SDKs, and you can even use the high-level open-source sagemaker SDK from anywhere you install it. Interesting options might include:

  • Amazon SageMaker Pipelines: SageMaker Pipelines have native "steps" for a range of SageMaker processes, including transform jobs but also training, pre-processing and more. You can define a multi-step pipeline from the SageMaker Python SDK (in your notebook or elsewhere) and then start it running on demand (with parameters) by calling the StartPipelineExecution API - see the sketch after this list.
  • AWS Step Functions: Step Functions provides general-purpose serverless orchestration, so while the orchestration for SageMaker jobs in particular might be a little more complex (one step to start the job, then polling to wait for completion), the visual workflow editor and range of integrations with other services may be useful.
  • Amazon S3 event notifications can trigger a Lambda function automatically (to start your transform job or pipeline) when new data is uploaded to Amazon S3 - see the sketch at the end of this answer.
  • Scheduled EventBridge rules can run actions on a regular schedule (such as calling Lambda functions or kicking off these pipelines), in case you need schedule-based execution rather than running in response to some event.
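
To make the Pipelines option concrete, here's a minimal sketch of a single-step pipeline wrapping a transform job, using the SageMaker Python SDK (v2). It assumes a SageMaker Model named "my-model" already exists; the pipeline name, S3 URIs and role ARN are placeholders for your own values:

```python
# Minimal sketch: one-step SageMaker Pipeline running a Batch Transform job.
from sagemaker.inputs import TransformInput
from sagemaker.transformer import Transformer
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TransformStep

role_arn = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"  # placeholder

# Pipeline parameter so each execution can point at a different input location
input_data = ParameterString(name="InputDataUrl", default_value="s3://my-bucket/input/")

transformer = Transformer(
    model_name="my-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/transform-output/",
)

transform_step = TransformStep(
    name="BatchTransform",
    transformer=transformer,
    inputs=TransformInput(data=input_data),
)

pipeline = Pipeline(
    name="my-transform-pipeline",
    parameters=[input_data],
    steps=[transform_step],
)
pipeline.upsert(role_arn=role_arn)  # create or update the pipeline definition

# Start an execution on demand with this run's parameter values...
pipeline.start(parameters={"InputDataUrl": "s3://my-bucket/new-batch/"})

# ...or, equivalently, from anywhere with an AWS SDK via StartPipelineExecution:
# boto3.client("sagemaker").start_pipeline_execution(
#     PipelineName="my-transform-pipeline",
#     PipelineParameters=[{"Name": "InputDataUrl", "Value": "s3://my-bucket/new-batch/"}],
# )
```

Once the definition is upserted, every start call runs a fresh transform job with whatever parameter values you pass.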

The choice will depend on what the initial trigger for your workflow is (a schedule? a data upload? some other AWS event? an API call from outside AWS?) and what other steps need to be orchestrated alongside your transform job in the overall flow.
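
For the data-upload trigger, for example, a Lambda function subscribed to S3 ObjectCreated notifications could simply start the pipeline sketched above. A minimal sketch, where the pipeline name and bucket/prefix layout are assumptions:

```python
# Minimal sketch: Lambda handler on an S3 "ObjectCreated" notification that
# starts the (placeholder-named) pipeline over the prefix that received data.
import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    prefix = key.rsplit("/", 1)[0] if "/" in key else ""
    s3_uri = f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"

    sagemaker.start_pipeline_execution(
        PipelineName="my-transform-pipeline",
        PipelineParameters=[{"Name": "InputDataUrl", "Value": s3_uri}],
    )
```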

Alex_T (AWS EXPERT), answered 2 years ago
  • @Alex_T - thanks. I agree CloudFormation might not be a good fit. I was trying to see whether there was any other way to create the jobs themselves - define a bunch of parameters and push/build a stack via CloudFormation or something similar - and then trigger the jobs in the ways you suggested. But it sounds like the job-creation part is also only possible via the SDK/APIs.
