Incorporate SageMaker Data Wrangler into SageMaker Pipelines


Hi all, is it possible to incorporate SageMaker Data Wrangler as a step in SageMaker Pipelines, so that every time the pipeline is triggered, it runs the Data Wrangler job first and then starts the SageMaker training step?

2 Answers

Yes, it is possible to incorporate SageMaker Data Wrangler as a step in SageMaker Pipelines.

  • SageMaker Pipelines lets you define and execute a sequence of ML workflow steps such as data preprocessing, model training, and evaluation.
  • SageMaker Data Wrangler is used to prepare and transform data. You define the transformations as a flow (a .flow file), which can be exported as a notebook or run as a processing job.
  • To add a Data Wrangler step to a pipeline, define a SageMaker Processing step that runs the Data Wrangler flow.
  • This processing job can be configured as the first step in the Pipeline. It will run the data wrangling tasks to preprocess the data.
  • The output of the Data Wrangler processing job can then be used as input to the subsequent training step in the Pipeline.
  • This allows automated execution of end-to-end ML workflows with data preparation via Data Wrangler followed by model training/evaluation via Pipelines.
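The steps above can be sketched as a pipeline definition in the JSON shape that SageMaker Pipelines accepts: a Processing step that runs the flow, followed by a Training step that reads the processing output. This is only a rough sketch under several assumptions: the function name, step names, instance types, and all arguments are hypothetical placeholders, and the real Data Wrangler container image URI, output name (tied to a node in your flow), and any container arguments should be taken from the notebook that Data Wrangler exports for your flow.

```python
import json

# Hypothetical step names, chosen for illustration only.
PROCESSING_STEP_NAME = "DataWranglerProcessing"
TRAINING_STEP_NAME = "TrainModel"


def build_pipeline_definition(flow_s3_uri, output_s3_uri, dw_image_uri,
                              training_image_uri, model_s3_uri, role_arn,
                              output_name):
    """Build a pipeline definition dict: Data Wrangler processing, then training."""
    processing_step = {
        "Name": PROCESSING_STEP_NAME,
        "Type": "Processing",
        "Arguments": {
            "RoleArn": role_arn,
            # Assumption: dw_image_uri is the Data Wrangler container for your region.
            "AppSpecification": {"ImageUri": dw_image_uri},
            "ProcessingInputs": [{
                "InputName": "flow",
                "S3Input": {
                    "S3Uri": flow_s3_uri,
                    "LocalPath": "/opt/ml/processing/flow",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            }],
            "ProcessingOutputConfig": {"Outputs": [{
                # Assumption: output_name matches an output node of your flow.
                "OutputName": output_name,
                "S3Output": {
                    "S3Uri": output_s3_uri,
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }]},
            "ProcessingResources": {"ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.4xlarge",
                "VolumeSizeInGB": 30,
            }},
        },
    }

    # Property reference: resolved at run time to the S3 URI of the processing output.
    processed_data = {
        "Get": f"Steps.{PROCESSING_STEP_NAME}.ProcessingOutputConfig"
               f".Outputs['{output_name}'].S3Output.S3Uri"
    }

    training_step = {
        "Name": TRAINING_STEP_NAME,
        "Type": "Training",
        # Run only after the Data Wrangler step has finished.
        "DependsOn": [PROCESSING_STEP_NAME],
        "Arguments": {
            "RoleArn": role_arn,
            "AlgorithmSpecification": {
                "TrainingImage": training_image_uri,
                "TrainingInputMode": "File",
            },
            "InputDataConfig": [{
                "ChannelName": "train",
                "DataSource": {"S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": processed_data,
                    "S3DataDistributionType": "FullyReplicated",
                }},
            }],
            "OutputDataConfig": {"S3OutputPath": model_s3_uri},
            "ResourceConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            },
            "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        },
    }

    return {"Version": "2020-12-01", "Steps": [processing_step, training_step]}
```

The resulting dict can be serialized with `json.dumps` and passed as `PipelineDefinition` to the SageMaker `create_pipeline` API. In practice, the SageMaker Python SDK (`ProcessingStep`, `TrainingStep`, `Pipeline`) generates an equivalent definition for you, and Data Wrangler's export option produces a notebook with the correct values filled in.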
answered 2 months ago
