Sagemaker pipeline with CLI - how to refer to Python file in a pipeline step


I'm a complete noob with Sagemaker, coming here with AzureML experience. I was very comfortable and liked building ML pipelines with the CLI in AzureML. I've found that Sagemaker has a similar pipeline creation feature with the CLI.
https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/create-pipeline.html

However, I haven't been able to find tutorials on the CLI functionality. Most content about SageMaker pipelines seems to cover the Python SDK. My questions are:

  1. Is the CLI approach fully-developed functionality? Are there drawbacks to using it compared to Python SDK or another approach? My goal is to have a streamlined, succinct workflow that I can launch ideally through the command line.
  2. In AzureML, I was able to specify a Python file for each pipeline step to execute. How would I specify such a file in the JSON pipeline definition file that the create-pipeline command refers to? Perplexity told me that I'd have to create a Docker image for each separate file, but I'm not sure that's true. If so, that seems pretty cumbersome.
  3. How would I create the environment for each step?
asked 3 months ago, 126 views
1 Answer

To answer your questions:

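The CLI approach to defining SageMaker pipelines is fully supported: create-pipeline takes the same kind of JSON pipeline definition that the Python SDK generates. The main drawback compared to the Python SDK is that you write and maintain that JSON by hand, so tasks the SDK automates, such as uploading scripts to S3 and wiring one step's outputs to the next step's inputs, become manual.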

To specify a Python file for a pipeline step, you set it in the step's "Arguments" in the JSON pipeline definition file. For a Processing step, the "AppSpecification" property names the container image and the command to run, and the file path is part of that command.

For example:

"AppSpecification": { "ImageUri": "myimage:latest", "ContainerEntrypoint": [ "python3", "python_file.py" ] }

To create the environment for each step, you choose the Docker image the step runs in, which contains the necessary dependencies and configuration. The "Environment" property in the same "Arguments" block sets environment variables for that container.
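As a rough sketch of what a full definition file could look like, the Python below builds a one-step pipeline definition and prints it as JSON. It assumes a Processing step whose arguments mirror the CreateProcessingJob request, with the script delivered from S3 as an input named "code" rather than baked into the image, which is the pattern the Python SDK's script processors appear to use. The function names, bucket, account ID, image URI, role ARN, and instance settings are all hypothetical placeholders.

```python
import json

# Directory inside the container where the "code" input is downloaded.
CODE_DIR = "/opt/ml/processing/input/code"


def processing_step(name, image_uri, script_s3_uri, role_arn,
                    instance_type="ml.m5.xlarge", env=None):
    """Build one Processing step that runs a Python script fetched from S3."""
    script_name = script_s3_uri.rsplit("/", 1)[-1]
    arguments = {
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceType": instance_type,
                "InstanceCount": 1,
                "VolumeSizeInGB": 30,
            }
        },
        "AppSpecification": {
            # The image supplies the interpreter and libraries.
            "ImageUri": image_uri,
            # The command that runs inside the container.
            "ContainerEntrypoint": ["python3", f"{CODE_DIR}/{script_name}"],
        },
        "RoleArn": role_arn,
        "ProcessingInputs": [
            {
                "InputName": "code",
                "S3Input": {
                    "S3Uri": script_s3_uri,
                    "LocalPath": CODE_DIR,
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            }
        ],
    }
    if env:
        # Plain environment variables passed to the container.
        arguments["Environment"] = dict(env)
    return {"Name": name, "Type": "Processing", "Arguments": arguments}


def pipeline_definition(steps):
    """Wrap the steps in the top-level pipeline definition document."""
    return {"Version": "2020-12-01", "Metadata": {}, "Parameters": [], "Steps": steps}


# Hypothetical values: replace with your own account, bucket, image, and role.
definition = pipeline_definition([
    processing_step(
        name="Preprocess",
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
        script_s3_uri="s3://my-bucket/code/preprocess.py",
        role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
        env={"STAGE": "dev"},
    )
])

print(json.dumps(definition, indent=2))
```

If the output is saved as pipeline.json, it should be possible to pass it to aws sagemaker create-pipeline with --pipeline-name, --pipeline-definition file://pipeline.json, and --role-arn. In this sketch the script has to be uploaded to the S3 location beforehand (for example with aws s3 cp), which is one of the manual steps the SDK would otherwise handle. Because the script arrives as an input, several steps could share one image that contains the common dependencies, so a separate Docker image per Python file should not be needed.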

EXPERT
answered 3 months ago
  • Thanks for the useful information. Regarding the container, does this mean that I have to manually create a Docker image and include whatever libraries my pipeline step requires in addition to the Python file? I was hoping that there's a way to abstract / simplify that step.

    Can you please give an example or two of how using the CLI would require more manual steps compared to the Python SDK? One thing I find with using Python for MLOps commands is that it's verbose and also doesn't allow me to easily differentiate between the source Python code and the cloud (MLOps) code.
