How to update a pipeline step and ensure it runs even if cached


I have a SageMaker pipeline that contains two custom steps. Each uses a custom Docker container and runs a Python script. The build script contained a typo, so instead of deploying two different Docker images it deployed the same image twice, and both steps now have the ImageUri {custom_step_image}:latest

I fixed the issue, but because the step inputs haven't changed (the ImageUri is the same, even though the underlying Docker image is different), the steps won't rerun because they're cached (at least until the cache expires).

SageMaker Studio doesn't appear to have any way of forcing a pipeline to rerun. Is there a way, perhaps by executing it from a notebook? Also, is there a way of ensuring the cache is reset during a deployment? I used this code to create/update the pipeline:

pipeline = Pipeline(
    name=args.pipeline_name,
    parameters=[s3_bucket, model_uri, dataset, model_package_group_name],
    steps=[create_pyenv_step, train_model_step, package_model_step, register_model_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)

Also in general whilst SageMaker studio is nice, I find it frustrating that:

  • If your AWS session expires you cannot just log back in (it keeps saying the token has expired). You have to navigate from the SageMaker landing page back to SageMaker Studio. This is inconsistent with pretty much all of the other pages in AWS.
  • You cannot copy links and send them to colleagues, which makes it much harder to debug when you cannot share links.
  • Some features can only be accessed from SageMaker Studio (e.g. the pipelines and model registry UIs), which is inconsistent.
Dave
asked 5 months ago · 328 views
1 Answer
Accepted Answer

Hello,

From the correspondence, I understand that you want to know whether there is a way to update a pipeline step and ensure it runs even if cached, and whether there is a way of ensuring the cache is reset during a deployment.

I would like to inform you that a pipeline step does not rerun if you change any attributes that are not listed in Default cache key attributes by pipeline step type for its step type. However, you may decide that you want the pipeline step to rerun anyway. In this case, you need to turn off step caching.

When you use step signature caching, SageMaker Pipelines tries to find a previous run of your current pipeline step with the same values for certain attributes. If found, SageMaker Pipelines propagates the outputs from the previous run rather than recomputing the step. The attributes checked are specific to the step type, and are listed in Default cache key attributes by pipeline step type.

When deciding whether to reuse a previous pipeline step or rerun the step, SageMaker Pipelines checks to see if certain attributes have changed. If the set of attributes is different from all previous runs within the timeout period, the step runs again. These attributes include input artifacts, app or algorithm specification, and environment variables.
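Since the ImageUri is one of those cache key attributes, another way to guarantee a rerun after a rebuild (my suggestion, not part of the original answer) is to avoid reusing a mutable `:latest` tag and instead tag each build uniquely, e.g. with a git SHA or build identifier. The repository URI below is hypothetical:

```python
import hashlib
import time
from typing import Optional


def unique_image_uri(repo_uri: str, build_id: Optional[str] = None) -> str:
    """Compose an ImageUri with a per-build tag so the step's cache key
    changes on every deployment. repo_uri is a placeholder ECR repo URI."""
    tag = build_id or hashlib.sha1(str(time.time()).encode()).hexdigest()[:12]
    return f"{repo_uri}:{tag}"


# Example with a hypothetical repository URI and a git SHA as the build id:
uri = unique_image_uri(
    "123456789012.dkr.ecr.eu-west-1.amazonaws.com/custom-step",
    build_id="a1b2c3d",
)
# uri ends with ":a1b2c3d", so it differs from any previously cached run
```

Because the tag changes with every build, the cached step is never matched and the new image is always pulled.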

I hope you find the above information helpful.

Thank you!

====Reference====
[+] https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html#pipelines-default-keys
[+] https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html#pipelines-caching-disabling

AWS
answered 5 months ago
EXPERT
reviewed a month ago
