How to update a pipeline step and ensure it runs even if cached


I have a SageMaker pipeline that contains two custom steps. Each uses a custom Docker container and runs a Python script. The build script contained a typo, so instead of deploying two different Docker images it deployed the same image twice, and both steps have an ImageUri of {custom_step_image}:latest.

I fixed the issue, but because the step inputs haven't changed (the ImageUri of the Docker image is the same, even though the actual Docker image is different), the step won't run because it's cached (at least until the cache expires).

SageMaker Studio doesn't appear to have any way of forcing a pipeline to run. Is there a way - perhaps by executing via a notebook? Also, is there a way of ensuring the cache is reset during a deployment? I used this code to create/update the pipeline:

pipeline = Pipeline(
    name=args.pipeline_name,
    parameters=[s3_bucket, model_uri, dataset, model_package_group_name],
    steps=[create_pyenv_step, train_model_step, package_model_step, register_model_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)
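
One option (a sketch only, reusing the step and pipeline objects above and assuming an environment with the SageMaker Python SDK and suitable AWS credentials) is to disable caching on the affected steps before upserting, then start an execution from a notebook, which forces an immediate run:

```python
from sagemaker.workflow.steps import CacheConfig

# Turn caching off for the steps whose image content changed,
# so they always rerun regardless of any earlier cached result
no_cache = CacheConfig(enable_caching=False)
train_model_step.cache_config = no_cache
package_model_step.cache_config = no_cache

pipeline.upsert(role_arn=role)

# Starting an execution from a notebook kicks off a run right away;
# wait() blocks until the execution finishes
execution = pipeline.start()
execution.wait()
```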

Also in general whilst SageMaker studio is nice, I find it frustrating that:

  • If your AWS session expires you cannot just log back in (it keeps saying the token has expired). You have to navigate from the SageMaker landing page back to SageMaker Studio. This is inconsistent with pretty much all of the other pages in AWS.
  • You cannot copy links and send them to colleagues, which makes debugging much harder when you cannot share links.
  • Some features can only be accessed from SageMaker Studio (i.e. the pipelines and model registry UIs), which is inconsistent.
Dave
Asked 6 months ago · Viewed 345 times
1 Answer
Accepted Answer

Hello,

From the correspondence, I understand that you want to know if there is a way to update a pipeline step and ensure it runs even if cached, and also whether there is a way of ensuring the cache is reset during a deployment.

I would like to inform you that a pipeline step does not rerun if you change any attributes that are not listed in Default cache key attributes by pipeline step type for its step type. However, you may decide that you want the pipeline step to rerun anyway. In this case, you need to turn off step caching.

When you use step signature caching, SageMaker Pipelines tries to find a previous run of your current pipeline step with the same values for certain attributes. If found, SageMaker Pipelines propagates the outputs from the previous run rather than recomputing the step. The attributes checked are specific to the step type, and are listed in Default cache key attributes by pipeline step type.
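
As a toy illustration of the behavior described above (this is not SageMaker's actual implementation, just a hash over the declared attributes): because the image *contents* are not among the checked attributes, rebuilding the image behind an unchanged `:latest` URI produces the same cache key.

```python
import hashlib
import json

def cache_key(image_uri, arguments, environment):
    """Toy cache key: a hash over the step's declared attributes only.
    The image contents play no part - just the URI string."""
    payload = json.dumps(
        {"image": image_uri, "args": arguments, "env": environment},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

before = cache_key("123.dkr.ecr.eu-west-1.amazonaws.com/step:latest", ["train.py"], {})
# The image was rebuilt with different contents, but the URI is unchanged...
after = cache_key("123.dkr.ecr.eu-west-1.amazonaws.com/step:latest", ["train.py"], {})
assert before == after  # ...so the step is treated as a cache hit

# Changing any declared attribute (e.g. the image tag) changes the key
assert before != cache_key("123.dkr.ecr.eu-west-1.amazonaws.com/step:v2", ["train.py"], {})
```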

When deciding whether to reuse a previous pipeline step or rerun the step, SageMaker Pipelines checks to see if certain attributes have changed. If the set of attributes is different from all previous runs within the timeout period, the step runs again. These attributes include input artifacts, app or algorithm specification, and environment variables.
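
A related workaround (my suggestion, not from the AWS documentation): since the app specification is part of the cache key, having the build script derive the image tag from the build inputs means the ImageUri itself changes whenever the image contents change, so the step reruns automatically. A minimal sketch, with throwaway files standing in for a real Dockerfile and script:

```python
import hashlib
from pathlib import Path

def content_tag(*files):
    """Derive a short, deterministic tag from the build inputs
    (e.g. the Dockerfile and the step's Python script)."""
    h = hashlib.sha256()
    for f in files:
        h.update(Path(f).read_bytes())
    return h.hexdigest()[:12]

# Throwaway demo files standing in for real build inputs
Path("Dockerfile.demo").write_text("FROM python:3.11\nCOPY train.py .\n")
Path("train.py.demo").write_text("print('training')\n")

tag = content_tag("Dockerfile.demo", "train.py.demo")
image_uri = f"123.dkr.ecr.eu-west-1.amazonaws.com/custom-step:{tag}"

# Editing any build input yields a different tag, hence a different ImageUri
Path("train.py.demo").write_text("print('training v2')\n")
assert content_tag("Dockerfile.demo", "train.py.demo") != tag
```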

I hope you find the above information helpful.

Thank you!

====Reference====
[+] https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html#pipelines-default-keys
[+] https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html#pipelines-caching-disabling

AWS
Answered 6 months ago
Expert
Reviewed 1 month ago
