How to update a pipeline step and ensure it runs even if cached


I have a SageMaker pipeline that contains two custom steps. Each uses a custom Docker container and runs a Python script. The build script contained a typo, so instead of deploying two different Docker images it deployed the same image twice, and both steps have an ImageUri of {custom_step_image}:latest.

I fixed the issue, but because the step inputs haven't changed (the ImageUri is the same, even though the actual Docker image is different), the step won't run because it's cached (at least until the cache expires).

SageMaker Studio doesn't appear to have any way of forcing a pipeline to run. Is there a way, perhaps by executing it via a notebook? Also, is there a way of ensuring the cache is reset during a deployment? I used this code to create/update the pipeline:

pipeline = Pipeline(
    name=args.pipeline_name,
    parameters=[s3_bucket, model_uri, dataset, model_package_group_name],
    steps=[create_pyenv_step, train_model_step, package_model_step, register_model_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)

Also, in general, whilst SageMaker Studio is nice, I find it frustrating that:

  • If your AWS session expires you cannot just log back in (it keeps saying the token has expired). You have to navigate from the SageMaker landing page back to SageMaker Studio. This is inconsistent with pretty much all of the other pages in AWS.
  • You cannot copy links and send them to colleagues, which makes it much harder to debug when you cannot share links.
  • Some features can only be accessed from SageMaker Studio (i.e. the pipelines and model registry UIs), which is inconsistent.
Dave
asked 6 months ago · 346 views
1 answer
Accepted Answer

Hello,

From the correspondence, I understand that you want to know if there is a way to update a pipeline step and ensure it runs even if cached, and also whether there is a way of ensuring the cache is reset during a deployment.

I would like to inform you that a pipeline step does not rerun if you change any attributes that are not listed in Default cache key attributes by pipeline step type for its step type. However, you may decide that you want the pipeline step to rerun anyway. In this case, you need to turn off step caching.
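As a minimal sketch (not from the original post, and assuming the custom steps are built with the SageMaker Python SDK — the `step_name` and `script_processor` below are hypothetical placeholders), caching can be turned off per step via `CacheConfig` before upserting the pipeline:

```python
from sagemaker.workflow.steps import CacheConfig, ProcessingStep

# Disable caching for this step so it always reruns,
# even if its cache key attributes (inputs, ImageUri, etc.) are unchanged.
no_cache = CacheConfig(enable_caching=False)

train_model_step = ProcessingStep(
    name="TrainModel",            # hypothetical step name
    processor=script_processor,   # assumed to be defined elsewhere
    cache_config=no_cache,
)
```

After upserting the pipeline with the updated step definitions, subsequent executions will recompute that step rather than reuse a cached result.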

When you use step signature caching, SageMaker Pipelines tries to find a previous run of your current pipeline step with the same values for certain attributes. If found, SageMaker Pipelines propagates the outputs from the previous run rather than recomputing the step. The attributes checked are specific to the step type, and are listed in Default cache key attributes by pipeline step type.

When deciding whether to reuse a previous pipeline step or rerun it, SageMaker Pipelines checks whether certain attributes have changed. If the set of attributes differs from all previous runs within the timeout period, the step runs again. These attributes include input artifacts, the app or algorithm specification, and environment variables.
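Because the ImageUri is part of the cache key, one common workaround (a sketch under assumptions, not part of the original answer — `account`, `region`, and `git_sha` are placeholders) is to tag each image build uniquely instead of reusing `:latest`, so the cache key changes whenever the image actually changes. A fresh execution can then be started programmatically from a notebook:

```python
# A unique tag per build (e.g. the git commit SHA) changes the cache key,
# so the step reruns whenever the image content changes.
image_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/custom_step_image:{git_sha}"

# Start an execution from a notebook, using the pipeline object
# created in the question's upsert snippet.
execution = pipeline.start()
execution.wait()  # optionally block until the run finishes
```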

I hope you find the above information helpful.

Thank you!

====Reference====
[+] https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html#pipelines-default-keys
[+] https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html#pipelines-caching-disabling

AWS
answered 6 months ago
EXPERT
verified 1 month ago
