Optimize resources for SageMaker Pipelines


Hi. I have a problem with SageMaker Pipelines when setting up a pipeline that processes raw data. As I understand it, SageMaker initializes a new instance for each step, and that initialization takes a long time (with the processor I have configured). I would like all steps on the dataset to use a single instance (the steps may use different images, but they would run on the same instance). How can I do that?

I am following this documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html#define-pipeline-prereq

asked 10 months ago, 337 views
1 Answer

Hi,

Have you considered SageMaker Managed Warm Pools for training, which keep your provisioned infrastructure warm so that it can be reused?

See https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html

Another option to explore would be Selective Execution for SageMaker Pipelines: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-selective-ex.html

Finally, you can also cache Pipelines steps: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html
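As a minimal sketch of step caching, assuming the SageMaker Python SDK is installed: the helper name `build_cache_config` and the one-hour expiry are illustrative choices, not something taken from the question.

```python
def build_cache_config(expire_after="PT1H"):
    """Return a cache configuration that can be attached to a pipeline step."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from sagemaker.workflow.steps import CacheConfig

    # expire_after is an ISO 8601 duration; "PT1H" (one hour) is an arbitrary example.
    return CacheConfig(enable_caching=True, expire_after=expire_after)
```

The returned object would be passed to a step through its `cache_config` argument, for example `ProcessingStep(..., cache_config=build_cache_config())`. A cached step is skipped only when a previous successful execution with the same arguments exists, so this helps on re-runs, not on the first run.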

Although they are not a direct answer to your question about instance reuse, these capabilities will help reduce overall pipeline execution time when they apply to your use case.

Hope it helps! Didier

AWS EXPERT, answered 10 months ago
  • Thank you for answering. But my confusion is about processing data in separate steps:

    1. Step 1: transform the "price house" column.
    2. Step 2: transform the "House area" column of datasource A and datasource B.
    3. Step 3: combine all the data from step 1 and step 2 and store it in S3.

    I imagined that steps 1, 2, and 3 would run as separate steps but on the same instance, since they use the same instance type: when the pipeline starts, it would initialize one instance and execute the 3 jobs on it. But when I monitor the completed pipeline, every step appears to start a new instance, which adds a lot of latency.
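Each pipeline step runs as its own job, and each job provisions its own instance, which matches what the comment observes. One way to pay the start-up cost only once is to run the three transformations inside a single processing script used by a single step. The sketch below is only an illustration under assumptions: the function names, the float conversions, and the idea that both data sources carry both columns are hypothetical, since the thread does not say what the transformations are.

```python
def transform_price(rows):
    """Step 1 (hypothetical transform): parse the "price house" column as a number."""
    return [{**row, "price house": float(row["price house"])} for row in rows]


def transform_area(rows):
    """Step 2 (hypothetical transform): parse the "House area" column as a number."""
    return [{**row, "House area": float(row["House area"])} for row in rows]


def run_all_steps(source_a, source_b):
    """Step 3: apply both transforms to each source and combine the results."""
    combined = []
    for rows in (source_a, source_b):
        combined.extend(transform_area(transform_price(rows)))
    return combined


if __name__ == "__main__":
    # Tiny in-memory stand-ins for datasource A and datasource B.
    a = [{"price house": "100000", "House area": "80"}]
    b = [{"price house": "250000", "House area": "120.5"}]
    print(run_all_steps(a, b))
```

In a real processing job the script would read its inputs from the local directories configured for the job and write the combined result to the job's output directory, which SageMaker uploads to S3. The trade-off is that the three stages can no longer be cached or retried individually.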
