Optimize resources for SageMaker Pipelines


Hi. I have a problem with SageMaker Pipelines when setting up a pipeline that processes raw data. As far as I understand, SageMaker initializes a new instance for each step, and that initialization takes a long time (with the processor I have configured). I would like all steps on the dataset to use only one instance (the steps may use different images, but on the same instance). Is this possible?

I am following this document: https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html#define-pipeline-prereq

Asked 10 months ago, viewed 353 times
1 Answer

Hi,

Have you considered SageMaker Managed Warm Pools for training, which keep your provisioned infrastructure warm so that you can reuse it?

See https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html

Another option to explore would be Selective Execution for SageMaker Pipelines: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-selective-ex.html

Finally, you can also cache pipeline steps: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html

Although they are not a direct answer to your question about instance reuse, these capabilities will help reduce overall pipeline execution time when they apply to your use case.
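
As an illustration of the step caching mentioned above, here is a minimal sketch that assumes the SageMaker Python SDK v2. `build_cached_step`, `CACHE_EXPIRY`, the step name, and the script path are hypothetical names chosen for the example; `processor` is assumed to be a processor you have already configured.

```python
CACHE_EXPIRY = "PT1H"  # ISO 8601 duration: reuse cached results for one hour


def build_cached_step(processor, script_path: str, step_name: str = "PreprocessData"):
    """Return a ProcessingStep with step caching enabled.

    `processor` is assumed to be an already configured SageMaker processor
    (for example a ScriptProcessor). `script_path` and `step_name` are
    hypothetical placeholders.
    """
    # Imported inside the function so this module still loads on machines
    # where the SageMaker SDK is not installed.
    from sagemaker.workflow.steps import CacheConfig, ProcessingStep

    cache_config = CacheConfig(enable_caching=True, expire_after=CACHE_EXPIRY)
    return ProcessingStep(
        name=step_name,
        processor=processor,
        code=script_path,
        cache_config=cache_config,
    )
```

With caching enabled, a step whose arguments are unchanged since a previous successful run within the expiry window should reuse that run's outputs instead of launching a new job, so no instance is started for it.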

Hope it helps! Didier

AWS Expert
Answered 10 months ago
  • Thank you for answering. My confusion is about processing data in separate steps:

    1. Step 1: transform the "price house" column.
    2. Step 2: transform the "House area" column of data source A and data source B.
    3. Step 3: combine all the data from steps 1 and 2 and store it in S3.

    I expected steps 1, 2, and 3 to run as separate steps but, because they use the same instance type, I expected the pipeline to initialize only one instance at start and execute all three jobs on it. However, when I monitor the complete pipeline run, every step appears to start a new instance, which adds a lot of latency.
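
    Each processing step in a pipeline launches its own processing job, and each job provisions its own instance. One possible workaround, sketched below, is to put the three transformations in a single script and run that script as one processing step, so that only one instance is started. The function names, column keys, and transformation logic are hypothetical placeholders, not the actual preprocessing code.

```python
from typing import Dict, List

Row = Dict[str, float]


def transform_price(rows: List[Row]) -> List[Row]:
    # Step 1 (placeholder logic): express "price_house" in thousands.
    return [{**row, "price_house": row["price_house"] / 1000.0} for row in rows]


def transform_area(rows: List[Row]) -> List[Row]:
    # Step 2 (placeholder logic): round "house_area" to one decimal place.
    return [{**row, "house_area": round(row["house_area"], 1)} for row in rows]


def run_all(source_a: List[Row], source_b: List[Row]) -> List[Row]:
    # Step 3: apply both transformations to each source, then combine them.
    # In a real job the combined rows would be written to the output
    # directory that the processing job uploads to S3.
    combined: List[Row] = []
    for rows in (source_a, source_b):
        combined.extend(transform_area(transform_price(rows)))
    return combined
```

    The trade-off is that the three stages can no longer be cached, retried, or monitored individually, because the pipeline sees them as a single step.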
