Optimize resources for SageMaker Pipelines


Hi. I have a problem using SageMaker Pipelines to set up a pipeline that processes raw data. As I understand it, SageMaker initializes a new instance for each step, and that initialization takes a long time (with the processor I have configured). I would like all steps on the dataset to use a single instance (the steps may use different images, but on the same instance). Is that possible?

I am following this document: https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html#define-pipeline-prereq

Asked 10 months ago · 353 views
1 Answer

Hi,

Have you considered SageMaker Managed Warm Pools for training? They keep your provisioned infrastructure warm so that it can be reused.

See https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html

Another option to explore would be Selective Execution for SageMaker Pipelines: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-selective-ex.html

Finally, you can also cache pipeline steps: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html

Although they are not a direct answer to your question about instance reuse, these capabilities can help reduce overall pipeline execution time when they apply to your use case.

Hope it helps! Didier

AWS
Expert
Answered 10 months ago
  • Thank you for answering. My confusion is this: I process the data in separate steps:

    1. Step 1: transform the "price house" column,
    2. Step 2: transform the "House area" column of datasource A and datasource B,
    3. Step 3: combine all the data from step 1 and step 2 and store it in S3.

    I expected steps 1, 2, and 3 to run as separate steps but, because they use the same instance type, on the same instance: when the pipeline starts, a single instance is initialized and it executes all 3 jobs. But when I monitor the complete pipeline, it looks like every step starts a new instance, which adds a lot of latency.
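One way to avoid the per-step startup cost, offered here only as a sketch and not as something confirmed in this thread, is to fold the three transformations described above into a single processing script, so that one job (and therefore one instance) runs them back to back. The function names and the float-parsing transforms below are hypothetical placeholders, and applying both column transforms to both datasources is an assumption; only the column names come from the comment above.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform_price(rows):
    """Step 1 (placeholder transform): parse "price house" as a float."""
    return [{**row, "price house": float(row["price house"])} for row in rows]


def transform_area(rows):
    """Step 2 (placeholder transform): parse "House area" as a float."""
    return [{**row, "House area": float(row["House area"])} for row in rows]


def combine(*sources):
    """Step 3: concatenate the transformed sources into one list of rows."""
    return [row for source in sources for row in source]


def run_all(source_a, source_b):
    """Run steps 1-3 in one process, so only one instance is needed."""
    # Assumption: both datasources get both column transforms.
    cleaned = [transform_area(transform_price(src)) for src in (source_a, source_b)]
    return combine(*cleaned)
```

A script like this would be the single entry point of one processing step, with the result written to the job's output location so it is uploaded to S3. The trade-off is that the three stages can no longer be cached, retried, or selectively executed on their own.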
