Optimize resources for a SageMaker Pipeline

Hi. I have a problem with SageMaker Pipelines when setting up a pipeline that processes raw data. As I understand it, SageMaker starts a new instance for each step, and that startup takes a long time (with the processor I have configured). Is there a way to make all the steps for a dataset run on a single instance (possibly with different images, but on the same instance)?

I am following this document: https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html#define-pipeline-prereq

1 Answer

Hi,

Have you considered SageMaker Managed Warm Pools for training? They keep your provisioned infrastructure warm so that it can be reused.

See https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html
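As a rough sketch of what a warm pool request involves: the `ResourceConfig` of a `CreateTrainingJob` request accepts a `KeepAlivePeriodInSeconds` field (the SageMaker Python SDK exposes the same setting as `keep_alive_period_in_seconds` on an estimator). The helper name, instance type, and values below are illustrative assumptions, not taken from the question. Note that warm pools are a training feature, so as far as I know they do not cover processing jobs.

```python
def warm_pool_resource_config(instance_type, keep_alive_seconds,
                              instance_count=1, volume_gb=30):
    """Build the ResourceConfig part of a CreateTrainingJob request.

    Hypothetical helper for illustration; it only assembles a dict
    and does not call AWS.
    """
    # The API documents a valid range of 0 to 3600 seconds.
    if not 0 <= keep_alive_seconds <= 3600:
        raise ValueError("KeepAlivePeriodInSeconds must be between 0 and 3600")
    return {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "VolumeSizeInGB": volume_gb,
        # Keeps the instances provisioned after the job finishes so that a
        # matching follow-up training job can reuse them.
        "KeepAlivePeriodInSeconds": keep_alive_seconds,
    }


# Assumed values: keep an ml.m5.xlarge warm for 10 minutes.
resource_config = warm_pool_resource_config("ml.m5.xlarge", 600)
print(resource_config)
```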

Another option to explore would be Selective Execution for SageMaker Pipelines: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-selective-ex.html
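A minimal sketch of the request shape, assuming you start executions through the `StartPipelineExecution` API (for example with boto3): `SelectiveExecutionConfig` names a previous execution and the steps to rerun. The pipeline name, ARN, and step name are placeholders, and the helper is hypothetical.

```python
def selective_execution_request(pipeline_name, source_execution_arn, step_names):
    """Assemble keyword arguments for StartPipelineExecution.

    Hypothetical helper for illustration; it only builds a dict.
    """
    return {
        "PipelineName": pipeline_name,
        "SelectiveExecutionConfig": {
            # Execution whose outputs are reused for the steps not selected.
            "SourcePipelineExecutionArn": source_execution_arn,
            # Only these steps are run again.
            "SelectedSteps": [{"StepName": name} for name in step_names],
        },
    }


# Placeholder values, not real resources.
request = selective_execution_request(
    "house-data-pipeline",
    "arn:aws:sagemaker:us-east-1:111122223333:pipeline/house-data-pipeline/execution/example",
    ["CombineAndStore"],
)
print(request)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("sagemaker").start_pipeline_execution(**request)
```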

Finally, you can also cache Pipelines steps: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html
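To illustrate, caching is switched on per step through a `CacheConfig` entry in the pipeline definition (in the SageMaker Python SDK, a `CacheConfig(enable_caching=True, expire_after=...)` object passed to the step as `cache_config`). The sketch below only shows the resulting JSON shape; the step name and the helper are assumptions for illustration.

```python
import copy
import json


def with_step_caching(step_definition, expire_after="PT1H"):
    """Return a copy of a pipeline step definition with caching enabled.

    Hypothetical helper; `expire_after` is an ISO 8601 duration.
    """
    step = copy.deepcopy(step_definition)
    step["CacheConfig"] = {"Enabled": True, "ExpireAfter": expire_after}
    return step


# Placeholder step; a real definition carries the full job arguments.
step = {"Name": "TransformPrice", "Type": "Processing", "Arguments": {}}
cached_step = with_step_caching(step, expire_after="P30D")
print(json.dumps(cached_step, indent=2))
```

On a cache hit (same step arguments as an earlier successful run, within the expiry window) the step is skipped and its previous outputs are reused, so no instance is started for it.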

Although they are not a direct answer to your question about instance reuse, these capabilities can help reduce overall pipeline execution time when they apply to your use case.

Hope it helps! Didier

AWS EXPERT
answered 10 months ago
  • Thank you for answering. My confusion is about processing data in separate steps:

    1. Step 1: transform the "price house" column.
    2. Step 2: transform the "House area" column of data source A and data source B.
    3. Step 3: combine all the data from steps 1 and 2 and store it in S3.

    I expected steps 1, 2 and 3 to run as separate steps on the same instance, since they all use the same instance type: when the pipeline starts, a single instance would be initialized and would execute all 3 jobs. But when I monitor a complete pipeline run, it looks like every step starts a new instance, which adds a lot of latency.
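Each pipeline step launches its own job on freshly provisioned instances, so one way to pay the startup cost only once is to run the three transformations in a single script inside one processing step. The sketch below is a minimal illustration under stated assumptions: both data sources are CSV files containing both columns, the "transformations" are placeholder numeric casts, and the file paths and function names are hypothetical.

```python
import csv
from pathlib import Path


def transform_price(rows):
    # Placeholder for step 1: parse "price house" as a number.
    return [{**row, "price house": float(row["price house"])} for row in rows]


def transform_area(rows):
    # Placeholder for step 2: parse "House area" as a number.
    return [{**row, "House area": float(row["House area"])} for row in rows]


def read_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def run(path_a, path_b, out_path):
    """Apply steps 1 and 2 to both sources, then combine and write (step 3)."""
    combined = []
    for path in (path_a, path_b):
        rows = transform_area(transform_price(read_csv(path)))
        combined.extend(rows)

    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(combined[0]) if combined else []
    with open(out, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(combined)
    return combined


if __name__ == "__main__":
    # Assumed layout: in a processing job, inputs and outputs are typically
    # mounted under /opt/ml/processing, and the output folder is uploaded
    # to S3 when the job ends.
    run(
        "/opt/ml/processing/input/a/data.csv",
        "/opt/ml/processing/input/b/data.csv",
        "/opt/ml/processing/output/combined.csv",
    )
```

The trade-off is that the three stages are no longer visible, cacheable, or retryable as separate pipeline steps.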
