Is it possible to create parallel pipelines in SageMaker?


I want to bind one processing pipeline to multiple training pipelines. I just want to compare algorithm accuracy: the same dataset will be used to train multiple algorithms, and each of them will make predictions. My goal for the future is to consolidate the prediction results of the different algorithms and generate a combined/consolidated result. Is this possible to do in SageMaker?

Example Schema:

            - Train_Algo1      
 Process    - Train_Algo2    - Predict Result
            - Train_AlgoN
Asked 2 years ago · 1754 views
1 Answer

I'd recommend checking out SageMaker Pipelines for this - especially if you're able to use SageMaker Studio for the graphical pipeline management UIs.

You can build your pipeline definition through the SageMaker Python SDK, just like you might normally define Training and Processing jobs. In fact, pipeline steps (like TrainingStep) typically just wrap the standalone constructs (like Estimator) that you might be using already.

Pipeline steps are executed in parallel by default, unless there is an implicit (properties data) or explicit (depends_on) dependency between them.
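Here's a minimal sketch of that fan-out (the role ARN, container image URIs, and preprocessing script are placeholders, not from your question): one ProcessingStep feeds several TrainingSteps through its output properties, so each training step waits for "Process" but they all run in parallel with each other:

```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

role = "arn:aws:iam::111122223333:role/MySageMakerRole"  # placeholder role ARN

# Shared pre-processing step (preprocessing.py is a placeholder script):
processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)
step_process = ProcessingStep(
    name="Process",
    processor=processor,
    outputs=[ProcessingOutput(output_name="train", source="/opt/ml/processing/train")],
    code="preprocessing.py",
)

# One TrainingStep per algorithm, each just wrapping a plain Estimator:
train_steps = []
for algo, image_uri in [("Algo1", "<image-uri-1>"), ("Algo2", "<image-uri-2>")]:
    estimator = Estimator(
        image_uri=image_uri,  # placeholder training container images
        role=role,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )
    train_steps.append(
        TrainingStep(
            name=f"Train{algo}",
            estimator=estimator,
            # Referencing the processing step's output properties is what creates
            # the implicit dependency on "Process"; with no dependencies between
            # the training steps themselves, they execute in parallel.
            inputs={
                "train": TrainingInput(
                    s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                        "train"
                    ].S3Output.S3Uri
                )
            },
        )
    )

pipeline = Pipeline(name="MultiAlgoPipeline", steps=[step_process] + train_steps)
pipeline.upsert(role_arn=role)  # create/update the pipeline; pipeline.start() to run
```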

SM Pipelines can take parameters, so you could expose necessary training hyperparameters or pre-processing parameters up to the pipeline level, and use the pipeline to kick off multiple end-to-end runs with different configurations.
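For instance, continuing from the sketch above (the parameter names and values here are illustrative, and step_process / train_steps / role are re-used from that snippet):

```python
from sagemaker.workflow.parameters import ParameterFloat, ParameterString
from sagemaker.workflow.pipeline import Pipeline

# Illustrative settings surfaced to the pipeline level:
learning_rate = ParameterFloat(name="LearningRate", default_value=0.1)
input_data = ParameterString(name="InputDataUrl", default_value="s3://my-bucket/raw/")

# Parameters can be used wherever a literal would go, e.g.:
#   estimator.set_hyperparameters(learning_rate=learning_rate)
#   ProcessingInput(source=input_data, destination="/opt/ml/processing/input")

pipeline = Pipeline(
    name="MultiAlgoPipeline",
    parameters=[learning_rate, input_data],
    steps=[step_process] + train_steps,
)
pipeline.upsert(role_arn=role)

# Kick off several end-to-end runs with different configurations:
for lr in (0.01, 0.1, 0.3):
    pipeline.start(parameters={"LearningRate": lr})
```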

By turning on step caching, you can prevent your pre-processing from re-running when the step's inputs are unchanged. Note, though, that caching doesn't consider ongoing executions: it's better to trigger one pipeline execution first and wait for the processing step to complete than to trigger ~20 at once, where none of them would see a cached processing result and all would re-run the job.
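Continuing the same sketch, caching is enabled per step via CacheConfig (the 7-day expiry below is arbitrary):

```python
from sagemaker.workflow.steps import CacheConfig

# expire_after is an ISO 8601 duration; "P7D" keeps cached results for 7 days:
cache_config = CacheConfig(enable_caching=True, expire_after="P7D")

# Pass it on the step whose re-runs you want to avoid, e.g. the processing step:
step_process = ProcessingStep(
    name="Process",
    processor=processor,
    outputs=[ProcessingOutput(output_name="train", source="/opt/ml/processing/train")],
    code="preprocessing.py",
    cache_config=cache_config,
)
```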

...And Pipelines automatically tags SageMaker Experiments config (Pipeline = Experiment; Execution = Trial; Step = Trial Component), which you can then use to plot and compare multiple training jobs in the SM Studio UI. So, for example, your pipeline might just be Pre-process > Train > Evaluate > RegisterModel. If you right-click your pipeline's "Experiment" in the SMStudio Experiments and Trials view, you can open a list of the executed training jobs and select multiple to scatter-plot the final loss/accuracy against the hyperparameters.

If you run your evaluation as a SageMaker Processing job which outputs a JSON file in the model quality metrics format, you can even have your pipeline register the model in the SM Model Registry tagged with this data. This way, you'd be able to see and compare metrics between model versions (and even charts, e.g. ROC curves in the classification case) through the SMStudio Model Registry UI.
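A rough sketch of that wiring, assuming (hypothetical names) a training step step_train and an evaluation ProcessingStep step_evaluate whose "evaluation" output contains evaluation.json; the model package group name is also a placeholder:

```python
from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.functions import Join
from sagemaker.workflow.step_collections import RegisterModel

# Point at the evaluation job's output file (model quality metrics JSON):
evaluation_s3_uri = Join(
    on="/",
    values=[
        step_evaluate.properties.ProcessingOutputConfig.Outputs[
            "evaluation"
        ].S3Output.S3Uri,
        "evaluation.json",
    ],
)

model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=evaluation_s3_uri,
        content_type="application/json",
    )
)

# Register the trained model version together with its metrics:
step_register = RegisterModel(
    name="RegisterModel",
    estimator=estimator,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="MultiAlgoModels",  # placeholder group name
    model_metrics=model_metrics,
)
```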


AWS
EXPERT
Alex_T
Answered 2 years ago
