How do I copy data and run training in SageMaker?


I need to pull data from different sources, run processing, and then run training jobs. I want to set it up so that I spin up one or more EC2 instances, collect the data, copy it to an S3 bucket, and then use the SageMaker API to create processing and training jobs. I want to do all of this via code: provisioning the EC2 instances, downloading the data, and launching the processing and training jobs. One question I had: once I download the data, can I copy it directly to whatever data storage/volume comes attached to the SageMaker training/processing instance? I'm not sure what it's called. I also know there are options to stream data. Any thoughts? This would be my first training job, so apologies for the basic question.
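As a rough sketch of the "copy to S3, then launch a training job via the API" flow, the following builds a `CreateTrainingJob` request for the low-level SageMaker API. The bucket name, role ARN, and container image URI are placeholders; substitute your own account's values before submitting.

```python
def build_training_job_request(job_name, bucket, prefix, role_arn, image_uri):
    """Assemble a CreateTrainingJob request for the SageMaker API.

    The bucket, role_arn, and image_uri arguments are placeholders --
    replace them with values from your own account.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",  # "Pipe" streams from S3 instead
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"s3://{bucket}/{prefix}/train/",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/{prefix}/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


request = build_training_job_request(
    "my-first-training-job",
    "my-data-bucket",               # placeholder bucket
    "experiments/run-1",
    "arn:aws:iam::123456789012:role/MySageMakerRole",      # placeholder role
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training:latest",
)
# To actually submit the job (requires AWS credentials and the above
# resources to exist):
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

On the EC2 side, copying the collected data up is just `aws s3 sync ./data s3://my-data-bucket/experiments/run-1/train/` (or the equivalent `boto3` `upload_file` calls) before the job is created.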

Asked 1 year ago · 629 views
1 answer

Hello,

Thank you for using AWS SageMaker.

Regarding your question about whether there is any option to copy/download the data directly to the SageMaker training job resource, bypassing S3: unfortunately, due to the architecture design, it is not possible to load/copy data directly onto the underlying instance of a SageMaker training job; the training data must be stored in an S3 bucket. To speed up training, we recommend using Pipe mode, provided your algorithm supports it [1][2].
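For a sense of what Pipe mode looks like from inside the training container: instead of downloading the channel to local disk, SageMaker exposes each channel as a named pipe that the training script reads sequentially, once per epoch. A minimal sketch, assuming a channel named `train` and the conventional `/opt/ml/input/data` base path:

```python
import os


def pipe_path(channel, epoch, base="/opt/ml/input/data"):
    """Path of the Pipe-mode FIFO for a given channel and epoch.

    In Pipe mode each epoch's data arrives on a fresh pipe named
    <channel>_<epoch> (train_0, train_1, ...).
    """
    return os.path.join(base, f"{channel}_{epoch}")


def stream_channel(channel, epoch, base="/opt/ml/input/data", chunk_size=1 << 20):
    """Yield a channel's bytes for one epoch, front to back.

    A pipe can only be read once and only sequentially, which is why
    Pipe mode suits algorithms that stream over the data.
    """
    with open(pipe_path(channel, epoch, base), "rb") as pipe:
        while True:
            chunk = pipe.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

In a real training loop you would open `train_0` for the first epoch, `train_1` for the second, and so on; the record framing within each chunk depends on how the data was serialized.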

I hope the above response helps.

Reference:

[1] https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

[2] https://aws.amazon.com/blogs/machine-learning/using-pipe-input-mode-for-amazon-sagemaker-algorithms/

AWS
Support Engineer
Answered 1 year ago
  • @kartik_R - I might be wrong on this, but isn't the code copied to /opt/... inside the container where training runs? I'm not sure about the data, but I was assuming something similar for the data as well, where it is copied locally.
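The commenter's intuition is right for the default File input mode: code lands under /opt/ml/code and each input channel is downloaded from S3 to /opt/ml/input/data/&lt;channel&gt; before training starts. Containers built on the SageMaker training toolkit also export an SM_CHANNEL_&lt;NAME&gt; environment variable per channel; a small sketch of resolving the data directory either way (the fallback path is the documented convention, not something this helper is required to use):

```python
import os


def channel_dir(channel):
    """Resolve where a channel's data lives inside the training container.

    Script-mode containers export SM_CHANNEL_<NAME> for each channel; for
    custom containers the data is at /opt/ml/input/data/<channel> by
    convention (File mode only -- in Pipe mode there is no downloaded copy).
    """
    return os.environ.get(
        f"SM_CHANNEL_{channel.upper()}",
        f"/opt/ml/input/data/{channel}",
    )
```

The model artifacts your script writes to /opt/ml/model are uploaded to the job's `S3OutputPath` when training finishes, so S3 is the hand-off point on both ends.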
