How to copy data and run training in SageMaker?


I need to pull data from different sources and then run processing and training jobs. I want to set it up so that I spin up one or more EC2 instances, collect the data, copy it to an S3 bucket, and then use the SageMaker API to create the processing and training jobs. I want to do all of this via code: provisioning the EC2 instances, downloading the data, and launching the training and processing jobs. One question I had: once I download the data, can I copy it directly to whatever data storage/volume comes attached to the SageMaker training/processing instance? I'm not sure what it's called. I also know there are options to stream data? Any thoughts? This would be my first training job, so apologies for the basic question.
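For context, a minimal sketch of the EC2-side upload step I have in mind, using boto3 (the bucket, key, and local path are just placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names -- replace with your own bucket/prefix and local path.
# After collecting data on the EC2 instance, push it to S3 so that
# SageMaker processing/training jobs can read it from there.
s3.upload_file(
    Filename="/data/collected/dataset.csv",
    Bucket="my-bucket",
    Key="training-data/dataset.csv",
)
```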

Asked 1 year ago · 629 views
1 Answer

Hello,

Thank you for using AWS SageMaker.

With regard to your question about whether there is any option to copy/download data directly to the SageMaker training job's resources, bypassing S3: unfortunately, due to the architecture design, it is not possible to directly load/copy data onto the underlying instance of a SageMaker training job, and users need to have their training data stored in an S3 bucket. To speed up the training process, we recommend using Pipe mode, provided your algorithm supports it [1][2].
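For a first job, a minimal sketch of what creating a training job with Pipe mode can look like via boto3 (the job name, bucket, image URI, role ARN, and instance settings are placeholders to substitute with your own values):

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_training_job(
    TrainingJobName="my-first-training-job",  # placeholder name
    AlgorithmSpecification={
        # Illustrative image URI; use your algorithm's training image.
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
        # "Pipe" streams data from S3 to the container instead of
        # downloading the full dataset to the instance volume first.
        "TrainingInputMode": "Pipe",
    },
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerExecutionRole",
    InputDataConfig=[
        {
            "ChannelName": "training",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/training-data/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-output/"},
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```

With File mode (the default), the same call works with "TrainingInputMode": "File"; the data is then downloaded to the training instance's attached volume before your training code starts.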

Hope the above response helps.

Reference:

[1] https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

[2] https://aws.amazon.com/blogs/machine-learning/using-pipe-input-mode-for-amazon-sagemaker-algorithms/

AWS
Support Engineer
Answered 1 year ago
  • @kartik_R - I might be wrong on this, but isn't the code copied to /opt/... inside the container where training runs? I'm not sure about the data, but I was assuming something similar for the data as well, i.e. that it is copied locally.
