How to copy data and run training in SageMaker?


I need to pull data from different sources and then run processing and training jobs. I want to set it up so that I spin up one or more EC2 instances, collect the data, copy it to an S3 bucket, and then use the SageMaker API to create processing and training jobs. I want to do all of this via code: provisioning the EC2 instances, downloading the data, and launching the processing and training jobs. One question I had: once I download the data, can I copy it directly to whatever storage/volume comes attached to the SageMaker training/processing instance? I'm not sure what it's called. I know there are also options to stream data? Any thoughts? This would be my first training job, so apologies for the basic question.
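
Roughly the flow I have in mind, as a sketch (the bucket name, role ARN, image URI, and job name below are placeholders I made up):

```python
import boto3

s3 = boto3.client("s3")
sm = boto3.client("sagemaker")

# Placeholder values -- substitute your own bucket, role, and image.
BUCKET = "my-training-bucket"
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerRole"
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest"

# Step 1: after collecting data on the EC2 instance, push it to S3.
s3.upload_file("local_data.csv", BUCKET, "input/data.csv")

# Step 2: start a training job that reads from that S3 prefix.
sm.create_training_job(
    TrainingJobName="my-first-training-job",
    AlgorithmSpecification={"TrainingImage": IMAGE_URI, "TrainingInputMode": "File"},
    RoleArn=ROLE_ARN,
    InputDataConfig=[{
        "ChannelName": "training",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": f"s3://{BUCKET}/input/",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    OutputDataConfig={"S3OutputPath": f"s3://{BUCKET}/output/"},
    ResourceConfig={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 50},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```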

asked a year ago · 608 views
1 Answer

Hello,

Thank you for using Amazon SageMaker.

Regarding your question about whether there is any option to copy/download the data directly to the SageMaker training job's resources, bypassing S3: unfortunately, due to the architecture design, it is not possible to load or copy data directly onto the underlying instance of a SageMaker training job, and the training data needs to be stored in an S3 bucket. To speed up the training process, we recommend using Pipe mode, as long as your algorithm supports it [1][2].
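
For example, with the SageMaker Python SDK, Pipe mode is a one-line change on the estimator. This is only a minimal sketch; the image URI, role, and bucket below are placeholders:

```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Placeholder values -- substitute your own image, role, and bucket.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    input_mode="Pipe",  # stream from S3 instead of downloading to the volume
)

estimator.fit({"training": TrainingInput("s3://my-training-bucket/input/")})
```

In Pipe mode the data is streamed from S3 into the container through named pipes, so training can start without first waiting for the full dataset to be downloaded to the attached volume.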

I hope the above response helps you.

Reference:

[1] https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

[2] https://aws.amazon.com/blogs/machine-learning/using-pipe-input-mode-for-amazon-sagemaker-algorithms/

AWS SUPPORT ENGINEER · answered a year ago
  • @kartik_R - I might be wrong on this, but isn't the code copied to /opt/... inside the container where training runs? I'm not sure about the data, but I was assuming something similar for the data as well, i.e. that it is copied locally.
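  • That matches how File mode works: each input channel is copied from S3 to /opt/ml/input/data/<channel name> on the training volume before the script starts. A minimal sketch of reading it inside the container, assuming a channel named "training" (the SM_CHANNEL_TRAINING variable is set by containers built with the SageMaker training toolkit):

```python
import os

# In File mode, SageMaker copies each input channel from S3 onto the
# training volume under /opt/ml/input/data/<channel_name>. Containers
# built with the SageMaker training toolkit also expose that path via
# an SM_CHANNEL_<NAME> environment variable. Channel name "training"
# is assumed here.
data_dir = os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")

for fname in os.listdir(data_dir):
    print(os.path.join(data_dir, fname))
```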
