Do I have to re-download my dataset to the training job every time I run a SageMaker Estimator training job?


Hi, over the coming weeks I'll be running some deep learning experiments using the PyTorch SageMaker estimator, and I was wondering whether it's possible to avoid re-downloading my dataset every time I call estimator.fit().

Is there a way to do this without using FastFile mode, i.e., by downloading the dataset once and reusing the same Docker image?

If it's not possible with managed (remote) instances, would it be possible to reuse the Docker container if I ran the job in local mode (i.e., instance_type='local_gpu')? If so, how?

And just to add, I am using S3 for the input data.

Many thanks, Tim

1 Answer

Hi Tim, a SageMaker training job needs to download or stream its input data from S3. By default, the training job's input data configuration uses File mode, which means the data is downloaded from S3 before training starts. We have launched a new mode called Fast File mode, which streams data in while the job runs. If you are familiar with Pipe mode, Fast File mode is a combination of File mode and Pipe mode: it streams data into the training instance without requiring any code changes. Please refer to the What's New post: https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/. To use Fast File mode, you simply change your estimator's configuration accordingly (https://github.com/aws/sagemaker-python-sdk/blob/dev/src/sagemaker/estimator.py#L151).

Additionally, SageMaker training jobs support data sources other than S3. You can use EFS or FSx for Lustre to speed up training by eliminating the download step that File mode requires. See this blog post for details: https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/

AWS
Answered 2 years ago
