Do I have to re-download the dataset every time I run a SageMaker Estimator training job?


Hi, over the coming weeks I'll be running some deep learning experiments using the PyTorch SageMaker estimator, and I was wondering whether it's possible to avoid re-downloading my dataset every time I call estimator.fit().

Is there a way to do this without using FastFile mode, i.e. downloading the dataset once and reusing the same Docker image?

If it's not possible with remote instances, would it be possible to reuse the Docker container if I ran the job in local mode (i.e. instance_type='local_gpu'), and if so, how?

And just to add, I am using S3 for the input data.

Many thanks, Tim

Asked 2 years ago · Viewed 743 times
1 Answer

Hi Tim, a SageMaker training job needs to download or stream its input data from S3. By default, a training job's input data config uses File mode, which means the data is downloaded from S3 at the start of every job. We have launched a new mode called FastFile mode, which streams data in while the job runs. If you are familiar with Pipe mode, FastFile mode is effectively a combination of File mode and Pipe mode: it streams data into the training instance without requiring any code changes. Please refer to the What's New announcement: https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/. To use FastFile mode, simply change the configuration of your estimator accordingly (https://github.com/aws/sagemaker-python-sdk/blob/dev/src/sagemaker/estimator.py#L151).

Additionally, SageMaker training jobs support data storage sources other than S3. You can use EFS or FSx for Lustre to speed up your training by eliminating the download step that File mode requires. You can refer to the blog post here: https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/
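For reference, here is a minimal sketch of how FastFile mode can be enabled with the SageMaker Python SDK (v2). It assumes you have valid AWS credentials; the role ARN, entry point script, instance type, and S3 prefix below are placeholders, not values from this thread. The `input_mode` can be set on the estimator itself or per input channel via `TrainingInput`:

```python
# Configuration sketch: streaming S3 data with FastFile mode instead of
# downloading it at the start of every job. All resource names are placeholders.
from sagemaker.pytorch import PyTorch
from sagemaker.inputs import TrainingInput

estimator = PyTorch(
    entry_point="train.py",               # your training script
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder IAM role
    framework_version="1.12",
    py_version="py38",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    input_mode="FastFile",                # stream from S3 rather than File-mode download
)

# Alternatively, set the mode per channel so different channels can mix modes:
train_input = TrainingInput(
    "s3://my-bucket/my-dataset/",         # placeholder S3 prefix
    input_mode="FastFile",
)

estimator.fit({"train": train_input})
```

Because FastFile mode exposes the S3 objects as files on the local filesystem, your training script reads them exactly as it would in File mode, so no code changes are needed inside `train.py`.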

AWS
Answered 2 years ago
