Hi Tim, a SageMaker training job needs to either download or stream its input data from S3. By default, the training job's input data config uses File mode, which means the whole dataset is downloaded from S3 before training starts. We have launched a new mode called Fast File mode, which streams data in while the job runs. If you are familiar with Pipe mode, Fast File mode is a combination of File mode and Pipe mode: it streams data to the training instance, but without any code change on your side. Please refer to the What's New post: https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/ To use Fast File mode, simply change the input mode in your estimator configuration accordingly (https://github.com/aws/sagemaker-python-sdk/blob/dev/src/sagemaker/estimator.py#L151). Additionally, SageMaker training jobs support data sources other than S3. You can use EFS or FSx for Lustre to speed up training by eliminating the download step that File mode requires. You can refer to the blog post here: https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/
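To make the two options concrete, here is a minimal sketch of what the input channels could look like in the `InputDataConfig` of a `CreateTrainingJob` request. The helper names `s3_channel` and `file_system_channel`, the bucket path, and the file system ID are hypothetical placeholders of mine, not anything from your setup; the dictionary keys follow the SageMaker `Channel` shape as I understand it, so please double-check them against the API reference before relying on this.

```python
from typing import Any, Dict


def s3_channel(name: str, s3_uri: str, input_mode: str = "FastFile") -> Dict[str, Any]:
    """Build one S3-backed input channel (hypothetical helper).

    input_mode is "File" (download first), "Pipe" (stream through a FIFO),
    or "FastFile" (stream on demand, read like ordinary files).
    """
    if input_mode not in ("File", "Pipe", "FastFile"):
        raise ValueError(f"unsupported input mode: {input_mode}")
    return {
        "ChannelName": name,
        "InputMode": input_mode,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


def file_system_channel(
    name: str,
    file_system_id: str,
    directory_path: str,
    file_system_type: str = "FSxLustre",
) -> Dict[str, Any]:
    """Build one EFS / FSx for Lustre input channel (hypothetical helper)."""
    if file_system_type not in ("EFS", "FSxLustre"):
        raise ValueError(f"unsupported file system type: {file_system_type}")
    return {
        "ChannelName": name,
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": file_system_type,
                "FileSystemAccessMode": "ro",  # read-only is enough for training data
                "DirectoryPath": directory_path,
            }
        },
    }


# Placeholder values for illustration only.
train = s3_channel("train", "s3://my-example-bucket/train/")
lustre = file_system_channel("train", "fs-0123456789abcdef0", "/fsx/train")
print(train["InputMode"])
print(lustre["DataSource"]["FileSystemDataSource"]["FileSystemType"])
```

If you use the SageMaker Python SDK rather than building the request yourself, the equivalent should be passing `input_mode="FastFile"` to the estimator (or to `sagemaker.inputs.TrainingInput`), and using `sagemaker.inputs.FileSystemInput` for EFS or FSx for Lustre. Note that file system inputs also require the training job to run in a VPC that can reach the file system.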