Confusion about Pipe mode when sharding by S3 key

Hi,

I am a little confused about whether sharding by S3 key works in Pipe mode. Here is an example.

Assume I have:

2 instances, each with 4 workers;

Data: 8 files totaling 8 GB (1 GB each), placed under 4 different S3 paths, so each path holds 2 files (2 GB in total).

I use Pipe mode, set distribution='ShardedByS3Key' on each s3_input, and create 4 channels (each channel maps to one S3 path with 2 files):

train_s3_input_1 = sagemaker.inputs.s3_input(channel_1, distribution='ShardedByS3Key')

Question:

How much data does each worker get to train on: 1 file or 2 files? Thanks.

AWS
Asked 4 years ago · 237 views
1 Answer
Accepted Answer

Hi, when you specify ShardedByS3Key, SageMaker copies only a subset of the data to each ML compute instance launched for model training, rather than the full dataset. If n ML compute instances are launched for a training job, each instance gets approximately 1/n of the S3 objects. This applies in both File and Pipe modes. Keep this in mind when developing algorithms.

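To answer your question ("How much data does each worker get to train on, 1 file or 2 files?"): 1 file each from the training channel. With 2 instances and 2 files in a channel, each instance receives about 1/2 of that channel's objects, which is 1 file.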
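The 1/n split can be illustrated with a small plain-Python simulation of the setup in the question. This is only a sketch: the helper `shard_by_s3_key`, the key names, and the round-robin assignment over sorted keys are assumptions made for illustration. SageMaker decides the actual key-to-instance assignment, and this code does not call any SageMaker API.

```python
def shard_by_s3_key(keys, instance_count):
    """Split S3 object keys into one shard per instance.

    Round-robin over sorted keys is an assumption for illustration,
    not SageMaker's documented assignment rule.
    """
    ordered = sorted(keys)
    return [ordered[i::instance_count] for i in range(instance_count)]


# Hypothetical layout matching the question: 4 channels, 2 files of 1 GB each.
channels = {
    f"channel_{c}": [f"path_{c}/file_{f}" for f in (1, 2)]
    for c in range(1, 5)
}
instance_count = 2

# Collect the files each instance would receive across all channels.
files_per_instance = [[] for _ in range(instance_count)]
for name, keys in channels.items():
    for idx, shard in enumerate(shard_by_s3_key(keys, instance_count)):
        files_per_instance[idx].extend(shard)

for idx, files in enumerate(files_per_instance):
    print(f"instance {idx}: {len(files)} files")
```

Under these assumptions each instance ends up with 1 file per channel, so 4 files (about 4 GB) in total across the 4 channels.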

AWS
Will_B
Answered 4 years ago
