1 Answer
Hi, when you specify ShardedByS3Key, SageMaker copies only a subset of the data to each ML compute instance launched for the training job, rather than the full dataset. If the job launches n ML compute instances, each instance gets approximately 1/n of the S3 objects. This applies in both File and Pipe modes. Keep this in mind when developing algorithms.
To answer your question (how much data does each worker get to train on, 1 file or 2 files?): each worker gets 1 file from the training channel.
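To make the 1/n split concrete, here is a rough simulation, not SageMaker's actual code. The function `shard_keys`, the round-robin assignment over sorted keys, and the example object names are all assumptions for illustration; the rule SageMaker really uses to pick which object goes to which instance may differ. The only point is that each instance ends up with roughly 1/n of the objects.

```python
from typing import List


def shard_keys(keys: List[str], n_instances: int) -> List[List[str]]:
    """Split S3 object keys across instances, roughly 1/n each.

    Hypothetical helper: round-robin over sorted keys is an assumed
    assignment rule, used only to illustrate the 1/n behaviour.
    """
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    shards: List[List[str]] = [[] for _ in range(n_instances)]
    for i, key in enumerate(sorted(keys)):
        # Object i goes to instance i mod n, so shard sizes differ by at most 1.
        shards[i % n_instances].append(key)
    return shards


if __name__ == "__main__":
    # Assumed setup: two objects in the training channel, two instances.
    keys = ["train/part-0.csv", "train/part-1.csv"]
    for idx, shard in enumerate(shard_keys(keys, 2)):
        print(f"instance {idx}: {shard}")
```

With two objects and two instances, each simulated instance gets one object. If the number of objects is not a multiple of n, some instances get one more object than others, so the workload is only approximately balanced.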
answered 4 years ago