Load balancing is not happening on SageMaker batch transform job


Hi All,

Greetings!!

We have two issues with a SageMaker batch transform job.

  1. Load is not being balanced across the two instances: one instance reached CPU utilization = 200% and GPU utilization = 81%, while the 2nd instance was completely idle.

    Transform job arguments:

    MaxConcurrentTransforms: 2, MaxPayloadInMB: 1, BatchStrategy: MultiRecord, InstanceCount: 2

  2. The batch transform job fails after 20 minutes without any errors in the CloudWatch logs. Here too, one instance was at CPU utilization = 200% and GPU utilization = 81%, while the 2nd instance was completely idle.

Could you please have a look?

Thanks, Vinayak

  • Is your input data in a single file or split into two?

1 Answer

"If you have one input file but initialize multiple compute instances, only one instance processes the input file and the rest of the instances are idle." Kindly see this link for more information. I would suggest confirming that your input consists of more than one file, since a single input file would explain the idle 2nd instance.
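
If the input does turn out to be a single file, one possible workaround is to split it into several smaller files before starting the job, so that each instance has at least one file to pick up. Below is a minimal sketch of that idea, not an official AWS utility. It assumes the input is line-delimited (one record per line, no header row), and the function name `split_into_shards` and the `-part-NNN` file naming are hypothetical choices made for illustration.

```python
from pathlib import Path


def split_into_shards(input_path, output_dir, num_shards):
    """Split a line-delimited file into num_shards files, round-robin by line.

    Assumption: every line is one complete record and there is no header row.
    Returns the list of shard paths that were written.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    src = Path(input_path)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: <stem>-part-000<suffix>, <stem>-part-001<suffix>, ...
    shard_paths = [
        out_dir / f"{src.stem}-part-{i:03d}{src.suffix}" for i in range(num_shards)
    ]
    handles = [p.open("w", encoding="utf-8", newline="") for p in shard_paths]
    try:
        # newline="" keeps the original line endings untouched.
        with src.open("r", encoding="utf-8", newline="") as f:
            for line_no, line in enumerate(f):
                # Line 0 goes to shard 0, line 1 to shard 1, and so on, wrapping around.
                handles[line_no % num_shards].write(line)
    finally:
        for h in handles:
            h.close()
    return shard_paths
```

For example, `split_into_shards("input.csv", "shards", 2)` would write `shards/input-part-000.csv` and `shards/input-part-001.csv`. The shards would then need to be uploaded under a common S3 prefix and the transform job pointed at that prefix instead of the single file. Using at least as many shards as InstanceCount gives every instance something to process.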

Marc
answered 9 months ago
