Sagemaker Pipelines API rate limit exceeded

I wish to train 48 models in parallel in Sagemaker Pipelines using 48 TrainingSteps. I cannot use a hyperparameter tuning job as the quota limit is only 10 parallel training jobs and this cannot be increased. I have configured the quota to allow up to 48 machines to be used in parallel, and the pipeline compiles and starts successfully. The training jobs all complete successfully when I look at the Sagemaker Training jobs dashboard.

The problem is that the pipeline itself fails. Some of the training steps register as complete, but many of them report that they have failed with the error: 'Failed to invoke sagemaker.DescribeTraining.Job. Error Details: Rate exceeded'.

The rate limit for DescribeTraining.Job is 5/sec and cannot be changed, so it seems that the running pipeline hits this limit while it updates the status of its steps, and that this causes the pipeline to fail.

  • I also get the same error when I run only 3 SM Pipelines at the same time, with just a single TrainingStep in each pipeline. The training job succeeds, but the SM Pipeline fails. My training job takes around 5-6 hours, so unless this is resolved I cannot rely on SM Pipelines to train the models, as re-running the entire training step is very expensive in both compute and time.
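Since the training jobs themselves succeed, one way to avoid re-running them is to confirm their final status from outside the pipeline. Below is a minimal sketch of that check in Python, using exponential backoff with jitter so the check does not itself run into the DescribeTrainingJob rate limit. The names `ThrottledError`, `call_with_backoff` and `wait_for_job` are hypothetical helpers, not part of any AWS SDK, and `describe` is assumed to be any callable that returns a dict containing a `TrainingJobStatus` key. This does not change how SageMaker Pipelines polls internally; it only covers status checks you make yourself.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for the SDK's throttling error (hypothetical name)."""


def call_with_backoff(fn, *, retry_on=(ThrottledError,), max_attempts=6,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep,
                      rng=random.random):
    """Call fn(), retrying on throttling with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # The cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # wait a random fraction of it so parallel callers spread out.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())


def wait_for_job(describe, job_name, *, poll_seconds=60.0, sleep=time.sleep,
                 **backoff):
    """Poll describe(job_name) until the job reaches a terminal status."""
    terminal = {"Completed", "Failed", "Stopped"}
    while True:
        info = call_with_backoff(lambda: describe(job_name), sleep=sleep, **backoff)
        status = info["TrainingJobStatus"]
        if status in terminal:
            return status
        # Poll slowly: 48 jobs polled once a minute stay far below 5 calls/sec.
        sleep(poll_seconds)
```

With boto3, `describe` could be something like `lambda name: client.describe_training_job(TrainingJobName=name)`, and `retry_on` would need to be adapted to the error the client actually raises when throttled (an assumption to verify against your SDK version).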

asked 2 years ago, 538 views
1 Answer
Please refer to the link below related to the Amazon SageMaker endpoints and quotas: https://docs.aws.amazon.com/general/latest/gr/sagemaker.html

Per the link, the quota "Maximum number of training jobs each hyperparameter tuning job can run in parallel at once" has a default value of 10 in each supported Region and is not adjustable.

Because the limit is not adjustable, it cannot be increased by raising a support case.

AWS
answered 10 months ago
