SageMaker Pipelines API rate limit exceeded


I wish to train 48 models in parallel in SageMaker Pipelines using 48 TrainingSteps. I cannot use a hyperparameter tuning job because its quota is limited to 10 parallel training jobs and cannot be increased. I have configured the quota to allow up to 48 machines to be used in parallel, and the pipeline compiles and starts successfully. All of the training jobs complete successfully according to the SageMaker Training jobs dashboard.

The problem is that the pipeline itself fails. Some of the training steps register as complete, but many of them are reported as failed with the error: 'Failed to invoke sagemaker.DescribeTraining.Job. Error Details: Rate exceeded'.

The rate limit for DescribeTraining.Job is 5 requests per second and cannot be changed, so it seems that the pipeline hits this limit while polling to update the status of its steps, which causes the pipeline to fail.

  • I get the same error when I run only 3 SM Pipelines at the same time, with just a single TrainingStep in each Pipeline. The training job succeeds, but the SM Pipeline fails. Each training job takes around 5-6 hours, so unless this is resolved I cannot rely on SM Pipelines to train the models, because re-running the entire training step is very expensive in compute and time.
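For reference, the throttling described above can at least be tolerated in one's own code when confirming the status of the training jobs independently of the pipeline. The following is only a minimal sketch: it does not change how the Pipelines service itself polls, and the helper names (`describe_with_backoff`, `collect_statuses`), the delays, and the assumption that throttling surfaces as an exception whose message contains "Rate exceeded" are all hypothetical choices made for illustration.

```python
import time
from typing import Callable, Dict, Iterable


def describe_with_backoff(
    describe: Callable[[str], dict],
    job_name: str,
    max_attempts: int = 6,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call describe(job_name), retrying with exponential backoff on throttling."""
    for attempt in range(max_attempts):
        try:
            return describe(job_name)
        except Exception as exc:
            # Assumption: a throttled call raises an error containing
            # "Rate exceeded", as in the message quoted in the question.
            throttled = "Rate exceeded" in str(exc)
            if not throttled or attempt == max_attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


def collect_statuses(
    describe: Callable[[str], dict],
    job_names: Iterable[str],
    min_interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Describe each job in turn, pausing min_interval seconds after each one.

    With min_interval=0.25 this loop issues at most 4 calls per second,
    which stays under the 5 per second limit mentioned in the question.
    """
    statuses = {}
    for name in job_names:
        response = describe_with_backoff(describe, name, sleep=sleep)
        statuses[name] = response["TrainingJobStatus"]
        sleep(min_interval)
    return statuses


# Possible usage with boto3 (job names are placeholders):
#   import boto3
#   sm = boto3.client("sagemaker")
#   statuses = collect_statuses(
#       lambda name: sm.describe_training_job(TrainingJobName=name),
#       ["my-training-job-1", "my-training-job-2"],
#   )
```

Passing the describe call in as a function keeps the sketch independent of any particular client, so the same loop can be exercised with a stub before it is pointed at a real account.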

asked 2 years ago, 594 views
1 Answer

Please refer to the link below for the Amazon SageMaker endpoints and quotas: https://docs.aws.amazon.com/general/latest/gr/sagemaker.html

Per the link, the quota "Maximum number of training jobs each hyperparameter tuning job can run in parallel at once" has a default of 10 in each supported Region and is not adjustable.

Because the limit is not adjustable, it cannot be increased by raising a support case.

AWS
answered a year ago
