I want to resolve throttling errors when I use the Amazon SageMaker Python SDK.
Short description
API calls to any AWS service can't exceed the maximum allowed API request rate per account and per AWS Region. These API calls might be from an application, AWS Command Line Interface (AWS CLI), or AWS Management Console. If the API requests exceed the maximum rate, then you receive the "Rate Exceeded" error, and the API calls are throttled.
For example, you might receive an error message such as "botocore.exceptions.ClientError: An error occurred (ThrottlingException)." With the SageMaker APIs, this error occurs when requests are still throttled after the retries allowed by the default Boto3 retry configuration are used up. Override this configuration to increase the number of retry attempts and the timeouts for connecting and for reading a response.
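The effect of allowing more retry attempts can be illustrated with a small retry loop. The following is a minimal sketch of capped exponential backoff with jitter, not Boto3's actual retry implementation. The names ThrottledError and call_with_backoff, and the default delay values, are hypothetical and chosen only for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error such as ThrottlingException."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call func(), retrying on ThrottledError with capped exponential backoff.

    The wait before each retry is a random value between 0 and a ceiling
    that doubles after every failed attempt, up to max_delay seconds.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

With a low max_attempts value, a burst of throttled requests is more likely to exhaust the loop and surface the error. A higher value gives the request rate more time to fall back under the limit before the call fails.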
Resolution
To resolve this error, create a SageMaker Boto3 client with a custom retry configuration, and then use it to create the SageMaker Python SDK client.
- Create a SageMaker Boto3 client with a custom retry configuration.
import boto3
from botocore.config import Config
# Raise the retry limit to 20 and set the connect and read timeouts (in seconds).
sm_boto = boto3.client('sagemaker', config=Config(connect_timeout=5, read_timeout=60, retries={'max_attempts': 20}))
print(sm_boto.meta.config.retries)
- Use the Boto3 client from the previous step to create a SageMaker Python SDK client.
import sagemaker
sagemaker_session = sagemaker.Session(sagemaker_client = sm_boto)
region = sagemaker_session.boto_session.region_name
print(sagemaker_session.sagemaker_client.meta.config.retries)
- Test a SageMaker API with multiple requests from the SageMaker Python SDK.
import multiprocessing

def worker(TrainingJobName):
    print(sagemaker_session.sagemaker_client
          .describe_training_job(TrainingJobName=TrainingJobName)
          ['TrainingJobName'])
    return

if __name__ == '__main__':
    jobs = []
    TrainingJobName = 'your-job-name'
    for i in range(10):
        p = multiprocessing.Process(target=worker, args=(TrainingJobName,))
        jobs.append(p)
        p.start()
- Create an instance of the sagemaker.estimator.Estimator class with the sagemaker_session parameter.
image_uri = '##########'
s3path = '#########'
estimator = sagemaker.estimator.Estimator(
    image_uri,
    sagemaker.get_execution_role(),
    instance_count=1,
    instance_type='ml.c4.4xlarge',
    volume_size=30,
    max_run=360000,
    input_mode='File',
    output_path=s3path,
    sagemaker_session=sagemaker_session,
)
- To confirm that the retry configuration resolves the throttling exceptions, launch a training job from the estimator that you created in the previous step.
estimator.fit()
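If errors still occur after you change the retry configuration, it can help to check whether they are throttling errors or something else. The following is a minimal sketch that inspects an error response dictionary in the shape that botocore.exceptions.ClientError exposes through its response attribute. The helper name is_throttling_error and the set of error codes are assumptions for illustration, and the set is not an exhaustive list.

```python
# Illustrative subset of error codes that indicate throttling (an assumption,
# not an exhaustive list).
THROTTLING_CODES = {
    "ThrottlingException",
    "Throttling",
    "TooManyRequestsException",
    "RequestLimitExceeded",
}


def is_throttling_error(error_response):
    """Return True if a ClientError-style response dict has a throttling code."""
    code = error_response.get("Error", {}).get("Code", "")
    return code in THROTTLING_CODES
```

For example, inside an except botocore.exceptions.ClientError as err block, is_throttling_error(err.response) indicates whether the failure was caused by throttling.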
Related information
Boto3 documentation