I want to troubleshoot auto scaling issues for Amazon SageMaker AI endpoints.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Troubleshoot auto scaling for SageMaker AI endpoints based on the issue that you're experiencing.
Resource limit exceeded errors
If you reach a service quota when you scale your SageMaker AI endpoints, then you receive an error message that's similar to the following:
"Failed to set desired instance count to 2. Reason: The account-level service limit 'ml.p2.xlarge for endpoint usage' is 1 Instance, with current utilization of 0 Instances and a request delta of 2 Instances. Please use AWS Service Quotas to request an increase for this quota."
To resolve this issue, view the service quota for your instance type. If you reached your quota, then request a service quota increase.
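As a sketch of the quota-increase step, the following helper builds the parameters for the Service Quotas RequestServiceQuotaIncrease API (called with boto3 as `service_quotas.request_service_quota_increase(**params)`). The quota code "L-1234ABCD" is a placeholder, not a real code; look up the code for your instance type with `list_service_quotas` for the `sagemaker` service.

```python
def build_quota_increase_request(quota_code, desired_value):
    """Build parameters for Service Quotas' RequestServiceQuotaIncrease API.

    The quota code for your instance type comes from list_service_quotas;
    "L-1234ABCD" in the usage below is a placeholder.
    """
    return {
        "ServiceCode": "sagemaker",
        "QuotaCode": quota_code,
        # DesiredValue is the new instance limit that you want for the quota.
        "DesiredValue": float(desired_value),
    }

# Request a limit of 2 instances so that the scaling action in the
# example error message can succeed.
params = build_quota_increase_request("L-1234ABCD", 2)
```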
Scaling takes longer than expected
If your scale-out process takes a long time even though your cooldown period is short, then your Amazon CloudWatch alarms might be aggregating multiple data points before they invoke scaling. To resolve this issue, reduce the Datapoints to alarm setting on the CloudWatch alarm.
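To see why Datapoints to alarm matters, the following simplified model of CloudWatch alarm evaluation shows that an alarm that requires 3 breaching datapoints waits longer than one that requires 2, even when traffic is already high:

```python
def alarm_fires(metric_values, threshold, datapoints_to_alarm, evaluation_periods):
    """Simplified model of a CloudWatch alarm: the alarm fires when at
    least `datapoints_to_alarm` of the last `evaluation_periods` datapoints
    breach the threshold."""
    window = metric_values[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm

# Traffic ramps up: only the last two periods breach a threshold of 100.
values = [50, 120, 130]
print(alarm_fires(values, threshold=100, datapoints_to_alarm=3, evaluation_periods=3))  # False
print(alarm_fires(values, threshold=100, datapoints_to_alarm=2, evaluation_periods=3))  # True
```

With Datapoints to alarm set to 3, the alarm stays in the OK state for another period before scaling starts; lowering it to 2 activates scaling one period earlier.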
Also, other scaling policies or service quotas might cause your scaling process to take longer than expected. So, check your configurations and service quotas to identify issues.
Your auto scaling policy doesn't scale down instances as expected
If your auto scaling policy doesn't scale down instances as expected and traffic is low, then take the following actions:
- Configure the correct metric for your policy. For asynchronous endpoints, use the ApproximateBacklogSizePerInstance metric. For real-time endpoints, use the InvocationsPerInstance metric.
- For more responsive auto scaling, adjust your scaling thresholds, cooldown periods, and other workload-related parameters.
- Check whether your policy scales based on the CPU utilization metric. A policy that scales based on the CPU utilization metric might not scale down when traffic decreases.
- If you set a warmup time in your scaling policy, then align the policy with how quickly your instances manage traffic changes. Instances that are warming up don't count towards the aggregated metrics for scaling.
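The InvocationsPerInstance guidance above can be sketched as target-tracking arithmetic: the policy adjusts capacity so that the per-instance metric stays near the target value, which is why low traffic should produce a smaller desired instance count. This is a simplified model, not the exact Application Auto Scaling algorithm:

```python
import math

def desired_capacity(total_invocations_per_min, target_per_instance):
    """Target-tracking sketch for the InvocationsPerInstance metric:
    choose the smallest instance count that keeps the per-instance
    invocation rate at or below the target value."""
    return max(1, math.ceil(total_invocations_per_min / target_per_instance))

# Low traffic: 40 invocations/min with a target of 100 per instance
# -> the policy should scale in to 1 instance.
print(desired_capacity(40, 100))   # 1

# High traffic: 450 invocations/min -> scale out to 5 instances.
print(desired_capacity(450, 100))  # 5
```

Note that instances still warming up are excluded from the aggregated metric, so a long warmup time can keep the measured per-instance load high and delay scale-in.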
Auto scaling isn't activated in certain conditions
Auto scaling might not activate under the following conditions:
- The instance type isn't available in the selected Availability Zone.
- There's insufficient capacity in the selected instance type.
- You didn't correctly configure the scaling policy.
To resolve these auto scaling failures, take the following actions:
- If the instance type isn't available in the selected Availability Zone, then deploy the endpoint to an Availability Zone that supports the instance type, or choose a different instance type.
- If there's insufficient capacity for the instance type, then retry the scaling action later, or use a different instance type.
- Verify that your scaling policy uses the correct resource ID, scalable dimension, metric, and target value.
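As a sketch of a correctly configured policy, the following helper builds the arguments for Application Auto Scaling's `put_scaling_policy` call (boto3 client "application-autoscaling"). The endpoint and variant names are placeholders, and the cooldown values are illustrative; you must also register the endpoint variant as a scalable target before you attach the policy.

```python
def invocations_scaling_policy(endpoint_name, variant_name, target_value):
    """Build put_scaling_policy arguments for a target-tracking policy on
    a real-time SageMaker AI endpoint variant. Apply with:
    boto3.client("application-autoscaling").put_scaling_policy(**policy)"""
    return {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        # The resource ID identifies the endpoint variant to scale.
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Keep InvocationsPerInstance near this value.
            "TargetValue": float(target_value),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            # Illustrative cooldowns: scale in cautiously, scale out quickly.
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    }

# "my-endpoint" and "AllTraffic" are placeholder names.
policy = invocations_scaling_policy("my-endpoint", "AllTraffic", 100)
```

For asynchronous endpoints, swap the predefined metric for a custom metric specification that uses ApproximateBacklogSizePerInstance, as described earlier.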
Related information
What is Service Quotas?
Automatic scaling of Amazon SageMaker AI models
Metrics for monitoring Amazon SageMaker AI with Amazon CloudWatch
Asynchronous inference