How do I troubleshoot auto scaling issues for SageMaker AI endpoints?

I want to troubleshoot auto scaling issues for Amazon SageMaker AI endpoints.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Troubleshoot auto scaling for SageMaker AI endpoints based on the issue that you're experiencing.

Resource limit exceeded errors

If you reach a service quota when you scale your SageMaker AI endpoints, then you receive an error message that's similar to the following:

"Failed to set desired instance count to 2. Reason: The account-level service limit 'ml.p2.xlarge for endpoint usage' is 1 Instance, with current utilization of 0 Instances and a request delta of 2 Instances. Please use AWS Service Quotas to request an increase for this quota."

To resolve this issue, view the service quota for your instance type. If you reached your quota, then request a service quota increase.
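
Assuming you use the AWS CLI, the quota check and increase request can be sketched as follows. The quota code `L-XXXXXXXX` is a placeholder, not a real code; look up the actual code for your instance type with `list-service-quotas` before you submit the request.

```shell
# Write the quota increase request as JSON. The QuotaCode is a placeholder --
# look up the real code for your instance type first, for example:
#   aws service-quotas list-service-quotas --service-code sagemaker \
#       --query "Quotas[?QuotaName=='ml.p2.xlarge for endpoint usage']"
cat > /tmp/quota-request.json <<'EOF'
{
  "ServiceCode": "sagemaker",
  "QuotaCode": "L-XXXXXXXX",
  "DesiredValue": 2
}
EOF

# Submit the request (requires AWS credentials and Service Quotas permissions):
# aws service-quotas request-service-quota-increase \
#     --cli-input-json file:///tmp/quota-request.json
```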

Scaling takes longer than expected

If your scale-out process takes a long time even with a low cooldown period, then your Amazon CloudWatch alarms might be aggregating multiple data points before scaling activates. To resolve this issue, reduce the Datapoints to alarm setting on the CloudWatch alarm.
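
For a step scaling policy, the alarm can be redefined to fire on a single breaching data point. In this sketch, the alarm name, endpoint, variant, and threshold are all hypothetical values for illustration:

```shell
# Hypothetical alarm definition for a step scaling policy. Setting both
# EvaluationPeriods and DatapointsToAlarm to 1 makes the alarm transition
# to ALARM on a single breaching data point.
cat > /tmp/scale-out-alarm.json <<'EOF'
{
  "AlarmName": "my-endpoint-scale-out",
  "Namespace": "AWS/SageMaker",
  "MetricName": "InvocationsPerInstance",
  "Dimensions": [
    {"Name": "EndpointName", "Value": "my-endpoint"},
    {"Name": "VariantName", "Value": "AllTraffic"}
  ],
  "Statistic": "Sum",
  "Period": 60,
  "EvaluationPeriods": 1,
  "DatapointsToAlarm": 1,
  "Threshold": 100,
  "ComparisonOperator": "GreaterThanThreshold"
}
EOF

# Apply the alarm definition (requires AWS credentials):
# aws cloudwatch put-metric-alarm --cli-input-json file:///tmp/scale-out-alarm.json
```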

Also, other scaling policies or service quotas might cause your scaling process to take longer than expected. So, check your configurations and service quotas to identify issues.

Your auto scaling policy doesn't scale down instances as expected

If your auto scaling policy doesn't scale down instances as expected even when traffic is low, then take the following actions:

  • Configure the correct metric for your policy. For asynchronous endpoints, use the ApproximateBacklogSizePerInstance metric. For real-time endpoints, use the InvocationsPerInstance metric.
  • For more responsive auto scaling, adjust your scaling thresholds, cooldown periods, and other workload-related parameters.
  • Check whether your policy scales based on the CPU utilization metric. Such a policy might not scale down when traffic decreases.
  • If you set a warmup time in your scaling policy, then align the policy with how quickly your instances manage traffic changes. Instances that are warming up don't count towards the aggregated metrics for scaling.
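
The first action above can be sketched as a target tracking policy on the invocations metric for a real-time endpoint. The endpoint name, variant name, target value, and cooldowns below are hypothetical:

```shell
# Target tracking configuration that scales on invocations per instance.
# TargetValue and cooldown values are illustrative -- tune them to your workload.
cat > /tmp/invocations-policy.json <<'EOF'
{
  "TargetValue": 100.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
  },
  "ScaleOutCooldown": 60,
  "ScaleInCooldown": 300
}
EOF

# Attach the policy to the endpoint variant (requires AWS credentials):
# aws application-autoscaling put-scaling-policy \
#     --policy-name invocations-target-tracking \
#     --service-namespace sagemaker \
#     --resource-id endpoint/my-endpoint/variant/AllTraffic \
#     --scalable-dimension sagemaker:variant:DesiredInstanceCount \
#     --policy-type TargetTrackingScaling \
#     --target-tracking-scaling-policy-configuration file:///tmp/invocations-policy.json
```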

Auto scaling isn't activated in certain conditions

The following conditions cause auto scaling not to activate:

  • The instance type isn't available in the selected Availability Zone.
  • There's insufficient capacity in the selected instance type.
  • You didn't correctly configure the scaling policy.

To resolve your auto scaling failures, take the following actions:

  • Check the instance type availability in the Availability Zone that you're using.
  • Lower the scaling threshold so that scaling out activates earlier.
  • Use a different instance type that has more available capacity.
  • Configure your CloudWatch alarms to monitor your metrics. Also, make sure that the alarms transition to the ALARM state when the scaling conditions are met. To review your scaling activities, including the reasons that scaling didn't occur, run the following application-autoscaling command:
    aws application-autoscaling describe-scaling-activities --service-namespace sagemaker --resource-id example-resource-id --include-not-scaled-activities
    Note: Replace example-resource-id with your resource ID.

Related information

What is Service Quotas?

Automatic scaling of Amazon SageMaker AI models

Metrics for monitoring Amazon SageMaker AI with Amazon CloudWatch

Asynchronous inference

AWS OFFICIAL — Updated 1 year ago