I want to troubleshoot issues that occur when I invoke or create an Amazon SageMaker AI asynchronous endpoint.
Short description
To detect errors that occur when you invoke an asynchronous endpoint, review the endpoint's Amazon CloudWatch logs under the log group name /aws/sagemaker/Endpoints/example-endpoint-name and the log stream name example-production-variant-name/example-instance-id/data-log. For more information, see the Common endpoint metrics and Asynchronous inference endpoint metrics sections of Monitoring with CloudWatch.
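For example, the following sketch uses the AWS SDK for Python (Boto3) to pull recent events from the endpoint's data log. The endpoint and variant names are placeholders that you replace with your own values:

# Minimal sketch: read recent CloudWatch log events for an async endpoint's data log.
# The log group and log stream prefix are placeholders.
import boto3

logs = boto3.client("logs")

response = logs.filter_log_events(
    logGroupName="/aws/sagemaker/Endpoints/example-endpoint-name",
    logStreamNamePrefix="example-production-variant-name",
    limit=50,
)

for event in response["events"]:
    print(event["timestamp"], event["message"])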
Resolution
Troubleshoot issues that occur when you invoke or create a SageMaker AI asynchronous endpoint based on the following errors:
"ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body"
If this error occurs even though the payload size is less than 1 GB, then review the CloudWatch logs of the asynchronous endpoint. Also, test the model locally with different payload sizes to confirm whether the error occurs with payloads that are smaller than 1 GB.
To debug the model locally, add more logging, such as print statements, and then identify the portion of the code that causes the error. If the model runs without errors locally, then host the model on SageMaker AI. For more information, see amazon-sagemaker-local-mode on the GitHub website.
Example code:
import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.info("Loading file.")print("Loading file. --> from print statement")
Worker timeout
This error occurs when there aren't enough workers inside the model container to process the ping and invocation requests. To resolve this error, increase the number of workers and the model_server_timeout value.
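The following is a minimal sketch of setting these values when you create the model. It assumes a framework container that's based on the SageMaker inference toolkit, where the SAGEMAKER_MODEL_SERVER_WORKERS and SAGEMAKER_MODEL_SERVER_TIMEOUT environment variables control the number of workers and the model server timeout. Check your container's documentation for the variables that it supports. The model name, image URI, Amazon S3 path, and role ARN are placeholders:

# Minimal sketch, assuming a container based on the SageMaker inference toolkit.
# All names, ARNs, image URIs, and S3 paths are placeholders.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_model(
    ModelName="example-async-model",
    PrimaryContainer={
        "Image": "example-inference-image-uri",
        "ModelDataUrl": "s3://example-bucket/model/model.tar.gz",
        "Environment": {
            "SAGEMAKER_MODEL_SERVER_WORKERS": "4",    # increase the number of workers
            "SAGEMAKER_MODEL_SERVER_TIMEOUT": "600",  # increase the model server timeout (seconds)
        },
    },
    ExecutionRoleArn="arn:aws:iam::111122223333:role/example-sagemaker-role",
)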
ApproximateAgeOfOldestRequest frequently increases until the queue is cleared
This issue occurs when queue requests aren't cleared efficiently. SageMaker AI asynchronous endpoints use a first in first out (FIFO) approach. However, model inference time, resource contention, or network latencies can interfere with the FIFO approach.
If requests time out, increase the InvocationTimeoutSeconds parameter value. This parameter specifies the amount of time that SageMaker AI waits for the inference to complete before it returns an error. The maximum value that you can set is 3600 seconds (1 hour).
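For example, you can pass InvocationTimeoutSeconds when you invoke the endpoint with the SageMaker Runtime client. In the following sketch, the endpoint name and S3 input location are placeholders:

# Minimal sketch: invoke an asynchronous endpoint with a longer invocation timeout.
# The endpoint name and S3 input location are placeholders.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_async(
    EndpointName="example-endpoint-name",
    InputLocation="s3://example-bucket/input/payload.json",
    ContentType="application/json",
    InvocationTimeoutSeconds=3600,  # maximum value: 1 hour
)

print(response["OutputLocation"])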
Also, it's a best practice to add an auto scaling policy that monitors ApproximateBacklogSizePerInstance. This allows your endpoint to scale out based on the backlog size so that requests are processed faster.
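The following sketch registers the endpoint variant as an Application Auto Scaling scalable target and then attaches a target tracking policy on ApproximateBacklogSizePerInstance. The endpoint name, variant name, capacity limits, and target value are placeholders that you tune for your workload:

# Minimal sketch: target tracking scaling policy on ApproximateBacklogSizePerInstance.
# The endpoint name, variant name, capacity limits, and target value are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "endpoint/example-endpoint-name/variant/example-production-variant-name"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=5,
)

autoscaling.put_scaling_policy(
    PolicyName="example-backlog-scaling-policy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # target backlog size per instance
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "example-endpoint-name"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)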
Your backlog size is high and the number of instances doesn't increase
To troubleshoot this issue, use the describe-scalable-targets, describe-scaling-policies, and describe-scaling-activities AWS CLI commands for Application Auto Scaling. Also, check whether the endpoint is in the InService state.
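For example, the following sketch makes the equivalent checks with Boto3: it verifies the endpoint status and then lists recent scaling activities for the variant. The endpoint and variant names are placeholders:

# Minimal sketch: check endpoint status and recent Application Auto Scaling activity.
# The endpoint and variant names are placeholders.
import boto3

sagemaker = boto3.client("sagemaker")
autoscaling = boto3.client("application-autoscaling")

endpoint = sagemaker.describe_endpoint(EndpointName="example-endpoint-name")
print("Endpoint status:", endpoint["EndpointStatus"])  # should be InService

resource_id = "endpoint/example-endpoint-name/variant/example-production-variant-name"

activities = autoscaling.describe_scaling_activities(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
)

for activity in activities["ScalingActivities"]:
    print(activity["StatusCode"], activity["Cause"])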
Related information
Troubleshooting
How to get logs or print statements from SageMaker PyTorch deployed endpoint?
Configuring autoscaling inference endpoints in Amazon SageMaker
Define a scaling policy
amazon-sagemaker-examples from the GitHub website