How do I troubleshoot retry and timeout issues when I use an AWS SDK to invoke a Lambda function?


When I invoke my AWS Lambda function with the AWS SDK, the function times out, the API request fails, or an API action is duplicated.

Short description

Retry and timeout issues might occur when invoking a Lambda function with an AWS SDK because of the following conditions:

  • A remote API is unreachable or takes too long to respond to an API call.
  • The API call doesn't get a response within the socket timeout.
  • The API call doesn't get a response within the Lambda function's timeout period.

Note: API calls can take longer than expected when network connection issues occur. Network issues can also cause retries and duplicated API requests. To prepare for these occurrences, make sure that your Lambda function is idempotent.

If you use an AWS SDK to make an API call and the call fails, then the AWS SDK automatically retries the call. The number of retries and the time allowed for each attempt are determined by settings that vary by AWS SDK.

Default AWS SDK retry settings

Note: Some values may be different for other AWS services.

AWS SDK             Maximum retry count   Connection timeout   Socket timeout
Python (Boto 3)     depends on service    60 seconds           60 seconds
JavaScript/Node.js  depends on service    N/A                  120 seconds
Java                3                     10 seconds           50 seconds
.NET                4                     100 seconds          300 seconds
Go                  3                     N/A                  N/A

To troubleshoot the retry and timeout issues, first review the logs of the API call to find the problem. Then, change the retry count and timeout settings of the AWS SDK as needed for each use case. To allow enough time for a response to the API call, add time to the Lambda function timeout setting.

Resolution

Log the API calls made by the AWS SDK

Use Amazon CloudWatch Logs to get details about failed connections and the number of attempted retries for each. For more information, see Using CloudWatch Logs with Lambda. Or, see the logging instructions for the AWS SDK that you use.
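For Python (Boto 3), a minimal sketch of turning on SDK debug logging with the standard `logging` module might look like the following. It assumes that the function runs on the Lambda Python runtime, which normally attaches a handler to the root logger, and that debug-level output is acceptable for a short troubleshooting session.

```python
import logging

# Assumption: the Lambda Python runtime has already attached a handler to the
# root logger, so raising the level is enough for records to reach CloudWatch Logs.
logging.getLogger().setLevel(logging.INFO)

# "botocore" is the parent logger for the SDK's HTTP and retry modules.
# DEBUG output is verbose and can include request details, so keep it temporary.
logging.getLogger("botocore").setLevel(logging.DEBUG)
```

Place these lines at module level, outside the handler, so that they run one time for each execution environment.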

Example error log where the API call failed to establish a connection (connection timeout)

START RequestId: b81e56a9-90e0-11e8-bfa8-b9f44c99e76d Version: $LATEST
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    [AWS ec2 undefined 40.29s 3 retries] describeInstances({})
2018-07-26T14:32:27.393Z    b81e56a9-90e0-11e8-bfa8-b9f44c99e76d    { TimeoutError: Socket timed out without establishing a connection

...

Example error log where the API call connection was successful, but timed out after the API response took too long (socket timeout)

START RequestId: 3c0523f4-9650-11e8-bd98-0df3c5cf9bd8 Version: $LATEST
2018-08-02T12:33:18.958Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    [AWS ec2 undefined 30.596s 3 retries] describeInstances({})
2018-08-02T12:33:18.978Z    3c0523f4-9650-11e8-bd98-0df3c5cf9bd8    { TimeoutError: Connection timed out after 30s

Note: These logs aren't generated if the API request doesn't get a response within your Lambda function's timeout. If the API request ends because of a function timeout, then change the AWS SDK's retry and timeout settings, or increase your Lambda function's timeout setting.
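Because a function timeout leaves no SDK log entry, it can help to check the remaining invocation time before you start a call. The following Python sketch is one possible approach, not a prescribed method. `get_remaining_time_in_millis()` is the method that the Lambda Python context object provides. The `has_time_for_call` helper and the `FakeContext` class are hypothetical names for this example, and the worst-case estimate ignores any backoff delay between retries.

```python
def has_time_for_call(context, connect_timeout, read_timeout, retries, margin=1.0):
    """Return True if the worst-case SDK call fits in the remaining time.

    All durations are in seconds. The worst case assumes that every attempt
    uses the full connection timeout and the full socket (read) timeout.
    """
    remaining = context.get_remaining_time_in_millis() / 1000.0
    worst_case = (1 + retries) * (connect_timeout + read_timeout)
    return remaining >= worst_case + margin


class FakeContext:
    """Hypothetical stand-in for the Lambda context object, for local runs."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms
```

Inside a handler, a False result is a cue to log a warning and return early instead of letting the invocation end without any log output.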

Change the AWS SDK's settings

The retry count and timeout settings of the AWS SDK should allow enough time for your API call to get a response. To determine the right values for each setting, test different configurations and get the following information:

  • Average time to establish a successful connection
  • Average time that a full API request takes until it's successfully returned
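To collect the second measurement, you can time the call over several runs. The following Python sketch uses only the standard library. `average_seconds` and `fake_api_call` are hypothetical names, and `fake_api_call` stands in for a real SDK call. The sketch measures the full request time only. It doesn't separate out the time to establish the connection.

```python
import statistics
import time


def average_seconds(call, runs=5):
    """Return the mean wall-clock duration of call() over several runs."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def fake_api_call():
    """Hypothetical stand-in for a real SDK call."""
    time.sleep(0.01)


avg = average_seconds(fake_api_call)
print(f"average request time: {avg:.3f}s")
```

Run the measurement from the same network location as the function (for example, the same VPC and subnets) so that the numbers reflect the conditions that the function sees.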

For more information on changing retry count and timeout settings, see the client configuration documentation for your AWS SDK.

The following are some example commands that change retry count and timeout settings for each runtime.

Note: Be sure to replace the example values for each setting with the values for your use case.

Example Python (Boto 3) command to change retry count and timeout settings

# max_attempts: retry count / read_timeout: socket timeout / connect_timeout: new connection timeout
from botocore.session import Session
from botocore.config import Config

s = Session()
c = s.create_client('s3', config=Config(connect_timeout=5, read_timeout=60, retries={'max_attempts': 2}))

Example JavaScript/Node.js command to change retry count and timeout settings

// maxRetries: retry count / timeout: socket timeout / connectTimeout: new connection timeout
var AWS = require('aws-sdk');

AWS.config.update({
    maxRetries: 2,
    httpOptions: {
        timeout: 30000,
        connectTimeout: 5000
    }
});

Example JavaScript V3 command to change retry count and timeout settings

const { S3Client, ListBucketsCommand } = require("@aws-sdk/client-s3");
const { NodeHttpHandler } = require("@aws-sdk/node-http-handler");
const client = new S3Client({
    requestHandler: new NodeHttpHandler({
        connectionTimeout: 30000,
        socketTimeout: 50000
    }),
    maxAttempts: 2
});

Example Java command to change retry count and timeout settings

// setMaxErrorRetry(): retry count / setSocketTimeout(): socket timeout / setConnectionTimeout(): new connection timeout
ClientConfiguration clientConfig = new ClientConfiguration(); 

clientConfig.setSocketTimeout(60000); 
clientConfig.setConnectionTimeout(5000);
clientConfig.setMaxErrorRetry(2);

AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentialsProvider,clientConfig);

Example .NET command to change retry count and timeout settings

// MaxErrorRetry: retry count / ReadWriteTimeout: socket timeout / Timeout: new connection timeout
var client = new AmazonS3Client(
    new AmazonS3Config {
        Timeout = TimeSpan.FromSeconds(5),
        ReadWriteTimeout = TimeSpan.FromSeconds(60),
        MaxErrorRetry = 2
    });

Example Go command to change retry count settings

// Create Session with MaxRetry configuration to be shared by multiple service clients.
sess := session.Must(session.NewSession(&aws.Config{
    MaxRetries: aws.Int(3),
}))
 
// Create S3 service client with a specific Region.
svc := s3.New(sess, &aws.Config{
    Region: aws.String("us-west-2"),
})

Example Go command to change request timeout settings

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
// SQS ReceiveMessage
params := &sqs.ReceiveMessageInput{ ... }
req, resp := s.ReceiveMessageRequest(params)
req.HTTPRequest = req.HTTPRequest.WithContext(ctx)
err := req.Send()

(Optional) Change your Lambda function's timeout setting

A low Lambda function timeout can cause healthy connections to drop early. Increase the function timeout setting to allow enough time for your API call to get a response.

Use the following formula to estimate the base time needed for the function timeout:

First attempt (connection timeout + socket timeout) + Number of retries x (connection timeout + socket timeout) + 20 seconds additional code runtime margin = Required Lambda function timeout

Example Lambda function timeout calculation

Note: The following calculation is for an AWS SDK that's configured for three retries, a 10-second connection timeout, and a 30-second socket timeout.

First attempt (10 seconds + 30 seconds) + Number of retries [3 * (10 seconds + 30 seconds)] + 20 seconds additional code runtime margin = 180 seconds
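The same calculation can be written as a small Python helper so that you can try other values. This is a sketch of the formula above. The function name `required_function_timeout` and its parameter names are chosen for this example only.

```python
def required_function_timeout(retries, connect_timeout, socket_timeout, margin=20):
    """Estimate the Lambda function timeout in seconds.

    Applies the formula: first attempt + retries x (connection timeout +
    socket timeout) + additional code runtime margin.
    """
    first_attempt = connect_timeout + socket_timeout
    retry_time = retries * (connect_timeout + socket_timeout)
    return first_attempt + retry_time + margin


# Three retries, 10-second connection timeout, 30-second socket timeout
print(required_function_timeout(retries=3, connect_timeout=10, socket_timeout=30))  # prints 180
```

If the result is larger than the maximum function timeout listed in Lambda quotas, then reduce the retry count or the SDK timeouts instead.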

Related information

Invoke

Understanding retry behavior in Lambda

Lambda quotas

AWS OFFICIAL (updated 5 months ago)