How do I troubleshoot custom resource failures in AWS CloudFormation?


I want to resolve custom resource errors in AWS CloudFormation.

Short description

Custom resource failures fall into these two categories:

  • Operation failed because the AWS Lambda function encountered an error: This failure occurs when the custom resource sends a FAILED signal back to CloudFormation. The failure typically indicates that the AWS Lambda function, which backs the custom resource, encountered an error when it ran.
  • Timeout failure: This occurs when CloudFormation doesn't receive any response from the custom resource within the expected timeframe, which leads to a timeout.

Resolution

To troubleshoot the errors, use the following steps:

Check the Amazon CloudWatch logs

As a first step, for both types of failure, check the Amazon CloudWatch logs for AWS Lambda:

  1. In the CloudFormation console, choose the failed stack. Then, select the Resources tab to find the physical ID of the Lambda function that backs the custom resource.
  2. Choose the physical ID of the Lambda function to open the function in a new window.
  3. Choose the Monitor tab. Then, choose the View CloudWatch logs button. This shows the Lambda function logs for troubleshooting the errors.

The Lambda function might have been deleted during the CloudFormation rollback. However, the log group might still retain the CloudWatch logs. To find the logs, follow these steps:

  1. Go to the CloudWatch console.

  2. On the left menu, select Log groups.

  3. In the Search box, enter:

    /aws/lambda/<LambdaPhysicalName>

    Note: The LambdaPhysicalName can be found under CloudFormation resources.

    If the CloudWatch logs can't be found, redeploy the stack with the rollback feature turned off. This allows you to investigate the Lambda function's behavior and potential issues.
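
    The console search above can also be scripted. The following is a minimal sketch, assuming that the function uses the default /aws/lambda/<function name> log group and that you pass in a boto3 CloudWatch Logs client. The helper names lambda_log_group and find_log_groups are hypothetical.

```python
def lambda_log_group(function_name):
    """Return the default log group name for a Lambda function."""
    return f"/aws/lambda/{function_name}"


def find_log_groups(logs_client, function_name):
    """List log groups whose names start with the function's default prefix.

    logs_client is expected to be a boto3 CloudWatch Logs client,
    for example boto3.client("logs").
    """
    response = logs_client.describe_log_groups(
        logGroupNamePrefix=lambda_log_group(function_name)
    )
    return [group["logGroupName"] for group in response.get("logGroups", [])]
```

    An empty list suggests that the log group no longer exists, in which case redeploying with rollback turned off is the remaining option.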

Explore each failure type and its potential causes

Handle the FAILED signal

You might see this error: "Received response status FAILED from custom resource. Message returned: <reason here>."

This message suggests that the Lambda function that's backing the custom resource encountered an error and has exception handling logic in place.

Follow these options to fix the error:

  • Review the error message, if the response includes a reason. These reasons are usually descriptive and appear directly in the error message of the CloudFormation event.
  • Review the CloudWatch logs for Lambda. Sometimes the error message is unclear or is missing the reason for the error.
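
A descriptive reason makes the CloudFormation event easier to act on. The sketch below shows one way that a function's exception handler might build the FAILED response body. The field names follow the custom resource response format. The function name build_failed_response and the wording of the reason are assumptions for illustration.

```python
import json


def build_failed_response(event, error, log_stream_name):
    """Build a FAILED response body with a descriptive Reason."""
    reason = (
        f"{type(error).__name__}: {error}. "
        f"See CloudWatch log stream: {log_stream_name}"
    )
    return json.dumps({
        "Status": "FAILED",
        "Reason": reason,
        # Reuse the existing ID on Update/Delete; fall back to the log stream name.
        "PhysicalResourceId": event.get("PhysicalResourceId", log_stream_name),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    })
```

With a body like this, the text in Reason is what appears after "Message returned:" in the stack event, and the log stream name points you to the matching CloudWatch logs.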

CloudFormation doesn't receive a response

The stack fails with a timeout when CloudFormation doesn't receive any response from the custom resource. Several issues can cause this. Review these options to identify the cause of the failure:

Make sure to use the cfn-response module correctly: You can use the cfn-response module in your custom resource's Lambda function to send a signal back to the CloudFormation stack. If the module isn't used in the code correctly, then CloudFormation doesn't get the needed response.

Check the CloudWatch logs: Review the CloudWatch logs to confirm whether there are errors during code execution. These errors might prevent the function from sending the signal to CloudFormation, especially if the code lacks exception handling logic.
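
The following sketch illustrates the pattern that the two points above describe: every code path, including the error path, ends with a response to CloudFormation. It is a standard-library stand-in for what the cfn-response module does, not that module's code. The names put_response and do_work are hypothetical, and the work and send keyword arguments exist only so that the sketch can be exercised without AWS.

```python
import json
import urllib.request


def put_response(url, body):
    """PUT the JSON body to the presigned URL with an empty Content-Type."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def do_work(event):
    """Hypothetical placeholder for the create, update, or delete logic."""
    return {}


def handler(event, context, work=do_work, send=put_response):
    # Log the request so that RequestId and ResponseURL appear in CloudWatch Logs.
    print(json.dumps(event))
    status = "SUCCESS"
    reason = f"See CloudWatch log stream: {context.log_stream_name}"
    data = {}
    try:
        data = work(event)
    except Exception as error:  # report every failure instead of staying silent
        status = "FAILED"
        reason = f"{type(error).__name__}: {error}"
    body = json.dumps({
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": event.get("PhysicalResourceId", context.log_stream_name),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    })
    send(event["ResponseURL"], body)
    return status
```

Without the try/except block, an exception in do_work would end the invocation before any response is sent, and the stack would wait until the custom resource times out.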

Check the Lambda execution timeout: Make sure that your Lambda function's timeout setting is long enough for the function to complete its task. Remember that the maximum timeout for a Lambda function is 15 minutes (900 seconds).
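
If the task can run close to the configured timeout, one option is a watchdog that reports a failure shortly before Lambda stops the function. The sketch below assumes the Lambda Python context object, which provides get_remaining_time_in_millis(). The name start_watchdog and the 2-second margin are illustrative choices.

```python
import threading


def start_watchdog(context, on_timeout, margin_ms=2000):
    """Call on_timeout shortly before the Lambda invocation would time out.

    context must provide get_remaining_time_in_millis(), as the Lambda
    Python context object does. Returns the started timer so that the
    caller can cancel it after the real response has been sent.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    delay_seconds = max(remaining_ms - margin_ms, 0) / 1000
    timer = threading.Timer(delay_seconds, on_timeout)
    timer.daemon = True
    timer.start()
    return timer
```

A handler would start the watchdog first, pass a callback that sends a FAILED response, and call cancel() on the returned timer after it sends its normal response. The callback runs on a separate thread, so it helps mainly when the main thread is waiting on I/O.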

Check the Amazon Simple Storage Service (Amazon S3) endpoint access: For CloudFormation to receive a signal, the custom resource must be able to reach a presigned Amazon S3 URL. If your Lambda function runs in a virtual private cloud (VPC), make sure that its subnet allows outbound traffic through a NAT gateway and has proper routing for Amazon S3 endpoint access.
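
To confirm network access from inside the function, you can temporarily add a connectivity check against the host of the presigned URL. This is a minimal sketch that uses only the standard library. The function names are hypothetical, and a successful TCP connection shows only that routing works, not that the PUT request itself succeeds.

```python
import socket
from urllib.parse import urlsplit


def response_endpoint(response_url):
    """Return (host, port) of the presigned URL that the function must reach."""
    parts = urlsplit(response_url)
    return parts.hostname, parts.port or 443


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In a handler, the check might be logged as can_connect(*response_endpoint(event["ResponseURL"])). A False result from a function in a VPC points to the subnet's routing rather than to the function code.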

Consider Lambda concurrency issues: If your Lambda logs show that the signal was sent after the timeout, then investigate Lambda concurrency as a potential cause. A timeout can occur when a high number of Lambda functions run simultaneously in the same Region. Monitor the Lambda metrics for concurrent executions. To reduce timeouts, configure reserved concurrency for your function.
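
Reserved concurrency can be set in the console or through the API. The sketch below assumes that you pass in a boto3 Lambda client. The wrapper name reserve_concurrency is hypothetical, and the right value depends on your workload.

```python
def reserve_concurrency(lambda_client, function_name, reserved_executions):
    """Reserve concurrency for one function and return the configured value.

    lambda_client is expected to be a boto3 Lambda client,
    for example boto3.client("lambda").
    """
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved_executions,
    )
    return response["ReservedConcurrentExecutions"]
```

Keep in mind that reserved concurrency also caps the function at that number of concurrent executions, so choose a value that covers the stack operations you expect to run in parallel.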

Manually remove the stack from IN_PROGRESS status: When no response arrives, the stack stays in *_IN_PROGRESS status until the custom resource reaches its timeout. It might take a while before the resource is marked as FAILED. To stabilize the custom resource quickly, use cURL to send the response directly to the presigned URL with an HTTP PUT request. This action sometimes bypasses the delay and prevents a timeout. Note that you need the details of the request object to make this request.

For example:

$ curl -H 'Content-Type: ''' -X PUT -d '{
    "Status": "SUCCESS",
    "PhysicalResourceId": "test-CloudWatchtrigger-1URTEVUHSKSKDFF",
    "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/awsexamplecloudformation/33ad60e0-5f25-11e9-a734-0aa6b80efab2",
    "RequestId": "e2fc8f5c-0391-4a65-a645-7c695646739",
    "LogicalResourceId": "CloudWatchtrigger"
  }' 'https://cloudformation-custom-resource-response-useast1.s3.us-east-1.amazonaws.com/arn%3Aaws%3Acloudformation%3Aus-east-1%3A111122223333%3Astack/awsexamplecloudformation/33ad60e0-5f25-11e9-a734-0aa6b80efab2%7CMyCustomResource%7Ce2fc8f5c-0391-4a65-a645-7c695646739?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20170313T0212304Z&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credential=QWERTYUIOLASDFGBHNZCV%2F20190415%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=dgvg36bh23mk44nj454bjb54689bg43r8v011uerehiubrjrug5689ghg94hb'

You can find the RequestId and the S3 presigned URL in CloudWatch Logs, provided that the function logged the request object. For more information, see How do I delete a Lambda-backed custom resource that's stuck in DELETE_FAILED status or DELETE_IN_PROGRESS status in CloudFormation?
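
If the function logged the request object as JSON, the values for the manual request can be derived from that log line instead of being copied by hand. This is a sketch under that assumption. The function name manual_response and the default reason text are hypothetical.

```python
import json


def manual_response(logged_event_json, status="SUCCESS", reason="Manual response"):
    """Return (response_url, body) for a request object copied from the logs."""
    event = json.loads(logged_event_json)
    body = {
        "Status": status,
        "Reason": reason,
        # Create requests carry no PhysicalResourceId yet, so the logical ID
        # is used here as a placeholder (an assumption of this sketch).
        "PhysicalResourceId": event.get("PhysicalResourceId", event["LogicalResourceId"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }
    return event["ResponseURL"], json.dumps(body)
```

The returned URL and body correspond to the last argument and the -d argument of a cURL command like the one above. The presigned URL expires, so a request that's built from an old log entry might be rejected.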

Related information

What are some best practices for implementing Lambda-backed custom resources with CloudFormation?

AWS OFFICIAL
Updated 2 months ago