Build your custom resources to report, log, and handle failure
Exceptions can cause your function code to exit without sending a response. CloudFormation requires an HTTPS response to confirm whether the operation succeeded or failed. An unreported exception causes CloudFormation to wait until the operation times out before it starts a stack rollback. If the exception recurs during rollback, then CloudFormation waits again for a timeout before it ends in a rollback failure. During this time, your stack is unusable.
To avoid timeout issues, include the following in the code that you create for your Lambda function:
Logic to handle exceptions
The ability to log the failure for troubleshooting scenarios
The ability to respond to CloudFormation with an HTTPS response confirming that an operation failed
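The three items above can be sketched as a single Lambda handler. This is a minimal illustration, not a definitive implementation: do_work is a hypothetical placeholder for your provisioning logic, and the send and work parameters exist only so that the sketch can be exercised without network access. The response fields follow the custom resource response format, and the empty content type mirrors what the cfn-response helper module sends.

```python
import json
import logging
import urllib.request

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def build_response(event, context, status, reason=None, data=None):
    """Assemble the JSON body that CloudFormation expects."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or f"See CloudWatch log stream: {context.log_stream_name}",
        # Reuse the existing ID on Update and Delete; fall back to the
        # log stream name on Create (an assumption made for this sketch).
        "PhysicalResourceId": event.get("PhysicalResourceId") or context.log_stream_name,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event, body):
    """PUT the response body to the presigned URL supplied in the event."""
    request = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        logger.info("CloudFormation responded with HTTP %s", reply.status)


def do_work(event):
    """Hypothetical placeholder for the real provisioning logic."""
    return {}


def handler(event, context, send=send_response, work=do_work):
    try:
        data = work(event)
        body = build_response(event, context, "SUCCESS", data=data)
    except Exception as exc:
        # Log the full traceback for troubleshooting, then report the failure.
        logger.exception("Custom resource operation failed")
        body = build_response(event, context, "FAILED", reason=str(exc))
    # A response is sent on both the success path and the failure path.
    send(event, body)
    return body
```

Because the response is sent after the try/except block, an exception in the provisioning logic results in a FAILED response instead of a silent exit.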
Set reasonable timeout periods, and report when they're about to be exceeded
If an operation doesn't complete within its defined timeout period, then the function raises an exception and no response is sent to CloudFormation.
To avoid this issue, consider the following:
Set the timeout value for your Lambda functions high enough to handle variations in processing time and network conditions.
Set a timer in your function to respond to CloudFormation with an error when the function is about to time out. Reporting the failure before the timeout means that CloudFormation doesn't have to wait for its own timeout to detect the problem.
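One way to set such a timer is to read the remaining execution time from the Lambda context object and schedule a callback shortly before it runs out. The following is a sketch under stated assumptions: work and report are hypothetical callables that stand in for your provisioning logic and for the code that sends the response to CloudFormation, and the 5-second safety margin is an arbitrary illustrative value.

```python
import threading


def start_watchdog(context, on_timeout, safety_margin_ms=5000):
    """Schedule on_timeout to run shortly before Lambda stops the function."""
    remaining_ms = context.get_remaining_time_in_millis()
    delay_s = max(remaining_ms - safety_margin_ms, 0) / 1000.0
    timer = threading.Timer(delay_s, on_timeout)
    timer.daemon = True
    timer.start()
    return timer


def handler(event, context, work, report):
    """work and report are hypothetical callables supplied by the caller."""
    lock = threading.Lock()
    state = {"reported": False}

    def report_once(status, reason):
        # Make sure that CloudFormation receives exactly one response,
        # whichever of the timer or the main thread gets there first.
        with lock:
            if state["reported"]:
                return
            state["reported"] = True
        report(event, status, reason)

    timer = start_watchdog(
        context,
        lambda: report_once("FAILED", "Function is about to time out"),
    )
    try:
        work(event)
        report_once("SUCCESS", "Operation completed")
    except Exception as exc:
        report_once("FAILED", str(exc))
    finally:
        timer.cancel()
```

The lock prevents a second, conflicting response if the work finishes just after the timer fires.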
Build around Create, Update, and Delete events
Depending on the stack action, CloudFormation sends your function a Create, Update, or Delete event. Because each event is handled differently, make sure that there are no unintended behaviors when any of the three event types is received.
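A simple way to keep the three event types separate is to route on the RequestType field of the event and to reject anything unexpected. In this sketch, on_create, on_update, and on_delete are hypothetical placeholders that only return a description of what they would do.

```python
def on_create(event):
    # Hypothetical placeholder: provision the resource here.
    return "create"


def on_update(event):
    # Hypothetical placeholder: apply the property changes here.
    return "update"


def on_delete(event):
    # Hypothetical placeholder: remove the resource here.
    return "delete"


HANDLERS = {
    "Create": on_create,
    "Update": on_update,
    "Delete": on_delete,
}


def route(event):
    """Call the handler that matches the event's RequestType."""
    request_type = event.get("RequestType")
    try:
        handle = HANDLERS[request_type]
    except KeyError:
        # Fail loudly instead of silently treating an unknown type as a Create.
        raise ValueError(f"Unsupported RequestType: {request_type!r}") from None
    return handle(event)
```

Raising on an unknown type, instead of falling through to a default branch, makes sure that the failure can be reported to CloudFormation.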
Understand how CloudFormation identifies and replaces resources
When an update initiates the replacement of a physical resource, CloudFormation compares the PhysicalResourceId that your Lambda function returns to the previous PhysicalResourceId. If the IDs differ, then CloudFormation assumes that the resource is replaced with a new physical resource.
However, to allow for potential rollbacks, the old resource isn't implicitly removed. When the stack update is successfully completed, a Delete event request is sent with the old physical ID as an identifier. If the stack update fails and a rollback occurs, then the new physical ID is sent in the Delete event.
Use PhysicalResourceId to uniquely identify resources so that when a Delete event is received, only the correct resources are deleted during a replacement.
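The replacement flow described above can be sketched as follows. The sketch assumes an in-memory dictionary (RESOURCES) as a stand-in for the real backing service and treats a hypothetical Name property as immutable, so that changing it forces a replacement. The event fields PhysicalResourceId, ResourceProperties, and OldResourceProperties are those that CloudFormation includes in custom resource requests.

```python
import uuid

# Hypothetical in-memory stand-in for the real backing service.
RESOURCES = {}


def on_create(event):
    """Create a resource and return a new, unique physical ID."""
    physical_id = f"{event['LogicalResourceId']}-{uuid.uuid4().hex[:8]}"
    RESOURCES[physical_id] = dict(event["ResourceProperties"])
    return physical_id


def on_update(event):
    """Update in place, or create a replacement when Name changes."""
    old = event["OldResourceProperties"]
    new = event["ResourceProperties"]
    if old.get("Name") != new.get("Name"):
        # Replacement: return a new ID and leave the old resource alone.
        # CloudFormation sends a Delete event for whichever ID is no longer needed.
        return on_create(event)
    RESOURCES[event["PhysicalResourceId"]] = dict(new)
    return event["PhysicalResourceId"]


def on_delete(event):
    """Delete only the resource named by the event's PhysicalResourceId."""
    RESOURCES.pop(event["PhysicalResourceId"], None)  # safe to repeat
    return event["PhysicalResourceId"]
```

Because on_delete acts only on the ID in the event, the same code removes the old resource after a successful update and the new resource after a rollback.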
Design your functions with idempotency
An idempotent function can be repeated numerous times with the same inputs, and the result is the same as doing it only once. Idempotency makes sure that retries, updates, and rollbacks don't create duplicate resources or introduce errors.
For example, suppose that CloudFormation invokes your function to create a resource, but doesn't receive a response confirming that the resource was created. CloudFormation might invoke the function again. If the function isn't idempotent, then the second invocation creates a second resource, and the first resource can become orphaned.
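One possible way to make a Create handler idempotent is to derive a deterministic token from the request and to create the resource only if that token hasn't been seen before. The following sketch rests on assumptions: it assumes that a repeated invocation carries the same StackId, LogicalResourceId, and RequestId, and it uses an in-memory dictionary (CREATED) in place of a durable store. In a real function, the lookup would need to go to the backing service or another persistent store, because in-memory state isn't guaranteed to survive between invocations.

```python
import hashlib

# Hypothetical in-memory stand-in for a durable store or the backing service.
CREATED = {}


def idempotency_token(event):
    """Derive a stable token from fields that identify this request."""
    raw = "|".join(
        [event["StackId"], event["LogicalResourceId"], event["RequestId"]]
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def create_once(event):
    """Create the resource only if this request hasn't already created it."""
    token = idempotency_token(event)
    if token not in CREATED:
        CREATED[token] = dict(event.get("ResourceProperties", {}))
    # The same request always maps to the same physical ID.
    return f"resource-{token}"
```

Calling create_once twice with the same event returns the same ID and leaves a single entry in CREATED, so a repeated invocation doesn't produce a duplicate.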
Implement your handlers to correctly handle rollbacks
When a stack operation fails, CloudFormation attempts to roll back and revert all resources to their prior state. This results in different behaviors depending on whether the update caused a resource replacement.
To make sure that rollbacks are successfully completed, consider the following:
Avoid implicitly removing old resources until a Delete event is received.