There are indeed several more efficient ways to troubleshoot and find out why your CloudFormation deployment is failing. Here are some approaches you can consider:
- Use the "Detect root cause" feature: You've already used this, which is a good start. It helps identify the likely root cause of the failure by adding a "Likely root cause" label to the specific event status.
- Examine CloudTrail events: You've also done this, which is another good step. CloudTrail events can sometimes provide more detailed information about the failure.
- Review the CloudFormation logs: Look for any references to AWS Config or AWS Systems Manager issues. These services can sometimes conflict with CloudFormation deployments.
- Check for service quotas: Make sure you haven't reached the quota for any resource type in the target account. For example, you might have hit the limit on the number of IAM roles.
- Validate your template: Before deploying, use the AWS CloudFormation Linter (cfn-lint) to validate your template. This can help catch errors before you attempt to deploy.
- Check permissions: Ensure that you have sufficient permissions in the target account to create all the resources specified in your template.
- Verify global resource uniqueness: If your template creates resources whose names must be globally unique, such as S3 buckets, make sure those names are unique across all AWS accounts and Regions in the partition, not only within your own account.
- Use the deployment timeline view: This new CloudFormation feature visualizes the sequence of actions during a stack operation. It shows the exact order in which resources are provisioned and helps you identify where things go wrong.
- Check for dependencies: Ensure that all necessary dependencies are properly defined in your template. You might need to add `DependsOn` attributes to some resources.
- Verify the script location: Double-check that the S3 bucket and key specified for your Glue job script actually exist and are accessible.
- Consider regional differences: Because the failure occurs only in the Asia Pacific (Malaysia) Region, check for region-specific constraints, such as services, features, or resource types that are not yet available there.
- Use AWS Config: This service can help you assess, audit, and evaluate the configurations of your AWS resources, which might reveal issues affecting your deployment.
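The "Detect root cause" feature lives in the console, but a similar first pass can be scripted. This is a minimal sketch assuming the common heuristic that the earliest `*_FAILED` event is the likely trigger, because later failures are often cancellations or rollbacks it caused. It is not the algorithm the console uses. `fetch_stack_events` and `first_failure` are hypothetical helper names. `cfn_client` is assumed to be a boto3 CloudFormation client or anything else exposing `describe_stack_events`. `DescribeStackEvents` returns the stack's whole history, so you may want to narrow the list to the latest operation first.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def fetch_stack_events(cfn_client: Any, stack_name: str) -> List[Event]:
    """Collect every event for a stack, following NextToken pagination."""
    events: List[Event] = []
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"StackName": stack_name}
        if token:
            kwargs["NextToken"] = token
        page = cfn_client.describe_stack_events(**kwargs)
        events.extend(page["StackEvents"])
        token = page.get("NextToken")
        if not token:
            return events


def first_failure(events: List[Event]) -> Optional[Event]:
    """Return the earliest event whose status ends in _FAILED, or None.

    Heuristic only: the first failure is usually the trigger, while later
    failures tend to be cancellations or rollbacks that it caused.
    """
    failed = [e for e in events if e.get("ResourceStatus", "").endswith("_FAILED")]
    if not failed:
        return None
    return min(failed, key=lambda e: e["Timestamp"])
```

With boto3 you would pass `boto3.client("cloudformation", region_name="ap-southeast-5")` as `cfn_client`. If the culprit is a nested stack, repeat the lookup using that event's `PhysicalResourceId` (the nested stack's ARN) as the stack name.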
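When reading CloudTrail, most records are successful calls. The useful ones usually carry an `errorCode`. This sketch assumes a boto3 CloudTrail client (`ct_client`) and the `LookupEvents` response shape, where each event holds its full record as a JSON string in `CloudTrailEvent`. The helper names are hypothetical. `LookupEvents` only covers management events and is rate-limited, so keep the time window tight around the failed deployment.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple

Failure = Tuple[str, str, str]  # (eventName, errorCode, errorMessage)


def failed_calls(events: Iterable[Dict[str, Any]]) -> List[Failure]:
    """Keep only CloudTrail records that carry an errorCode."""
    failures: List[Failure] = []
    for event in events:
        # LookupEvents returns the full record as a JSON string in CloudTrailEvent.
        record = json.loads(event["CloudTrailEvent"])
        code = record.get("errorCode")
        if code:
            failures.append((record.get("eventName", ""), code, record.get("errorMessage", "")))
    return failures


def lookup_failed_calls(ct_client: Any, event_source: str, start: Any, end: Any) -> List[Failure]:
    """Page through LookupEvents for one service and return the failed calls."""
    failures: List[Failure] = []
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {
            "LookupAttributes": [{"AttributeKey": "EventSource", "AttributeValue": event_source}],
            "StartTime": start,
            "EndTime": end,
        }
        if token:
            kwargs["NextToken"] = token
        page = ct_client.lookup_events(**kwargs)
        failures.extend(failed_calls(page.get("Events", [])))
        token = page.get("NextToken")
        if not token:
            return failures
```

For a Glue job, `event_source` would be `glue.amazonaws.com`. For role problems, try `iam.amazonaws.com`. `start` and `end` are `datetime` values bracketing the deployment.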
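For the service quota check, IAM exposes current usage and limits through `GetAccountSummary`. Its `SummaryMap` contains pairs such as `Roles` and `RolesQuota`. This is a minimal sketch assuming the template is already parsed into a dict, for example a JSON template loaded with `json.load`. Plain YAML loaders reject short-form tags like `!Ref`. `iam_client` is assumed to be a boto3 IAM client. The function names are hypothetical. Resources behind `Condition`s are counted as if they were all created.

```python
from typing import Any, Dict


def count_resources(template: Dict[str, Any], resource_type: str) -> int:
    """Count resources of one type in a parsed template."""
    return sum(
        1 for res in template.get("Resources", {}).values()
        if res.get("Type") == resource_type
    )


def iam_headroom(iam_client: Any, entity: str) -> int:
    """Remaining IAM capacity, e.g. entity="Roles" reads Roles and RolesQuota."""
    summary = iam_client.get_account_summary()["SummaryMap"]
    return summary[entity + "Quota"] - summary[entity]


def roles_fit(iam_client: Any, template: Dict[str, Any]) -> bool:
    """True if the account can absorb every AWS::IAM::Role the template declares."""
    return count_resources(template, "AWS::IAM::Role") <= iam_headroom(iam_client, "Roles")
```

Quotas for other services are not in this summary. The Service Quotas console or API covers those.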
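To run cfn-lint as a gate in a pipeline, you mainly need to interpret its exit code. This sketch assumes the bit-field scheme described in cfn-lint's documentation: 2 for errors, 4 for warnings, and 8 for informational findings, OR-ed together. Verify this against the version you have installed. It also assumes `cfn-lint` is on `PATH`. `decode_exit_code` and `lint_template` are hypothetical names.

```python
import subprocess
from typing import Set

# Assumption based on cfn-lint's documented exit codes: the return code is a
# bit field where 2 = error, 4 = warning and 8 = informational findings.
_LEVELS = {2: "error", 4: "warning", 8: "informational"}


def decode_exit_code(code: int) -> Set[str]:
    """Translate a cfn-lint exit code into the set of finding levels it reports."""
    levels = {name for bit, name in _LEVELS.items() if code & bit}
    if code & ~sum(_LEVELS):  # bits outside 2/4/8 are not covered by this sketch
        levels.add("unknown")
    return levels


def lint_template(path: str) -> Set[str]:
    """Run cfn-lint on one template, print its output, and return finding levels."""
    proc = subprocess.run(["cfn-lint", path], capture_output=True, text=True)
    print(proc.stdout or proc.stderr)
    return decode_exit_code(proc.returncode)
```

A pipeline step could then fail only when `"error"` is in the returned set and let warnings through.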
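For the permissions check, the IAM policy simulator (`SimulatePrincipalPolicy`) can test the deployment role against the actions your template needs before you deploy. This sketch assumes a boto3 IAM client. The role ARN and action list are placeholders you would supply, for example `glue:CreateJob`, `iam:CreateRole`, and `iam:PassRole`. The simulator does not see every policy type unless you supply it (resource-based policies, for instance), so treat a clean result as a first pass rather than proof.

```python
from typing import Any, Dict, List, Optional


def denied_actions(
    iam_client: Any,
    principal_arn: str,
    actions: List[str],
    resource_arns: Optional[List[str]] = None,
) -> List[str]:
    """Return the actions the IAM policy simulator does not evaluate as 'allowed'."""
    kwargs: Dict[str, Any] = {"PolicySourceArn": principal_arn, "ActionNames": actions}
    if resource_arns:
        kwargs["ResourceArns"] = resource_arns
    denied: List[str] = []
    while True:
        page = iam_client.simulate_principal_policy(**kwargs)
        for result in page["EvaluationResults"]:
            # EvalDecision is "allowed", "explicitDeny" or "implicitDeny".
            if result["EvalDecision"] != "allowed":
                denied.append(result["EvalActionName"])
        if not page.get("IsTruncated"):
            return denied
        kwargs["Marker"] = page["Marker"]
```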
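A common way to avoid bucket-name collisions when the same template is deployed to several accounts or Regions is to append the account ID and Region to the name. In the template itself, this is usually written with `Fn::Sub` and the `AWS::AccountId` and `AWS::Region` pseudo parameters. The sketch below does the same in Python and checks only a subset of the S3 general purpose bucket naming rules. Reserved prefixes and suffixes are not checked. The helper names are hypothetical. A scoped name lowers the chance of a collision but cannot guarantee that nobody else owns it.

```python
import re

# Subset of the S3 general purpose bucket naming rules: 3-63 characters of
# lowercase letters, digits, dots and hyphens, starting and ending with a
# letter or digit, no adjacent dots, not formatted like an IP address.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def is_plausible_bucket_name(name: str) -> bool:
    """Check a name against the subset of rules listed above."""
    return (
        _BUCKET_RE.fullmatch(name) is not None
        and ".." not in name
        and _IP_RE.fullmatch(name) is None
    )


def scoped_bucket_name(prefix: str, account_id: str, region: str) -> str:
    """Append account ID and Region so the same template can deploy anywhere."""
    name = f"{prefix}-{account_id}-{region}".lower()
    if not is_plausible_bucket_name(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name
```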
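Before adding `DependsOn` attributes, it can help to list the dependencies the template already declares. That way you can spot references to resources that don't exist, or circular chains, which CloudFormation rejects. This is a minimal sketch over a template parsed into a dict. It follows explicit `DependsOn` plus `Ref` and `Fn::GetAtt` inside `Properties`. It does not parse `Fn::Sub` strings, so some implicit dependencies will be missed. The function names are hypothetical.

```python
from typing import Any, Dict, List, Set


def _refs(node: Any) -> Set[str]:
    """Collect logical IDs referenced via Ref or Fn::GetAtt (Fn::Sub is not parsed)."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                found.add(value)
            elif key == "Fn::GetAtt":
                # GetAtt is either [LogicalId, Attr] or the "LogicalId.Attr" string form.
                found.add(value[0] if isinstance(value, list) else str(value).split(".")[0])
            else:
                found |= _refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= _refs(item)
    return found


def dependency_graph(template: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Map each resource to the resources it depends on (explicit and implicit)."""
    resources = template.get("Resources", {})
    graph: Dict[str, Set[str]] = {}
    for name, res in resources.items():
        explicit = res.get("DependsOn", [])
        if isinstance(explicit, str):
            explicit = [explicit]
        # Refs to parameters or pseudo parameters (e.g. AWS::Region) are dropped here.
        implicit = _refs(res.get("Properties", {})) & set(resources)
        graph[name] = set(explicit) | implicit
    return graph


def find_problems(template: Dict[str, Any]) -> List[str]:
    """Report DependsOn targets that don't exist and dependency cycles."""
    resources = template.get("Resources", {})
    graph = dependency_graph(template)
    problems = [
        f"{name}: DependsOn unknown resource {dep}"
        for name, deps in graph.items()
        for dep in sorted(deps)
        if dep not in resources
    ]
    state: Dict[str, str] = {}

    def visit(node: str, stack: List[str]) -> None:
        # Depth-first search; a dependency still "active" on the stack closes a cycle.
        state[node] = "active"
        stack.append(node)
        for dep in sorted(graph[node]):
            if dep not in graph:
                continue
            if state.get(dep) == "active":
                cycle = stack[stack.index(dep):] + [dep]
                problems.append("cycle: " + " -> ".join(cycle))
            elif dep not in state:
                visit(dep, stack)
        stack.pop()
        state[node] = "done"

    for node in sorted(graph):
        if node not in state:
            visit(node, [])
    return problems
```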
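To verify the Glue script location, take the `s3://bucket/key` value used as the job's `ScriptLocation` and issue a `HeadObject` call. This sketch assumes a boto3 S3 client. botocore raises `ClientError` with a `response` dict. To keep the block free of that import, the sketch catches `Exception` and reads `response` if it is present. A 404 is reported as "missing". A 403 is re-raised, because it means the caller cannot read the object, and without `s3:ListBucket` S3 returns 403 even for keys that don't exist. The helper names are hypothetical.

```python
from typing import Any, Tuple


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "s3://bucket/key" into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://bucket/key, got {uri!r}")
    return bucket, key


def script_exists(s3_client: Any, script_location: str) -> bool:
    """True if the object exists; False on 404; re-raise anything else."""
    bucket, key = parse_s3_uri(script_location)
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:  # botocore's ClientError exposes .response
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code in ("404", "NoSuchKey", "NotFound"):
            return False
        raise  # 403 and other errors point to an access or configuration problem
    return True
```

Run this with the credentials of the role that needs to read the script, and against the same Region you deploy to, so the check reflects what the deployment actually sees.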
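For regional differences, AWS publishes service availability as Systems Manager public parameters under `/aws/service/global-infrastructure/regions/<region>/services`. The Asia Pacific (Malaysia) Region code is `ap-southeast-5`. This sketch assumes a boto3 SSM client and takes each service name from the last segment of the parameter name. The helper names are hypothetical. A service being listed does not guarantee that every feature or CloudFormation resource type is available. Calling CloudFormation's `DescribeType` for each resource type in the target Region is a complementary check.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def services_in_region(ssm_client: Any, region: str) -> Set[str]:
    """Return the short service names AWS publishes as available in a Region."""
    path = f"/aws/service/global-infrastructure/regions/{region}/services"
    names: Set[str] = set()
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Path": path}
        if token:
            kwargs["NextToken"] = token
        page = ssm_client.get_parameters_by_path(**kwargs)
        names.update(p["Name"].rsplit("/", 1)[-1] for p in page.get("Parameters", []))
        token = page.get("NextToken")
        if not token:
            return names


def missing_services(ssm_client: Any, region: str, required: Iterable[str]) -> List[str]:
    """List required services that are not published for the Region."""
    return sorted(set(required) - services_in_region(ssm_client, region))
```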
If these methods don't reveal the cause of the failure, your idea of deploying with your own developer account to iterate faster could be a good approach. It would let you test and debug more quickly without affecting the main deployment pipeline. However, make sure your test environment closely mirrors the production environment so that you don't introduce new variables.
If the issue persists after trying these methods, contacting AWS Support might indeed be necessary to get to the root of the problem, especially given that this is an internal AWS project with a large and complex stack.
Sources
Determine the cause of a stack failure - AWS CloudFormation
Troubleshooting CloudFormation - AWS CloudFormation
Troubleshoot CloudFormation module errors | AWS re:Post
Peek inside your AWS CloudFormation Deployments with timeline view | AWS DevOps & Developer Productivity Blog