How do I resolve the "did not stabilize" error that I get when I create or update the AWS::MSK::Cluster resource in CloudFormation?


I want to resolve the AWS CloudFormation error: "Resource of type 'AWS::MSK::Cluster' with identifier did not stabilize."

Short description

This error indicates that the provisioned resource can't reach the state that's specified in the template within the timeout period. The timeout might be caused by missing permissions, resource limit exceptions, an invalid resource property specification, or interruptions in the underlying MSK service.

The CloudFormation events don't always show the exact cause of the error. To find the exact cause, use the Amazon Managed Streaming for Apache Kafka (Amazon MSK) console.

Note: To prevent a stack rollback, go to Stack failure options in the CloudFormation console and select Preserve successfully provisioned resources.

Resolution

The following are some of the common errors that you might receive and their solutions.

Error delivering broker logs to CloudWatch Logs

When you create a cluster that sends broker logs to Amazon CloudWatch Logs, you might get one of the following error responses. You can view these error responses in the AWS CloudTrail logs for CloudWatch Logs. If the cluster that failed still exists, then you can also view the error responses in the Amazon MSK console.

Error: "InvalidInput.LengthOfCloudWatchResourcePolicyLimitExceeded"

The preceding error occurs when the length of the CloudWatch resource policy that's associated with Amazon MSK broker logs exceeds the 5120-character quota that CloudWatch allows. This can happen when CloudWatch logging is turned on for your cluster and a log group is associated with the cluster.

Amazon MSK tries to add the log group's ARN to the list of resources that are allowed to stream logs. The service adds the log group's ARN to the AWSLogDeliveryWrite20150319 resource policy. For more information, see Enabling logging from AWS services. When the resource policy exceeds the character limit, CloudWatch Logs automatically adds /aws/vendedlogs/* to the resource policy for that service.

To view the CloudWatch resource policy, run the AWS Command Line Interface (AWS CLI) command describe-resource-policies.

Note: If you receive errors when you run AWS CLI commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.
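To check how close you are to the 5120-character quota (and to the per-Region policy quota described in the next error), you can save the output of aws logs describe-resource-policies --output json and summarize it. The following Python sketch is an illustration, not an AWS tool. The function name summarize_policies is hypothetical, and the sketch assumes that the character count of each returned policyDocument string approximates what CloudWatch Logs measures against the quota.

```python
import json

# Quotas quoted in this article: 5120 characters per resource policy
# document, and 10 resource policies per Region.
MAX_POLICY_CHARS = 5120
MAX_POLICIES_PER_REGION = 10


def summarize_policies(describe_output: str) -> dict:
    """Summarize the JSON printed by `aws logs describe-resource-policies`.

    Returns the policy count, whether the per-Region quota is reached, and,
    for each policy name, its document length and whether it is over quota.
    """
    data = json.loads(describe_output)
    policies = data.get("resourcePolicies", [])
    report = {
        "count": len(policies),
        "at_policy_limit": len(policies) >= MAX_POLICIES_PER_REGION,
        "policies": {},
    }
    for policy in policies:
        # policyDocument is returned as a JSON string, so len() counts characters.
        length = len(policy["policyDocument"])
        report["policies"][policy["policyName"]] = {
            "length": length,
            "over_limit": length > MAX_POLICY_CHARS,
        }
    return report
```

Read the saved CLI output into a string and pass it to summarize_policies. A policy that's marked over_limit, or an at_policy_limit value of True, points to the corresponding error in this section.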

To resolve this error, use a log group whose name starts with the /aws/vendedlogs/ prefix. Then, recreate the resource. For more information, see Log group resource policy in Logging that requires additional permissions.
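If you generate log group names in your own tooling, you can normalize each name before you set it in the cluster's LoggingInfo (BrokerLogs, CloudWatchLogs, LogGroup) property. The following Python helper is a hypothetical sketch, and the log group names in the usage note are placeholders.

```python
VENDED_LOGS_PREFIX = "/aws/vendedlogs/"


def vended_log_group_name(name: str) -> str:
    """Return a log group name that starts with /aws/vendedlogs/.

    Names that already have the prefix are returned unchanged. Otherwise,
    leading slashes are stripped and the prefix is prepended.
    """
    if name.startswith(VENDED_LOGS_PREFIX):
        return name
    return VENDED_LOGS_PREFIX + name.lstrip("/")
```

For example, a placeholder name such as msk-broker-logs becomes /aws/vendedlogs/msk-broker-logs, and a name that already starts with the prefix is left as it is.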

Error: "InvalidInput.NumberOfCloudWatchResourcePoliciesLimitExceeded"

The preceding error occurs when you've reached the maximum number of CloudWatch Logs resource policies per AWS Region. The maximum number is 10, and you can't change this quota. To resolve this error, choose an existing CloudWatch Logs resource policy in your account, and add the following policy statement to it:

{
  "Sid": "AWSLogDeliveryWrite",
  "Effect": "Allow",
  "Principal": {
    "Service": "delivery.logs.amazonaws.com"
  },
  "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
  "Resource": ["*"]
}

If you still get an error, then add the statement to a different CloudWatch Logs resource policy. Then, recreate the cluster to set up broker log delivery to CloudWatch Logs.
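To add the statement, you edit the policy document of an existing policy and write it back, for example with aws logs put-resource-policy --policy-name <existing policy name> --policy-document <merged document>. The following Python sketch shows only the merge step and makes several assumptions. The input is the policyDocument string from describe-resource-policies, the function name add_delivery_statement is hypothetical, and any statement that already has the Sid AWSLogDeliveryWrite is treated as the same statement.

```python
import json

# Statement from this article that lets the log delivery service write logs.
DELIVERY_STATEMENT = {
    "Sid": "AWSLogDeliveryWrite",
    "Effect": "Allow",
    "Principal": {"Service": "delivery.logs.amazonaws.com"},
    "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
    "Resource": ["*"],
}
MAX_POLICY_CHARS = 5120


def add_delivery_statement(policy_document: str) -> str:
    """Return policy_document with DELIVERY_STATEMENT appended.

    The statement is skipped if a statement with the same Sid already exists.
    Raises ValueError if the merged document exceeds the 5120-character quota.
    """
    policy = json.loads(policy_document)
    statements = policy.setdefault("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a single object
        statements = [statements]
        policy["Statement"] = statements
    if not any(s.get("Sid") == DELIVERY_STATEMENT["Sid"] for s in statements):
        statements.append(DELIVERY_STATEMENT)
    # Compact separators keep the document as short as possible.
    merged = json.dumps(policy, separators=(",", ":"))
    if len(merged) > MAX_POLICY_CHARS:
        raise ValueError(f"merged policy is {len(merged)} characters")
    return merged
```

The length check matters because merging a statement into a policy that's already near 5120 characters trades one quota error for the other.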

Error: "InvalidInput.InsufficientPermissions"

The preceding error occurs when the user or role doesn't have the required permissions to create the MSK cluster. In the CloudTrail logs, look for Access Denied errors for Amazon MSK API operations. Also, look for errors that are related to CloudWatch Logs APIs, such as CreateLogGroup.

To resolve this error, make sure that the user or role has the necessary permissions. For more information about the permissions that the AWS managed policy grants, see AWS managed policy: AmazonMSKFullAccess.
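To scan CloudTrail for these errors, you can export events with aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventSource,AttributeValue=kafka.amazonaws.com, run the command again with logs.amazonaws.com, and then filter the output. The following Python sketch assumes the lookup-events JSON layout, where each entry in Events carries the full record as a JSON string in CloudTrailEvent. The function name access_denied_events is hypothetical, and matching on the substring AccessDenied is a heuristic that covers error codes such as AccessDenied and AccessDeniedException.

```python
import json

# Event sources for Amazon MSK and CloudWatch Logs API calls.
SOURCES = {"kafka.amazonaws.com", "logs.amazonaws.com"}


def access_denied_events(lookup_output: str) -> list:
    """Return (eventSource, eventName, errorCode) tuples for denied calls.

    lookup_output is the JSON printed by `aws cloudtrail lookup-events`.
    """
    denied = []
    for event in json.loads(lookup_output).get("Events", []):
        # The full CloudTrail record is embedded as a JSON string.
        detail = json.loads(event["CloudTrailEvent"])
        code = detail.get("errorCode", "")
        if detail.get("eventSource") in SOURCES and "AccessDenied" in code:
            denied.append((detail["eventSource"], detail.get("eventName"), code))
    return denied
```

Each tuple names the API call that was denied, which tells you which permission to add to the user or role.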

Error because of a property specification that's not valid

You get the "did not stabilize" error when you assign invalid or nonexistent values to the AWS::MSK::Cluster resource properties during cluster creation. Make sure that all values that you assign to the cluster properties are in the allowed format.
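A quick local check before you deploy can catch some of these mistakes. The following Python sketch is hypothetical and covers only a few rules. It treats BrokerNodeGroupInfo, ClusterName, KafkaVersion, and NumberOfBrokerNodes as required, expects ClientSubnets and InstanceType inside BrokerNodeGroupInfo, and assumes that NumberOfBrokerNodes must be a positive multiple of the number of client subnets. It doesn't replace the validation that CloudFormation and Amazon MSK perform.

```python
REQUIRED_PROPERTIES = (
    "BrokerNodeGroupInfo",
    "ClusterName",
    "KafkaVersion",
    "NumberOfBrokerNodes",
)


def validate_msk_properties(props: dict) -> list:
    """Return a list of problems found in AWS::MSK::Cluster Properties."""
    # Assumed rule: these top-level properties must be present.
    problems = [f"missing {key}" for key in REQUIRED_PROPERTIES if key not in props]

    subnets = []
    group = props.get("BrokerNodeGroupInfo")
    if isinstance(group, dict):
        subnets = group.get("ClientSubnets") or []
        if not subnets:
            problems.append("BrokerNodeGroupInfo.ClientSubnets is missing or empty")
        if not group.get("InstanceType"):
            problems.append("BrokerNodeGroupInfo.InstanceType is missing")

    # Assumed rule: brokers are spread evenly across the client subnets.
    brokers = props.get("NumberOfBrokerNodes")
    if isinstance(brokers, int) and subnets:
        if brokers < 1 or brokers % len(subnets) != 0:
            problems.append(
                f"NumberOfBrokerNodes ({brokers}) is not a positive multiple "
                f"of the {len(subnets)} client subnets"
            )
    return problems
```

Pass the function the Properties mapping of the AWS::MSK::Cluster resource after you load the template. An empty list means only that none of these particular checks failed.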

Related information

Error delivering broker logs to Amazon CloudWatch Logs

Logs sent to CloudWatch Logs

AWS OFFICIAL. Updated a month ago.