There are several reasons why your CloudWatch alarms might not be triggering even when CPU utilization reaches 100%. Here are some suggestions to troubleshoot the issue:
- Check the alarm configuration:
  - Ensure that the metric name, dimensions, and statistic in your alarm configuration match exactly with the metric you're trying to monitor.
  - Verify that the threshold, period, and evaluation periods are set correctly.
- Verify data points:
  - CloudWatch alarms evaluate the data points available during the evaluation period. Make sure that data points are actually being published to the metric within that period.
  - If too few data points arrive, the alarm may stay in the OK state or move to INSUFFICIENT_DATA, depending on how the alarm is configured to treat missing data.
- Check IAM permissions:
  - Ensure that the IAM role attached to your EC2 instances allows the cloudwatch:PutMetricData action.
  - Consider attaching the AWS managed policy CloudWatchAgentServerPolicy to the role, which grants the permissions the CloudWatch agent needs.
- Examine CloudWatch agent logs:
  - If you're using the CloudWatch agent, check its log files for any error messages related to connectivity, permissions, or configuration issues.
- Test network connectivity:
  - Verify that your EC2 instances can reach the CloudWatch endpoints, either over the internet or through a VPC endpoint.
  - If your instances reach CloudWatch through a VPC interface endpoint, ensure that the security group attached to the endpoint allows inbound HTTPS (port 443) traffic from your instances.
- Consider using "M out of N" alarm settings:
  - With this setting, the alarm goes into the ALARM state when at least M of the most recent N data points breach the threshold. The breaching data points do not have to be consecutive, so the alarm tolerates occasional non-breaching or missing data points.
- Review your Terraform configuration:
  - Double-check that your Terraform code is correctly defining the alarm parameters, including the correct metric name, namespace, and dimensions.
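To make the "M out of N" suggestion above concrete, here is a minimal Python sketch of the evaluation rule. It is a simplified model, not CloudWatch's actual implementation: it assumes a "greater than threshold" comparison and that no data points are missing. The function name `alarm_breaches` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def alarm_breaches(
    datapoints: Sequence[float],
    threshold: float,
    datapoints_to_alarm: int,   # "M"
    evaluation_periods: int,    # "N"
) -> bool:
    """Return True if at least M of the most recent N data points exceed the threshold.

    Simplified model: assumes a GreaterThanThreshold comparison and no missing data.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")

    # Only the most recent N data points are considered.
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm


# CPU spikes to 100% on alternating periods (hypothetical values).
cpu = [100, 40, 100, 35, 100]

print(alarm_breaches(cpu, threshold=90, datapoints_to_alarm=3, evaluation_periods=5))
print(alarm_breaches(cpu, threshold=90, datapoints_to_alarm=5, evaluation_periods=5))
```

With these sample values, a "3 out of 5" alarm breaches because three of the five data points are above 90, while a "5 out of 5" alarm does not, since every data point in the window would have to breach.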
If you've verified all these points and the issue persists, you may want to manually test the metric collection by running the put-metric-data command on the instances that run the CloudWatch agent. This can help isolate whether the problem is with metric collection or alarm configuration.
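As a rough illustration of that manual test, the following Python sketch uses boto3's `put_metric_data` call, which is the SDK counterpart of the `put-metric-data` CLI command. The namespace, metric name, instance ID, and region below are placeholders, not values from your setup; replace them with exactly what your alarm references. The helper names `build_test_datapoint` and `publish` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict


def build_test_datapoint(
    namespace: str,
    metric_name: str,
    value: float,
    dimensions: Dict[str, str],
    unit: str = "Percent",
) -> dict:
    """Build the keyword arguments for CloudWatch's PutMetricData call."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                # Dimensions must match the alarm's dimensions exactly.
                "Dimensions": [
                    {"Name": name, "Value": val} for name, val in dimensions.items()
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
            }
        ],
    }


def publish(params: dict, region_name: str) -> None:
    """Send the data point. Requires boto3 and cloudwatch:PutMetricData permission."""
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("cloudwatch", region_name=region_name)
    client.put_metric_data(**params)


# Placeholder values (assumptions): use the names your alarm actually watches.
params = build_test_datapoint(
    namespace="Custom/AlarmTest",
    metric_name="TestCPU",
    value=100,
    dimensions={"InstanceId": "i-0123456789abcdef0"},
)
# publish(params, region_name="us-east-1")  # uncomment to run on the instance
```

If the call succeeds and the data point shows up in the CloudWatch console but the alarm still does not change state, the problem is more likely in the alarm's metric reference than in metric collection. If the call fails, the error usually points to permissions or connectivity.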
Sources
Troubleshoot a CloudWatch alarm that doesn't invoke | AWS re:Post
Troubleshoot a CloudWatch alarm that monitors metrics | AWS re:Post
CloudWatch alarms - AWS Prescriptive Guidance
Hi, we cannot see any details about the alarms in your message above. Can you please share them?
The most common mistake that prevents an alarm from firing is creating it with a wrong reference to the metric. The reference is most often wrong in the following cases:
- when you specify a unit that doesn't match the metric's unit. Note that alarms don't need a unit to work, so if there is only one variant of your metric, we recommend not specifying the unit.
- when you list only a subset of the metric's dimensions. For example, your metric has 2 dimensions but you only set the key and value for one of them.
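The two cases above can be checked mechanically. The sketch below compares an alarm's metric reference against the metric as it is actually published and reports any differences. It is only an illustration: the dictionary layout and the function name `find_reference_mismatches` are hypothetical, and the sample namespace, metric name, and dimension values are assumptions rather than details from the question.

```python
from typing import List


def find_reference_mismatches(alarm: dict, metric: dict) -> List[str]:
    """List the ways an alarm's metric reference differs from the published metric."""
    problems = []

    for key in ("namespace", "metric_name"):
        if alarm[key] != metric[key]:
            problems.append(f"{key}: alarm has {alarm[key]!r}, metric has {metric[key]!r}")

    # The alarm must list every dimension of the metric, with identical values.
    alarm_dims = alarm.get("dimensions", {})
    metric_dims = metric.get("dimensions", {})
    if alarm_dims != metric_dims:
        missing = sorted(set(metric_dims) - set(alarm_dims))
        extra = sorted(set(alarm_dims) - set(metric_dims))
        wrong = sorted(
            name
            for name in set(alarm_dims) & set(metric_dims)
            if alarm_dims[name] != metric_dims[name]
        )
        problems.append(
            f"dimensions: missing={missing}, unexpected={extra}, different value={wrong}"
        )

    # A unit is optional on the alarm; it only matters if one is set.
    alarm_unit = alarm.get("unit")
    if alarm_unit is not None and alarm_unit != metric.get("unit"):
        problems.append(f"unit: alarm has {alarm_unit!r}, metric has {metric.get('unit')!r}")

    return problems


# Illustrative values (assumptions): a metric with two dimensions, an alarm with one.
published = {
    "namespace": "CWAgent",
    "metric_name": "cpu_usage_active",
    "dimensions": {"InstanceId": "i-0123456789abcdef0", "cpu": "cpu-total"},
    "unit": "Percent",
}
alarm = {
    "namespace": "CWAgent",
    "metric_name": "cpu_usage_active",
    "dimensions": {"InstanceId": "i-0123456789abcdef0"},
    "unit": "Count",
}

for problem in find_reference_mismatches(alarm, published):
    print(problem)
```

Here the alarm would be flagged twice: once for the missing `cpu` dimension and once for the unit mismatch. Either one is enough for the alarm to watch a metric that never receives data.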