How do I troubleshoot a CloudWatch alarm that doesn’t invoke?

I want to troubleshoot an Amazon CloudWatch alarm that doesn’t invoke.

Short description

CloudWatch alarms evaluate metrics based on the data points that are available at the time of the alarm evaluation. Standard alarms are evaluated every minute, and high-resolution alarms are evaluated every 10 seconds. If the collected data points don't breach the threshold within the specified window, then the alarm remains in the OK state.

Note: Windows are the time intervals when CloudWatch analyzes data to determine if an alarm invokes or doesn't invoke. The Period and number of Evaluation Periods define the window as a time interval.

The following are possible causes of an alarm that doesn't invoke:

  • The CloudWatch alarm was only recently created.
  • For event-driven and periodic metrics, no data points were pushed to the metric within the evaluation period.
  • The metric is unavailable.
  • The metric parameters, such as namespace, metric name, or dimensions, are misconfigured.
  • The metric doesn't have enough data to determine the alarm state.
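One way to check for misconfigured metric parameters is to compare the alarm's namespace, metric name, and dimensions against the metrics that CloudWatch actually lists. The following is a minimal sketch, not a definitive implementation. It assumes dictionaries in the shape returned by the DescribeAlarms and ListMetrics APIs. The helper name find_matching_metric and the sample instance IDs are hypothetical.

```python
def find_matching_metric(alarm, metrics):
    """Return the first listed metric whose identity matches the alarm's, else None."""
    def dims(entry):
        # Dimension order is not significant, so compare as a set of pairs.
        return frozenset((d["Name"], d["Value"]) for d in entry.get("Dimensions", []))

    for metric in metrics:
        if (metric["Namespace"] == alarm["Namespace"]
                and metric["MetricName"] == alarm["MetricName"]
                and dims(metric) == dims(alarm)):
            return metric
    return None


# Hypothetical sample data in the shape of DescribeAlarms / ListMetrics output.
alarm = {
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
}
listed = [
    {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "InstanceId", "Value": "i-0fedcba9876543210"}]},
]
print(find_matching_metric(alarm, listed))  # None: the instance IDs differ
```

A result of None suggests that the alarm watches a metric that doesn't exist as configured. The comparison is exact because namespaces, metric names, and dimension values are case sensitive.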

Resolution

Check the metric filter configurations

For metrics that are created by a metric filter, check the following:

  • If you use a metric filter based on CloudWatch Logs, then make sure that the expected logs are generated and the filter is correctly defined.
  • Check that the log events include the expected values in the metric filter pattern. To make sure that the pattern matches as expected, test the pattern against example log events.
  • Check the alarm configuration to make sure that the alarm uses the correct statistic for the metric.

Use an "M out of N alarm" setting

Note: For the following resolution, M represents the number of data points that must breach the threshold to invoke the alarm. The breaching data points don't need to be consecutive. N represents the number of most recent data points that the alarm evaluates, as defined by the number of Evaluation Periods.

CloudWatch alarms evaluate metrics with the data points that are available at the time of evaluation. Because data points continue to flow into the CloudWatch metric, new data points might be published after an alarm evaluation, and each subsequent evaluation might use different aggregated values. When you review the metric history later, a complete set of data points appears, and it might differ from what the alarm evaluated. To resolve this issue, configure an "M out of N alarm" so that your CloudWatch alarms evaluate more data points.
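The effect of late data points can be illustrated with a simplified model. The following sketch is not how CloudWatch is implemented. It assumes hypothetical samples that all belong to one 5-minute period, and it compares the average that is visible when the period ends with the average that is visible later.

```python
from statistics import mean

# Hypothetical samples for one 5-minute period: (arrival_second, value).
# The last two samples belong to the period but reach the metric late.
samples = [(60, 80.0), (120, 85.0), (400, 100.0), (420, 100.0)]


def period_average(samples, as_of_second):
    """Average of the samples that had arrived by as_of_second, or None if none had."""
    seen = [value for arrival, value in samples if arrival <= as_of_second]
    return mean(seen) if seen else None


THRESHOLD = 90.0
at_evaluation = period_average(samples, as_of_second=300)  # 82.5
in_hindsight = period_average(samples, as_of_second=600)   # 91.25
print(at_evaluation > THRESHOLD, in_hindsight > THRESHOLD)  # False True
```

In this model, the alarm sees a non-breaching value at evaluation time, but the history later shows a breaching value for the same period.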

Example scenario:

An M out of N alarm for CPUUtilization is configured where M equals 2, N equals 3, the period is 5 minutes, and the threshold is 90%. Because N equals 3, the evaluation window is 15 minutes. Because M equals 2, if CPUUtilization exceeds the threshold for two of the last three 5-minute periods, then the alarm invokes.

  • If at 10 minutes CPUUtilization is at 85%, then the metric is less than the threshold of 90%.
  • If at 15 minutes CPUUtilization is at 92%, then the metric is greater than the threshold of 90%.
  • If at 20 minutes CPUUtilization is at 94%, then the metric is greater than the threshold. The alarm invokes because the requirement that the threshold is exceeded for two of the last three 5-minute periods is met.
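The scenario above can be sketched in a few lines of Python. This is a simplified model, not CloudWatch's evaluation logic: it assumes a "greater than threshold" comparison and ignores how missing data is treated. The function name evaluate_m_out_of_n is hypothetical.

```python
def evaluate_m_out_of_n(datapoints, threshold, m, n):
    """Return "ALARM" if at least m of the last n data points exceed threshold, else "OK"."""
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and n")
    window = datapoints[-n:]  # the n most recent data points
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= m else "OK"


cpu = [85.0, 92.0, 94.0]  # values at 10, 15, and 20 minutes from the scenario
print(evaluate_m_out_of_n(cpu[:2], threshold=90.0, m=2, n=3))  # OK
print(evaluate_m_out_of_n(cpu, threshold=90.0, m=2, n=3))      # ALARM
```

The breaching data points don't have to be adjacent. In this model, the values 92, 85, and 94 also produce ALARM.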

To configure an M out of N alarm setting, complete the following steps:

  1. Open the CloudWatch console.
  2. In the navigation pane, select Alarms, then All alarms.
  3. Locate and select the alarm that you want to configure for M out of N.
  4. Select the action dropdown menu, and then choose Edit.
  5. Select Additional configuration. For Datapoints to alarm, make sure that the first value (M) is lower than the second value (N). This configuration determines how many data points within the evaluation window must breach the threshold to invoke the alarm. The breaching data points don't need to be consecutive.
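The same setting can also be applied programmatically. The following is a minimal sketch that builds the parameters for the PutMetricAlarm API, where DatapointsToAlarm is M and EvaluationPeriods is N. The helper name, the alarm name, the instance ID, and the choice of the Average statistic are assumptions for illustration.

```python
def m_out_of_n_alarm_params(alarm_name, instance_id, m, n, period_seconds, threshold):
    """Build PutMetricAlarm parameters for an M out of N CPUUtilization alarm."""
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and n")
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",            # assumed statistic
        "Period": period_seconds,
        "EvaluationPeriods": n,            # N: data points evaluated
        "DatapointsToAlarm": m,            # M: breaching data points required
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical alarm name and instance ID.
params = m_out_of_n_alarm_params(
    "high-cpu", "i-0123456789abcdef0", m=2, n=3, period_seconds=300, threshold=90.0
)

# To apply (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

PutMetricAlarm overwrites the existing configuration of an alarm with the same name, so include any alarm actions and other settings that the alarm already uses.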

Related information

Why did my CloudWatch alarm initiate when its metric doesn't have any breaching data points?

Aggregation

AWS OFFICIAL, updated 25 days ago