How do I understand my alarm and evaluate my alarm transition state in CloudWatch?

I want to understand my alarm and evaluate my alarm transition state in Amazon CloudWatch.

Short description

After you create an alarm, CloudWatch evaluates the metric data against the conditions that you specified. Note the following attributes that you set when you create an alarm:

  • Metric selection - The metric that you want to monitor.
  • Threshold configuration - The specific value that invokes the alarm.
  • Evaluation periods - The number of most recent periods (data points) that CloudWatch evaluates to determine the alarm state. Use this parameter to avoid false alarms.
  • DatapointsToAlarm - The number of data points within the evaluation periods that must be breaching to invoke the alarm. Use this to set the alarm sensitivity.
  • Period - The interval for the metric data aggregation (for example, every 60 seconds).
  • Statistic - The type of metric data aggregation (for example, Minimum or Average).
  • Treat Missing Data (TMD) - The method used to evaluate alarms when metric data is missing.

To understand how CloudWatch evaluates alarms and treats missing data, see Evaluating an alarm and Configuring how CloudWatch alarms treat missing data.

Resolution

To review how CloudWatch analyzes data points that are retrieved, view your Alarm History in CloudWatch:

  1. Open the CloudWatch console, and then select Alarms in the navigation pane.
  2. Choose your alarm, and then select the History tab. In the Date column of the History tab, the hyperlinked timestamp entry shows when the alarm went into the ALARM state.
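The same history entries can also be read programmatically. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. It uses the DescribeAlarmHistory API with the StateUpdate item type and parses the HistoryData JSON string of each entry. The function names are illustrative, and pagination is not handled.

```python
import json


def fetch_state_updates(alarm_name):
    """Return the raw HistoryData strings for an alarm's state changes."""
    import boto3  # imported here so the parser below works without boto3

    client = boto3.client("cloudwatch")
    response = client.describe_alarm_history(
        AlarmName=alarm_name,
        HistoryItemType="StateUpdate",
    )
    return [item["HistoryData"] for item in response["AlarmHistoryItems"]]


def summarize_transition(history_data):
    """Extract the new state and its reason from one HistoryData string."""
    new_state = json.loads(history_data)["newState"]
    return new_state["stateValue"], new_state["stateReason"]


# Hypothetical, shortened history entry used only to exercise the parser.
sample = json.dumps(
    {"newState": {"stateValue": "ALARM", "stateReason": "Threshold Crossed"}}
)
print(summarize_transition(sample))
```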

To evaluate your alarm transition state in Alarm History, see the following JSON examples for alarm 1 and alarm 2. Both example alarms use the following conditions:

  • Metric - HealthyHostCount
  • Threshold configuration - Less than or equal to 1 (at least 1 of the 3 data points evaluated over three minutes must be 1 or less)
  • Evaluation periods - 3 data points
  • Period - 1 minute
  • Statistic - Minimum
  • Treat missing data (TMD) - For example alarm 1, TMD is set to missing. For example alarm 2, TMD is set to breaching.
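As a rough illustration, the conditions above could be expressed as parameters for the PutMetricAlarm API (for example, through the boto3 put_metric_alarm call). This is a sketch, not the configuration that produced the examples: the alarm name and namespace are hypothetical, the dimensions that a real HealthyHostCount alarm needs are omitted, and the DatapointsToAlarm value of 1 is inferred from the "minimum 1 datapoint" text in the state reason.

```python
def example_alarm_params(treat_missing_data):
    """Build PutMetricAlarm keyword arguments for the example conditions."""
    return {
        "AlarmName": "example-healthy-host-alarm",  # hypothetical name
        "Namespace": "AWS/ApplicationELB",  # assumption; not stated in the article
        "MetricName": "HealthyHostCount",
        "Statistic": "Minimum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 1,  # inferred from the state reason text
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "TreatMissingData": treat_missing_data,
    }


alarm_1 = example_alarm_params("missing")
alarm_2 = example_alarm_params("breaching")

# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm_1)
```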

Example alarm 1 JSON:

"newState": {
      "stateValue": "ALARM",
      "stateReason": "Threshold Crossed: 1 out of the last 3 datapoints [1.0 (28/03/24 17:11:00)] was less than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition).",
      "stateReasonData": {
        "version": "1.0",
        "queryDate": "2024-03-28T17:13:09.156+0000",
        "startDate": "2024-03-28T17:09:00.000+0000",
        "statistic": "Minimum",
        "period": 60,
        "recentDatapoints": [
          2,
          2,
          1
        ],
        "threshold": 1,
        "evaluatedDatapoints": [
          {
            "timestamp": "2024-03-28T17:11:00.000+0000",
            "sampleCount": 2,
            "value": 1
          }

For the preceding JSON, three data points were retrieved with the values of 2, 2, and 1. The alarm transitioned to the ALARM state because 1 of the last 3 data points was less than or equal to the threshold of 1.

Note: The evaluatedDatapoints parameter shows details on the breaching data points. For the preceding JSON, CloudWatch received 2 samples for the period that starts at 17:11. When those samples are aggregated by the Minimum statistic, a data point of 1 is returned. This value meets the alarm condition of less than or equal to 1. As a result, the alarm transitions to the ALARM state.
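The evaluation for example alarm 1 can be sketched as follows. This is a simplified model for illustration, not CloudWatch's actual implementation: it assumes no missing data, and the two sample values are hypothetical because the JSON reports only the sample count (2) and the resulting minimum (1).

```python
def evaluate(recent_datapoints, threshold, datapoints_to_alarm):
    """Return ALARM when enough data points are less than or equal to the threshold."""
    breaching = sum(1 for value in recent_datapoints if value <= threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical samples for the 17:11 period; Minimum aggregates them to 1.
samples = [2.0, 1.0]
aggregated = min(samples)

# recentDatapoints from the example: 2, 2, and the aggregated value of 1.
state = evaluate([2, 2, aggregated], threshold=1, datapoints_to_alarm=1)
print(state)
```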

Example alarm 2 JSON:

"newState": {
      "stateValue": "ALARM",
      "stateReason": "Threshold Crossed: 2 datapoints were received for 3 periods and 1 missing datapoint was treated as [Breaching].",
      "stateReasonData": {
        "version": "1.0",
        "queryDate": "2024-03-28T20:09:52.566+0000",
        "startDate": "2024-03-28T20:00:00.000+0000",
        "statistic": "Minimum",
        "period": 60,
        "recentDatapoints": [
          2,
          2
        ],
        "threshold": 1,
        "evaluatedDatapoints": [
          {
            "timestamp": "2024-03-28T20:07:00.000+0000"
          }

For the preceding JSON, the alarm configuration evaluates three data points. Two data points were retrieved, with the values of 2 and 2. The third data point is missing, so CloudWatch applies the TMD option in the alarm evaluation. Because TMD is set to breaching, the missing data point is treated as a breaching value. This causes the alarm to transition to the ALARM state.
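The effect of the TMD setting in example alarm 2 can be sketched as follows. This is a simplified model that covers only the breaching and notBreaching options. It does not reproduce how CloudWatch handles the missing and ignore options or its full evaluation range logic. A missing data point is represented here as None, which is a choice made for this sketch.

```python
def evaluate_with_missing(datapoints, threshold, datapoints_to_alarm, treat_missing_data):
    """Count breaching data points, treating None according to the TMD option."""
    if treat_missing_data not in ("breaching", "notBreaching"):
        raise ValueError("only breaching and notBreaching are modeled in this sketch")
    breaching = 0
    for value in datapoints:
        if value is None:
            # A missing data point counts as breaching only when TMD is breaching.
            if treat_missing_data == "breaching":
                breaching += 1
        elif value <= threshold:
            breaching += 1
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Two received data points (2 and 2) plus one missing data point.
print(evaluate_with_missing([2, 2, None], 1, 1, "breaching"))
print(evaluate_with_missing([2, 2, None], 1, 1, "notBreaching"))
```

With the same data, the notBreaching option leaves the alarm in the OK state in this model, which shows that the transition in example alarm 2 comes from the TMD setting and not from the received values.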

Related information

Using Amazon CloudWatch alarms

Common features of CloudWatch alarms

Why did my CloudWatch alarm trigger when its metric doesn't have any breaching data points?

AWS OFFICIAL - Updated 12 days ago