Alarm not getting triggered even when the metric crosses the threshold


Hi,

I have configured a CloudWatch alarm based on an expression that involves two metrics.

Expression = m2 - m1, where:

m2: AWS/AutoScaling - GroupInServiceInstances

m1: ECS/ContainerInsights - TaskCount

Alarm config: Expression <= 5 for 1 datapoint within 1 minute

Missing data is treated as breaching the threshold.
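For reference, the configuration described above could be written down roughly as follows. This is only a sketch: it builds the parameter dictionary that boto3's `put_metric_alarm` accepts, without calling AWS. The alarm name, the `Average` statistic, the 60-second period, and the dimension names (`AutoScalingGroupName`, `ClusterName`) are assumptions and not taken from the question.

```python
def build_alarm_params(asg_name, cluster_name, alarm_name="asg-minus-tasks-low"):
    """Build put_metric_alarm parameters for the expression m2 - m1 <= 5.

    alarm_name, the statistic, the period, and the dimension names are
    assumptions made for illustration; adjust them to the real setup.
    """
    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": "LessThanOrEqualToThreshold",  # Expression <= 5
        "Threshold": 5.0,
        "EvaluationPeriods": 1,   # "within 1 minute" (one 60-second period)
        "DatapointsToAlarm": 1,   # "1 datapoint"
        "TreatMissingData": "breaching",  # missing data counts as a breach
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": False,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "ECS/ContainerInsights",
                        "MetricName": "TaskCount",
                        "Dimensions": [
                            {"Name": "ClusterName", "Value": cluster_name}
                        ],
                    },
                    "Period": 60,
                    "Stat": "Average",
                },
            },
            {
                "Id": "m2",
                "ReturnData": False,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/AutoScaling",
                        "MetricName": "GroupInServiceInstances",
                        "Dimensions": [
                            {"Name": "AutoScalingGroupName", "Value": asg_name}
                        ],
                    },
                    "Period": 60,
                    "Stat": "Average",
                },
            },
            # Only the expression returns data, so the alarm evaluates m2 - m1.
            {"Id": "e1", "Expression": "m2 - m1", "ReturnData": True},
        ],
    }


params = build_alarm_params("my-asg", "my-cluster")  # hypothetical names
```

With credentials configured, these parameters could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Writing the configuration out this way makes it easier to compare the comparison operator and periods against what the alarm history reports.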

The problem is that the alarm does not trigger when expected. Once the expression breaches the threshold (5), the alarm config above (1 datapoint within 1 minute) should trigger it after about a minute. Instead, the alarm triggers after an inconsistent delay of 2 to 15 minutes, which delays the actions (autoscaling) associated with the alarm.

Please refer to this link for a screenshot. The blue line in the first graph shows the expression value and the red line shows the threshold. As can be seen, the expression crosses the threshold at 7:23, but the alarm is triggered at 7:40. The In Alarm state (red bar) in the second chart starts 17 minutes later than it should have.

Any help is really appreciated.

asked 10 months ago, 53 views
2 Answers

From the screenshot, it actually looks like the alarm is breached when your expression is greater than the threshold, not lower.
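To make the direction of the comparison concrete, here is a small sketch that checks whether a value breaches a threshold for a given operator. The operator names are the ones CloudWatch uses for static thresholds; the helper `breaches` itself is hypothetical and only illustrates the logic.

```python
import operator

# CloudWatch comparison operator names mapped to Python comparisons.
COMPARISONS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def breaches(value, threshold, comparison_operator):
    """Return True if value breaches threshold under the given operator."""
    return COMPARISONS[comparison_operator](value, threshold)


# With "Expression <= 5", a value above 5 should NOT put the alarm in ALARM.
below = breaches(4, 5, "LessThanOrEqualToThreshold")   # True
above = breaches(7, 5, "LessThanOrEqualToThreshold")   # False
```

If the alarm goes into ALARM while the expression is above the threshold, the configured operator is probably one of the "GreaterThan" variants rather than `LessThanOrEqualToThreshold`.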

answered 10 months ago

Hi Dhruv,

The alarm configuration and the screenshot you have provided do not quite align with each other.

What I can suggest is to look at the alarm history and check the particular StateUpdate that happened at 7:40, as you mentioned. Its reason should show why the alarm triggered. In the StateUpdate history entry, look for a section that starts like the example below:

...
"newState": {
      "stateValue": "ALARM",
      "stateReason": "Threshold Crossed: 1 out of the last 1 datapoints [42.7118644067809 (21/01/22 13:00:00)] was greater than or equal to the threshold (40.0) (minimum 1 datapoint for OK -> ALARM transition).",
...

This section explains how the alarm was triggered and why. The reason data can also confirm the alarm's configured threshold and comparison operator.

Further down, the recentDatapoints, threshold, and evaluatedDatapoints fields within stateReasonData provide more detail about the StateUpdate.
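As a rough sketch, the fields mentioned above could be pulled out of a StateUpdate entry like this. The function `summarize_state_update` is hypothetical, and the sample below is illustrative: its values are modeled on the excerpt above, and the exact shape of the `evaluatedDatapoints` entries is an assumption. In practice the JSON string would come from the `HistoryData` field returned by `describe_alarm_history` with `HistoryItemType="StateUpdate"`.

```python
import json


def summarize_state_update(history_data: str) -> dict:
    """Extract the key troubleshooting fields from a StateUpdate JSON string."""
    entry = json.loads(history_data)
    new_state = entry.get("newState", {})
    reason_data = new_state.get("stateReasonData", {})
    return {
        "state": new_state.get("stateValue"),
        "reason": new_state.get("stateReason"),
        "threshold": reason_data.get("threshold"),
        "recent_datapoints": reason_data.get("recentDatapoints", []),
        "evaluated_datapoints": reason_data.get("evaluatedDatapoints", []),
    }


# Illustrative sample only; the evaluatedDatapoints layout is assumed.
sample = json.dumps({
    "newState": {
        "stateValue": "ALARM",
        "stateReason": "Threshold Crossed: 1 out of the last 1 datapoints "
                       "[42.7118644067809 (21/01/22 13:00:00)] was greater "
                       "than or equal to the threshold (40.0) (minimum 1 "
                       "datapoint for OK -> ALARM transition).",
        "stateReasonData": {
            "threshold": 40.0,
            "recentDatapoints": [42.7118644067809],
            "evaluatedDatapoints": [
                {"timestamp": "2022-01-21T13:00:00.000+0000",
                 "value": 42.7118644067809}
            ],
        },
    }
})

summary = summarize_state_update(sample)
```

Comparing `summary["threshold"]` and the wording of `summary["reason"]` with the intended configuration shows quickly whether the alarm is evaluating the comparison you expect.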

Hope this helps you further troubleshoot your alarm configuration and the state updates to the ALARM state.

SUPPORT ENGINEER
answered 10 months ago
  • Hi Munkhbat_T,

    Thanks for the detailed response.

    I am adding more details below from the history section. This is for a different timeframe (24/01/2022) than the one mentioned in the original question.

    Here is the screenshot for alarm and metric value: https://ibb.co/s6RjBTH. As can be seen here, the alarm triggered at 4:12 when it was supposed to trigger at 3:49.

    Here is a screenshot from the history section for the same period (bottom rows): https://ibb.co/6gyjkgR.

    Here is a screenshot of the state change data for the alarm triggering at 4:12: https://ibb.co/7vLgPQR.
